Re: Problem indexing Spanish Characters

Otis Gospodnetic Wed, 19 May 2004 08:43:41 -0700

It looks like the Snowball project supports Spanish:
http://www.google.com/search?q=snowball spanish


If it does, take a look at the Lucene Sandbox.  There is a project there
that allows you to use Snowball analyzers with Lucene.
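
As a side note on the splitting described in the quoted message: Snowball
stemmers work on tokens after the tokenizer has produced them, so where a
word gets cut is decided by the tokenizer itself. The sketch below is not
Lucene code and does not use any Lucene API. It is a hypothetical,
stand-alone helper (the class name LetterTokenizerSketch and the method
tokenize are made up for illustration) that assumes it is acceptable to
treat every Unicode letter or digit as part of a word. Under that
assumption, accented characters such as á and ñ stay inside the token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Snowball.
final class LetterTokenizerSketch {

    // Splits text on every code point that Unicode does not classify as a
    // letter or digit, and lower-cases the tokens it keeps.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                // Accented letters such as U+00E1 (a with acute) pass this test.
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file encoding.
        System.out.println(tokenize("Hotels in M\u00e1laga and Logro\u00f1o"));
    }
}
```

If the place names come through intact here but not in the index, the
tokenizer grammar (or the encoding used when reading the files) would be
the first place to look.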

Otis


--- Hannah c <[EMAIL PROTECTED]> wrote:
> 
> Hi,
> 
> I am indexing a number of English articles on Spanish resorts. As such,
> there are a number of Spanish characters throughout the text, most of
> them in the place names, which are the type of words I would like to use
> as queries. My problem is with the StandardTokenizer class, which cuts
> a word in two when it comes across any of the Spanish characters. I had
> a look at the source, but the code was generated by JavaCC and so is not
> very readable. I was wondering if there is a way around this problem, or
> which area of the code I would need to change to avoid it.
> 
> Thanks
> Hannah Cumming
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

