Problem indexing Spanish Characters

Hannah c Wed, 19 May 2004 08:33:38 -0700

Hi,

I am indexing a number of English articles on Spanish resorts. As a result, there are Spanish characters throughout the text, mostly in place names, which are exactly the kind of words I would like to use as queries. My problem is that the StandardTokenizer class splits a word in two whenever it encounters one of these Spanish characters. I had a look at the source, but the code was generated by JavaCC and so is not very readable. Is there a way around this problem, or could someone point me to the part of the code I would need to change to avoid it?
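
To make the symptom concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not reproduce the JavaCC grammar. It assumes the split happens because accented letters fall outside the tokenizer's definition of a letter, which I have not confirmed from the generated source. The class and method names (TokenizeSketch, isAsciiLetter, tokens) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizeSketch {

    // Hypothetical stand-in for a "letter" rule that only covers ASCII.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Splits text into runs of letters. With unicodeAware == false, any
    // non-ASCII character (such as an accented vowel) ends the current token.
    static List<String> tokens(String text, boolean unicodeAware) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean letter = unicodeAware ? Character.isLetter(c) : isAsciiLetter(c);
            if (letter) {
                current.append(c);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String place = "M\u00e1laga"; // "Malaga" with an acute accent on the first a
        System.out.println(tokens(place, false)); // prints [M, laga]
        System.out.println(tokens(place, true));  // one token, the whole place name
    }
}
```

The ASCII-only rule cuts the place name into "M" and "laga", which is the kind of split I am seeing. Using Character.isLetter keeps the name as a single token, so I would guess the fix is to widen the letter definition in the tokenizer grammar.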

Thanks
Hannah Cumming

