http://wiki.apache.org/jakarta-lucene/IndexingOtherLanguages

>>> [EMAIL PROTECTED] 6/3/2005 6:03:31 AM >>>

On Jun 2, 2005, at 9:06 PM, Bob Cheung wrote:
> Btw, I did try running the lucene demo (web template) to index the  
> HTML
> files after I added one including English and Chinese characters.   
> I was
> not able to search for any Chinese in that HTML file (returned no  
> hits).
> I wonder whether I need to change some of the java programs to index
> Chinese and/or accept Chinese as search term.  I was able to search 

> for
> the HTML file if I used English word that appeared in the added HTML
> file.

Bob - Andy provided thorough information on the StandardAnalyzer  
issue (in short, it deals with Unicode directly not encodings).  As  
for the Lucene demo - you will have to adjust it to read the files in 

the proper encoding.  The IndexFiles program indexes files using the  
default encoding which won't be sufficient for your purpose.  The two 

files to check are HtmlDocument and FileDocument.  These files read  
the HTML and text files that the demo indexes.

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED] 
For additional commands, e-mail: [EMAIL PROTECTED] 


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to