Doug, what would be the best way to handle cross-language indexing and searching? The issue is that when indexing web sites or intranet documents, one might come across documents in different languages. Assuming that the language can be detected, and that an analyzer for that language is available, one could then create documents with fields of the form search_<LANG>, where <LANG> is a Java Locale code (or something similar).

When a query is constructed, it can be expanded: take the query in the language of the user, "translate" it term-by-term using a dictionary lookup, and then create an OR-ed query in which each language's component is run against the correspondingly named field. (A sketch of both halves of this scheme follows below.)

What do you think of this approach? Is there a better way? It seems that this would bypass the single-analyzer limitation you mentioned, since the analysis is done by custom code before the query is submitted (and by other custom code during indexing). Am I right on this one?

Dmitry.

==================================================

Given that Lucene only supports one analyzer per index, the latter seems like what's needed.

Another approach is to change Lucene's index to track which fields were tokenized and which weren't. This would be fairly easy to add. Then you could simply pass the IndexReader in to the query parser, which would not analyze untokenized fields. (A rough sketch of that appears at the end of this thread.)

If that sounds like a sufficient solution, then I would be willing to add tracking of which fields are tokenized to the indexing code.

Doug
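To make the scheme above concrete, here is a minimal sketch against the 1.x-era Lucene API. The analyzersByLang map and the translate() dictionary lookup are hypothetical placeholders, and the pre-tokenization trick (running each language's analyzer by hand and joining the tokens with spaces, so that a plain WhitespaceAnalyzer configured on the index passes the terms through unchanged) is one assumed way to implement the "custom code during indexing" that Dmitry describes, not an API Lucene provides for this.

    import java.io.StringReader;
    import java.util.Map;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class CrossLanguageSearch {

      private Map analyzersByLang;  // hypothetical: "en" -> an English analyzer, etc.

      // Indexing side: analyze the text with the language's own analyzer by
      // hand, then join the tokens with spaces so the single index-wide
      // analyzer (e.g. WhitespaceAnalyzer) leaves the terms untouched.
      public Document makeDocument(String text, String lang) throws Exception {
        Analyzer analyzer = (Analyzer) analyzersByLang.get(lang);
        TokenStream ts = analyzer.tokenStream("search_" + lang, new StringReader(text));
        StringBuffer tokens = new StringBuffer();
        for (Token t = ts.next(); t != null; t = ts.next()) {
          if (tokens.length() > 0) tokens.append(' ');
          tokens.append(t.termText());
        }
        ts.close();
        Document doc = new Document();
        // Indexed and tokenized (by whitespace), not stored: the search_<LANG> field.
        doc.add(Field.UnStored("search_" + lang, tokens.toString()));
        return doc;
      }

      // Query side: translate the user's term into each target language via
      // dictionary lookup and OR the clauses against the matching fields.
      public Query expandQuery(String userTerm, String userLang, String[] langs) {
        BooleanQuery query = new BooleanQuery();
        for (int i = 0; i < langs.length; i++) {
          String translated = translate(userTerm, userLang, langs[i]);
          Term term = new Term("search_" + langs[i], translated);
          // add(query, required, prohibited): neither flag set gives OR semantics.
          query.add(new TermQuery(term), false, false);
        }
        return query;
      }

      private String translate(String term, String from, String to) {
        return term;  // stand-in for a real per-language dictionary lookup
      }
    }

Because each per-language clause is added with neither the required nor the prohibited flag set, BooleanQuery treats them as OR-ed alternatives, which is exactly the expanded query Dmitry describes.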

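For Doug's alternative, here is a rough sketch of what the query-parsing side could look like if the index recorded which fields were tokenized. The FieldAwareQueryParser class and the tokenizedFields set (imagined as read from the IndexReader) are hypothetical; Lucene of this era exposes no such per-field flag, which is precisely the change being proposed.

    import java.util.Set;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class FieldAwareQueryParser {

      private Set tokenizedFields;  // hypothetically obtained from the IndexReader
      private Analyzer analyzer;

      public Query parseTerm(String field, String text) throws Exception {
        if (!tokenizedFields.contains(field)) {
          // The index says this field was added untokenized (e.g. via
          // Field.Keyword), so query it with the literal, unanalyzed text.
          return new TermQuery(new Term(field, text));
        }
        // Tokenized field: hand off to the normal analyzing query parser.
        return QueryParser.parse(text, field, analyzer);
      }
    }

The upshot is that Field.Keyword-style fields would be matched on their literal text, while tokenized fields still go through the usual analyzer, without the caller having to know which is which.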