Hi All,

Thank you all for your comments earlier on using Lucene as a web service. I am looking at Solr, and it does have some potential for my application.
In the meantime I have this question: What are the recommended best practices for using Lucene to index multiple languages? In particular, would I be better off using a separate index for each language?

Here is our scenario: We have a database with several text fields that will be translated into multiple languages. The text will only be translated incrementally, and it could take years to get it completely translated. New untranslated data is always being added, and we may add new languages at any time.

When a user chooses to view our web application in a foreign language, we want the user to be able to search in either their language or in English (in order to guarantee that they can find all data). I probably won't know which language they actually entered for the text search. I want to search Lucene in both English and the selected language, and return any results that are found.

FYI, I will be using Lucene to return a list of IDs that are unique to our data, and then joining back to our data using SQL. I will use our database to show a mix of translated and untranslated data. That is, translated data/fields are shown if we have them; otherwise the default English is shown. So I don't need to get the text itself from Lucene, just a list of IDs that I can use in my SQL query.

I can pull out our data easily in either language, or a mix, in order to create Lucene indexes. If I can mix languages in a single index, I would like to add a Language column to query on, and query on both the English and the foreign-language text. If not, I can see it working to run two separate Lucene queries on two separate indexes and combine the resulting ID lists into a single list (making it unique, if needed).

If you have any comments, or feedback from experience doing anything like this, it would be much appreciated!

Douglas Smith
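
P.S. To make the two-index option concrete, here is a rough Python sketch of the merge step as I picture it. The function name merge_ids and the sample IDs are made up for illustration; the two input lists simply stand in for whatever IDs the English and the foreign-language Lucene queries would return, and nothing here calls Lucene itself.

```python
from itertools import chain
from typing import Iterable, List


def merge_ids(english_ids: Iterable[int], foreign_ids: Iterable[int]) -> List[int]:
    """Combine two ID lists into one, dropping duplicates.

    IDs keep the order in which they are first seen: English hits
    first, then any foreign-language hits not already present.
    """
    seen = set()
    merged = []
    for doc_id in chain(english_ids, foreign_ids):
        if doc_id not in seen:
            seen.add(doc_id)
            merged.append(doc_id)
    return merged


# Hypothetical results from the two separate index searches.
english_hits = [101, 205, 309]
foreign_hits = [205, 412]

unique_ids = merge_ids(english_hits, foreign_hits)
```

The resulting list is what I would then hand to the SQL query that joins back to our data.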
