Hi there,

I am a newbie to Lucene and I'm considering using it in an upcoming project. I've read through the documentation but I still have a number of questions:

1. SEGMENTING AN INDEX & QUERIES BY SITE SCOPE

In my use case, I have a number of logical websites backed by the same underlying content store. A Document may ultimately end up "belonging" to one or more logical sites, but at a distinct URL for each. The simplistic solution is to maintain a separate index for each logical site, but this will result in some unwanted duplication and the need to update multiple indices whenever "shared" content changes. Other than that, can anyone suggest approaches for segmenting a single index to accommodate multiple logical sites and allow queries within a particular site's scope? Are fields the solution? How should the distinct per-site URLs be managed?

2. LOCALIZED CONTENT

I understand that at its core, Lucene can support content from any locale and character set supported by Java. What is the best way to use Lucene with a content base that spans numerous locales? One index per locale, or should all Documents be placed in a single index and tagged with a "locale" field? Or is there another approach altogether?

3. DOCUMENT URLS

Is the URL at which the original document can be retrieved (i.e., for linking search results to the original doc) generally stored as a non-indexed, non-tokenized, stored Field in the Document?

4. QUERY FILTERING & SORTING BY FIELD VALUE

In my application I have a pretty typical need to distinguish between different document types (e.g., FAQs, Articles, Reviews, etc.) in order to allow the user to restrict their results to particular types of documents or to sort results by type. Are fields again the solution for this? Can Queries filter or sort results/hits on exact field values (i.e., non-tokenized field values)?

5. DEPLOYING LUCENE IN A CLUSTERED WEB-APP ENVIRONMENT

How is Lucene to be deployed in a clustered web-app environment?
Do all cluster nodes require access to a networked filesystem containing the index files, or is there another solution? How is concurrency managed when the index is being incrementally updated?

Any answers and suggestions are much appreciated.

Thanks.

--Daniel
