Hi there,
 
I am a newbie to Lucene and I'm considering using it in an upcoming project.
I've read through the documentation but I still have a number of questions:
 
1. SEGMENTING AN INDEX & QUERIES BY SITE SCOPE
In my use case, I have a number of logical websites backed by the same
underlying content store.  A Document may ultimately end up "belonging"
to one or more logical sites, but at a distinct URL for each.  The
simplistic solution is to maintain a separate index per logical site, but this
will result in some unwanted duplication and the need to update multiple
indices on "shared" content changes.  Other than that, can anyone suggest
approaches for how to segment a single index to accommodate multiple logical
sites and allow queries within a particular site's scope?  Are fields the
solution?  How should the distinct per-site URLs be managed?
 
2. LOCALIZED CONTENT
I understand that at its core, Lucene can support content from any locale
and character set supported by Java.  What is the best way of setting up
Lucene to handle a content base which includes numerous locales?  Should
there be one index per locale, or should all Documents be placed in a
single index and tagged with a "locale" field?  Or is there another
approach altogether?
 
3. DOCUMENT URLS
Is the URL at which the original document can be retrieved (i.e., for
linking search results to the original doc) generally stored as a
non-indexed, non-tokenized, stored Field in the Document?
 
4. QUERY FILTERING & SORTING BY FIELD VALUE
In my application I have a pretty typical need to distinguish between
different document types (e.g., FAQs, Articles, Reviews, etc.) in order to
allow the user to restrict their results to particular types of documents or
to sort results by type.  Are fields again the solution for this?  Can
Queries filter or sort results/hits on exact field values (i.e.,
non-tokenized field values)?
 
5. DEPLOYING LUCENE IN A CLUSTERED WEB-APP ENVIRONMENT
How should Lucene be deployed in a clustered web-app environment?  Do all
cluster nodes require access to a networked filesystem containing the index
files or is there another solution?  How is concurrency managed when the
index is being incrementally updated?
 
Any answers and suggestions are much appreciated.  Thanks.
 
--Daniel
