Hello. We're designing a CMS in Java, and I'm trying to implement a site search function using Lucene.
The basic concept is this:
- The site features numerous objects that we'd like to put into the index: pages, various text blocks on those pages, descriptions and keyword lists of those pages, static bits of HTML, goods sections with goods inside them, etc.
- There would be a search form that site visitors would use occasionally.
- Visitors are highly unlikely to use advanced queries. I assume 95% of queries would be either a few keywords or a phrase. We have to find the best matches for such queries.
- The one thing I want to introduce is "phrase in quotes" to search for an exact phrase.

Also, most of our sites are in Russian, so some support for Russian morphology, even if rudimentary, is a plus.

I've dug into the examples and have the following set of questions:
- Our objects are fairly structured, so I would like to introduce a lot of fields, something like five different ones for each object type. But, as far as I can see, each Query searches only one field. This is certainly bad, because users surely want to search *all* the fields at once. They aren't going to bother with query syntax. Maybe I can add a query over every field, joined by an 'or' operation, but wouldn't that be too slow? I don't want a search to take more than half a second on a reasonably sized index. Also, I don't want to hard-code the exact list of fields, since I might add more as I develop the system. Is this doable, and would it work? Or will I have to stuff all the text content from an object into one blob field and query that? Which way is more reasonable?
- Our objects have a hierarchy, e.g., blocks belong to a page. Is there a way to make Lucene handle the parent-child relation, somehow summing the hits in all children to find the best-matching parent? I assume not; in that case, is there a way for me to go through the list of matching documents and reduce it by 'adding' the blocks' scores to find the best-matching page?
- Is there a way to set weights for different fields?
Let's say content has a weight of 1, title has a weight of 5, and a picture caption has a weight of 0.5. If not, can I do that by hand?
- Is there something to support Russian morphology (basically, "the last n letters of a word might change, and we should match all forms") for either the indexer or the searcher? Maybe "inexact match", QueryParser's ~ operator, would be enough? I heard the Nutch project has something like that, but I wonder whether I would be able to reuse parts of Nutch, and I surely can't use Nutch as a whole.

If there are any other considerations, they're welcome.

Thanks in advance for any replies.
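
P.S. To make the parent-child and field-weight questions concrete, here is a minimal sketch of the by-hand fallback I have in mind. It is plain Java and uses no Lucene classes: `Hit`, `rankPages`, the field names, and the sample scores are all made up for illustration, and I'm assuming each hit can be mapped back to the id of its parent page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from Lucene.
class PageScoreRollup {

    // One search hit: the parent page of the matching block,
    // the field that matched, and the raw score of the match.
    static final class Hit {
        final String pageId;
        final String field;
        final double score;

        Hit(String pageId, String field, double score) {
            this.pageId = pageId;
            this.field = field;
            this.score = score;
        }
    }

    // Multiplies each hit's score by its field weight (1.0 if the field
    // has no configured weight), sums the results per page, and returns
    // the page ids ordered from highest total to lowest.
    static List<String> rankPages(List<Hit> hits, Map<String, Double> fieldWeights) {
        Map<String, Double> totals = new HashMap<>();
        for (Hit h : hits) {
            double weight = fieldWeights.getOrDefault(h.field, 1.0);
            totals.merge(h.pageId, h.score * weight, Double::sum);
        }
        List<String> pages = new ArrayList<>(totals.keySet());
        pages.sort((a, b) -> Double.compare(totals.get(b), totals.get(a)));
        return pages;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("title", 5.0);
        weights.put("content", 1.0);
        weights.put("caption", 0.5);

        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("page-a", "content", 0.9));
        hits.add(new Hit("page-b", "title", 0.3));
        hits.add(new Hit("page-a", "caption", 0.4));

        // page-a totals 0.9 + 0.2 = 1.1, page-b totals 1.5
        System.out.println(rankPages(hits, weights)); // prints [page-b, page-a]
    }
}
```

Because unknown fields default to a weight of 1.0, new fields could be added later without touching this code, which is the property I'd like to keep. What I can't judge is whether summing raw scores like this is meaningful, or whether Lucene already offers something better.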