I'll add an entry on the wiki describing ways to handle hierarchical information. If the info moves to the FAQ, we can remove it then.
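As a starting point for that entry, here is a rough sketch of the two encodings Tatu describes in the quoted message below. It deliberately does not use Lucene: it only shows how the indexed values and the query values could be built, and simulates the matching with plain string and list operations. It is in Python purely to keep it short. All function names and the sample documents are hypothetical, and it assumes that path components never contain the "-" separator. In Lucene itself the first variant would presumably map onto a PrefixQuery over an untokenized field and the second onto a PhraseQuery over a tokenized one.

```python
SEP = "-"
START, END = "STARTMARKER", "ENDMARKER"


def path_key(path):
    """Approach 1: join the components into one long, untokenized 'word'."""
    return SEP.join(path)


def subtree_prefix(path):
    """Prefix shared by every strict descendant of `path`."""
    return path_key(path) + SEP


def path_tokens(path):
    """Approach 2: one token per component, wrapped in markers."""
    return [START, *path, END]


def phrase_matches(tokens, phrase):
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def search_by_prefix(docs, path):
    """Simulated prefix query, e.g. 'doohickeys-gadgets-*'."""
    prefix = subtree_prefix(path)
    return sorted(d for d, p in docs.items() if path_key(p).startswith(prefix))


def search_by_phrase(docs, path):
    """Simulated phrase query, e.g. "STARTMARKER doohickeys gadgets"."""
    phrase = [START, *path]
    return sorted(d for d, p in docs.items()
                  if phrase_matches(path_tokens(p), phrase))


# Hypothetical classification entries, one taxonomy path per document.
docs = {
    "doc1": ["doohickeys", "gadgets", "foobar"],
    "doc2": ["doohickeys", "gadgets"],
    "doc3": ["doohickeys", "widgets", "foobar"],
}

print(search_by_prefix(docs, ["doohickeys", "gadgets"]))
print(search_by_phrase(docs, ["doohickeys", "gadgets"]))
```

One difference worth noting in the entry: with the trailing separator, the prefix form only finds entries strictly below the queried node (doc1 here), while the phrase form also finds entries classified at the node itself (doc1 and doc2). Either way the query stays a single clause no matter how many subconcepts the node has.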
cheers,
sv

On Fri, 2 Apr 2004, Tatu Saloranta wrote:

> On Friday 02 April 2004 02:30, [EMAIL PROTECTED] wrote:
> > Dear all,
> >
> > I want to expand a search in order to retrieve
> > matching XML documents with the help of a domain
> > taxonomy. That means if someone specifies a term
> > high up in the taxonomy, it will have lots of
> > subconcepts.
> >
> > Everything is working fine so far except that Lucene
> > creates an 'input string too long' error when I ask for
> > e.g. subject:(term001 ... term800).
>
> If I'm not mistaken, perhaps you shouldn't expand terms at all, but do a path
> query with components instead. This effectively means that instead of your
> app flattening the structure and ending up with hundreds of leaves to match,
> you use Lucene in a sort of "hierarchy-aware" way.
> There have been a few questions (and answers) regarding implementation of such
> a feature.
> Almost seems like there should be a FAQ entry (or Eric could add an example to
> his book? :-) ).
>
> There are at least 2 ways to do this; one is to combine the components into
> one long 'word' (a unit the analyzer does not split into separate tokens,
> i.e. words), something like:
>
>   doohickeys-gadgets-foobar
>
> and search using a prefix query ("doohickeys-gadgets-*"), or:
>
>   STARTMARKER doohickeys gadgets foobar ENDMARKER
>
> and use a phrase query ("STARTMARKER doohickeys gadgets"). (STARTMARKER and
> ENDMARKER are needed only if components are not guaranteed to be unique, and
> one needs to make sure the query is restricted to an individual
> classification entry.)
>
> Both approaches can be varied by using some internal ids instead of actual
> Strings (UUIDs, sequence numbers).
>
> Does the above make sense?
>
> -+ Tatu +-
>
> > I am aware that it is not a usual task for a search
> > engine to take several hundred terms as input.
> > Is there a distinct limit (I haven't got the source
> > yet) and - more important - is there a configuration to
> > work around it?
> >
> > If not I would break down the input terms in blocks and
> > send them to Lucene sequentially...
> > However I would prefer if I could adjust the limit.
> >
> > Is search expansion something you as developers are
> > looking into?
> >
> > Thanks
> >
> > Holger
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]