Hi Matt, thanks for your reply! Your proposed solution would indeed work for the simple case I described. However, I failed to mention that I must be able to combine the described queries into a single complex query, and therefore can't make any assumptions based on the size attribute. I'm sorry if I wasted your time.
On the bright side, though, I found that creating a specialized query wasn't that difficult at all. A quick scan through the WildcardQuery class and the related WildcardTermEnum gave me some valuable hints. If there is any interest, I would happily contribute the code back to the community.

Regards
/Fredrik

-----Original Message-----
From: Matt Quail [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 12:19
To: Lucene Users List
Subject: Re: hierarchical search

Fredrik,

I would tackle your problem like this:

Say that the field you want to index is "path". I would turn this into *three* indexed fields:

1) multiple path prefixes ("pre-paths")
2) multiple path suffixes ("post-paths")
3) the number of "components" in the path ("path-size")

For example, for a "path" of "/foo/bar/dog/cat/fish" I would index it like this:

    doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/fish/"));
    doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/"));
    doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/"));
    doc.add(Field.Keyword("pre-paths", "/foo/bar/"));
    doc.add(Field.Keyword("pre-paths", "/foo/"));
    doc.add(Field.Keyword("pre-paths", "/"));
    doc.add(Field.Keyword("post-paths", "/foo/bar/dog/cat/fish/"));
    doc.add(Field.Keyword("post-paths", "/bar/dog/cat/fish/"));
    doc.add(Field.Keyword("post-paths", "/dog/cat/fish/"));
    doc.add(Field.Keyword("post-paths", "/cat/fish/"));
    doc.add(Field.Keyword("post-paths", "/fish/"));
    doc.add(Field.Keyword("post-paths", "/"));
    doc.add(Field.Keyword("path-size", "5"));

And to do your "type 2" search for (prefix="/p1/p2/p3/" and suffix="/s1/s2/s3/") I would use a query like this:

    Query q = QueryParser.parse("pre-paths:\"/p1/p2/p3/\" AND post-paths:\"/s1/s2/s3/\" AND path-size:7",
                                "pre-paths", new WhitespaceAnalyzer());

The trick is to lock down the prefix and suffix, then define the amount of "slack" between the prefix and the suffix using the path-size.
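Matt's scheme can be checked without building an index. The sketch below is a plain-Java simulation, not Lucene code: it computes the values Matt would store in the pre-paths, post-paths and path-size fields and then applies the same prefix, suffix and size test that his query expresses. The class PathFields and its methods are hypothetical, and paths are assumed to be absolute, with empty segments ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper simulating the pre-paths/post-paths/path-size fields (no Lucene). */
class PathFields {

    // Splits "/foo/bar/" or "/foo/bar" into ["foo", "bar"], ignoring empty segments.
    static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                parts.add(s);
            }
        }
        return parts;
    }

    // Joins parts[from, to) back into "/a/b/" form; an empty range yields "/".
    static String join(List<String> parts, int from, int to) {
        StringBuilder sb = new StringBuilder("/");
        for (int i = from; i < to; i++) {
            sb.append(parts.get(i)).append('/');
        }
        return sb.toString();
    }

    // Values for the "pre-paths" field, longest prefix first, down to "/".
    static List<String> prePaths(String path) {
        List<String> parts = components(path);
        List<String> out = new ArrayList<>();
        for (int end = parts.size(); end >= 0; end--) {
            out.add(join(parts, 0, end));
        }
        return out;
    }

    // Values for the "post-paths" field, longest suffix first, down to "/".
    static List<String> postPaths(String path) {
        List<String> parts = components(path);
        List<String> out = new ArrayList<>();
        for (int start = 0; start <= parts.size(); start++) {
            out.add(join(parts, start, parts.size()));
        }
        return out;
    }

    // Value for the "path-size" field.
    static int pathSize(String path) {
        return components(path).size();
    }

    // Mirrors: pre-paths:prefix AND post-paths:suffix AND (path-size in [minSize, maxSize]).
    static boolean matches(String path, String prefix, String suffix, int minSize, int maxSize) {
        int size = pathSize(path);
        return prePaths(path).contains(prefix)
                && postPaths(path).contains(suffix)
                && size >= minSize && size <= maxSize;
    }

    public static void main(String[] args) {
        System.out.println(prePaths("/foo/bar/dog/cat/fish"));
        String[] docs = {
            "/p1/p2/p3/x/s1/s2/s3/", "/p1/p2/p3/y/s1/s2/s3/", "/p1/p2/p3/x/y/s1/s2/s3/"
        };
        for (String doc : docs) {
            // "type 2": 3 prefix + 1 middle + 3 suffix components, so path-size 7
            System.out.println(doc + " -> " + matches(doc, "/p1/p2/p3/", "/s1/s2/s3/", 7, 7));
        }
    }
}
```

Run as-is, main prints the six pre-paths values for /foo/bar/dog/cat/fish and reports true, true and false for Fredrik's documents a), b) and c). Calling matches(doc, prefix, suffix, 6, 7) corresponds to the (path-size:6 OR path-size:7) variant.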
If you wanted the "slack" between either end to be zero or one segments, then change the size clause to something like (path-size:6 OR path-size:7).

I think that should work.

=Matt

Fredrik Lindner wrote:
> Hi all!
>
> I'm currently developing an application in which text searching is a
> main component. Among other things, a document will contain a field
> denoting hierarchical information. The information is stored as a string
> using the common path syntax, /x/y/z/etc/...
>
> I would like to be able to search documents based on the path field
> using two different selection criteria:
>
> 1. Given a prefix path and a suffix path, select all documents for which
> the path starts with the supplied prefix, ends with the suffix and has
> "some path" in between.
>
> 2. Like (1), but with the requirement that "some path" spans exactly one
> level, i.e. it defines a strict grandparent/grandchild relationship
> between the last path entry of the prefix and the first of the suffix.
>
> For example, with prefix /p1/p2/p3/ and suffix /s1/s2/s3/ and three
> documents with the path field values
>
> a) /p1/p2/p3/x/s1/s2/s3/
> b) /p1/p2/p3/y/s1/s2/s3/
> c) /p1/p2/p3/x/y/s1/s2/s3/
>
> case one should select them all, whereas case two should select only a)
> and b).
>
> My problem is that I am uncertain how to implement the second case. I
> guess I have to extend the Lucene internals somehow, but I am too
> inexperienced with Lucene to do so directly. Any pointers, hints or
> comments are most welcome.
>
> Regards
> /Fredrik
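Fredrik's reply does not show his specialized query, so the following is only a guess at the per-term test such a query could apply to the path field for his two criteria, in the way WildcardTermEnum filters terms against a wildcard pattern. It is plain Java with no Lucene calls; the class HierarchyMatcher and its methods are hypothetical, and the path is assumed to be stored as a single untokenized string such as /p1/p2/p3/x/s1/s2/s3/. Unlike Matt's scheme, it needs no separate path-size field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-term matcher for the two selection criteria (no Lucene). */
class HierarchyMatcher {

    // Splits "/a/b/c/" into ["a", "b", "c"], ignoring empty segments.
    static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                parts.add(s);
            }
        }
        return parts;
    }

    // True if path = prefix + middle + suffix and the middle has between
    // minMiddle and maxMiddle components; minMiddle is assumed to be >= 0.
    static boolean matches(String path, String prefix, String suffix,
                           int minMiddle, int maxMiddle) {
        List<String> p = components(path);
        List<String> pre = components(prefix);
        List<String> suf = components(suffix);
        int middle = p.size() - pre.size() - suf.size();
        if (middle < minMiddle || middle > maxMiddle) {
            return false;
        }
        // Component-wise comparison, so prefix /p1/p2/p3/ does not match /p1/p2/p33/...
        return p.subList(0, pre.size()).equals(pre)
                && p.subList(p.size() - suf.size(), p.size()).equals(suf);
    }

    // Criterion 1: at least one level in between (an assumption; the question
    // does not say whether an empty middle should also match).
    static boolean someLevels(String path, String prefix, String suffix) {
        return matches(path, prefix, suffix, 1, Integer.MAX_VALUE);
    }

    // Criterion 2: exactly one level in between (strict grandparent/grandchild).
    static boolean exactlyOneLevel(String path, String prefix, String suffix) {
        return matches(path, prefix, suffix, 1, 1);
    }

    public static void main(String[] args) {
        String prefix = "/p1/p2/p3/";
        String suffix = "/s1/s2/s3/";
        String[] docs = {
            "/p1/p2/p3/x/s1/s2/s3/", "/p1/p2/p3/y/s1/s2/s3/", "/p1/p2/p3/x/y/s1/s2/s3/"
        };
        for (String doc : docs) {
            System.out.println(doc + " case1=" + someLevels(doc, prefix, suffix)
                    + " case2=" + exactlyOneLevel(doc, prefix, suffix));
        }
    }
}
```

For the three documents in the question, main reports case1=true for all of them and case2=true only for a) and b). How Fredrik actually plugged a test like this into Lucene is not shown in the thread.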
