Fredrik,
I would tackle your problem like this:
Say that the field you want to index is "path". I would turn this into
*three* indexed fields:
1) multiple path prefixes ("pre-paths")
2) multiple path suffixes ("post-paths")
3) the number of "components" in the path ("path-size").
For example, for a "path" of "/foo/bar/dog/cat/fish" I would index it like this:
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/"));
doc.add(Field.Keyword("pre-paths", "/foo/"));
doc.add(Field.Keyword("pre-paths", "/"));
doc.add(Field.Keyword("post-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/fish/"));
doc.add(Field.Keyword("post-paths", "/"));
doc.add(Field.Keyword("path-size", "5"));
And to do your "type 2" search for (prefix="/p1/p2/p3/" and suffix="/s1/s2/s3/") I would use a query like this:
Query q = QueryParser.parse("pre-paths:\"/p1/p2/p3/\" AND " +
    "post-paths:\"/s1/s2/s3/\" AND (path-size:7)");
The trick is to lock down the prefix and suffix, then define the amount of "slack" between the prefix and the suffix using the path-size. Here the prefix and the suffix have three components each, so path-size:7 leaves room for exactly one component between them. If you wanted the "slack" between the two ends to be zero or one segments, then change the size clause to something like (path-size:6 OR path-size:7).
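If it helps, here is a rough sketch of how the three field values could be generated from a path, plus a small check that mirrors the query logic without touching Lucene at all. It is plain Java; the class name PathFields and its method names are made up for illustration, and it assumes prefixes and suffixes are always written with a leading and trailing "/".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: expands one path into the
// values that would be stored in "pre-paths", "post-paths" and "path-size".
final class PathFields {

    // Splits "/foo/bar" or "/foo/bar/" into [foo, bar], skipping empty pieces.
    static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String piece : path.split("/")) {
            if (!piece.isEmpty()) {
                parts.add(piece);
            }
        }
        return parts;
    }

    // Rebuilds a normalized path with a leading and trailing slash.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder("/");
        for (String part : parts) {
            sb.append(part).append('/');
        }
        return sb.toString();
    }

    // All prefixes, longest first, ending with "/".
    static List<String> prePaths(String path) {
        List<String> parts = components(path);
        List<String> out = new ArrayList<>();
        for (int n = parts.size(); n >= 0; n--) {
            out.add(join(parts.subList(0, n)));
        }
        return out;
    }

    // All suffixes, longest first, ending with "/".
    static List<String> postPaths(String path) {
        List<String> parts = components(path);
        List<String> out = new ArrayList<>();
        for (int n = 0; n <= parts.size(); n++) {
            out.add(join(parts.subList(n, parts.size())));
        }
        return out;
    }

    // Number of components, i.e. the "path-size" value.
    static int pathSize(String path) {
        return components(path).size();
    }

    // Mirrors the query: prefix AND suffix AND
    // path-size == size(prefix) + size(suffix) + slack.
    static boolean matches(String path, String prefix, String suffix, int slack) {
        return prePaths(path).contains(prefix)
                && postPaths(path).contains(suffix)
                && pathSize(path) == pathSize(prefix) + pathSize(suffix) + slack;
    }
}
```

With prefix "/p1/p2/p3/", suffix "/s1/s2/s3/" and a slack of 1, matches(...) accepts your paths a) and b) and rejects c), because c) has eight components rather than seven.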
I think that should work.
=Matt
Fredrik Lindner wrote:
Hi all!
I'm currently developing an application in which text searching is a main component. Among other things, a document will contain a field denoting hierarchical information. The information is stored as a string using the common path syntax, /x/y/z/etc/...
I would like to be able to search documents based on the path field using two different selection criteria:
1. Given a prefix path and a suffix path, select all documents for which the path starts with the supplied prefix, ends with the suffix and has "some path" in between.
2. Like (1), but with the requirement that "some path" spans one and only one level, i.e. it defines a strict grandparent/grandchild relationship between the last path entry of the prefix and the first of the suffix.
For example, with prefix /p1/p2/p3/ and suffix /s1/s2/s3/ and three documents with the path field values
a) /p1/p2/p3/x/s1/s2/s3/
b) /p1/p2/p3/y/s1/s2/s3/
c) /p1/p2/p3/x/y/s1/s2/s3/
case one should select them all whereas case two should select only a) and b).
My problem is that I am uncertain how to implement the second case. I guess I have to extend the Lucene internals somehow, but I am too inexperienced with Lucene to do so directly. Any pointers, hints or comments are most welcome.
Regards /Fredrik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
