Fredrik,

I would tackle your problem like this:

Say that that field you want to index is "path". I would turn this into
*three* indexed fields:
1) multiple path prefixes ("pre-paths")
2) multiple path suffixes ("post-paths")
3) the number of "components" in the path ("path-size").

For example, for a "path" of "/foo/bar/dog/cat/fish" I would index it
like this:

doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/cat/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/dog/"));
doc.add(Field.Keyword("pre-paths", "/foo/bar/"));
doc.add(Field.Keyword("pre-paths", "/foo/"));
doc.add(Field.Keyword("pre-paths", "/"));
doc.add(Field.Keyword("post-paths", "/foo/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/bar/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/dog/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/cat/fish/"));
doc.add(Field.Keyword("post-paths", "/fish/"));
doc.add(Field.Keyword("post-paths", "/"));
doc.add(Field.Keyword("path-size", "5"));

And to do your "type 2" search for (prefix="/p1/p2/p3/" and
suffix="/s1/s2/s3/") I would use a query like this:

Query q = QueryParser.parse("pre-paths:'/p1/p2/p3/' AND
post-paths:'/s1/s2/s3/ AND (path-size:7)'");

The trick is to lock down the prefix and suffix, then define the amount
of "slack" between the prefix and the suffix using the path-size. If you
wanted the "slack" between either end to be zero or one segments, then
change the size clause to something like (path-size:6 OR path-size:7)


I think that should work.

=Matt


Fredrik Lindner wrote:

Hi all!

I'm currently developing an application in which text searching is a
main component. Among other things, a document will contain a field
denoting hierarchical information. The information is stored as a string
using the common path syntax, /x/y/z/etc/...

I would like to be able to search documents based on the path field
using two different selection criteria's,

1. given a prefix path and a suffix path select all documents for which
the path start with the supplied prefix, ends with the suffix and has
"some path" in between.

2. like (1) but with the requirement that "some path" spans one and one
level only. i.e. it defines a strict grandparent/grandchild relationship
between the last path entry of the prefix and the first of the suffix.

For example, with prefix /p1/p2/p3/ and suffix /s1/s2/s3/ and three
documents with the path filed values

a) /p1/p2/p3/x/s1/s2/s3/
b) /p1/p2/p3/y/s1/s2/s3/
c) /p1/p2/p3/x/y/s1/s2/s3/

case one should select them all whereas case two should select only a)
and b).

My problem is that I am uncertain on how to implement the second case. I
guess I have to extend the Lucene internals somehow but I am quite too
inexperienced regarding Lucene to do so directly. Any pointers, hints or
comments are most welcome.

Regards
/Fredrik


--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]







---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to