Terms Matching Query

Paul Davis Tue, 14 Oct 2008 14:27:16 -0700

I'm working on indexing JSON documents via Lucene and I've run into a
bit of a snag. Currently, I'm indexing JSON documents by adding fields
that are path/value pairs. For example, given a JSON document like:


{
  "name":
    {
      "first": "Paul",
      "last": "Davis"
    }
  "jobs": ["hotdog vendor", "signboard guy"]
}

I build the doc with something like:

doc.add(new Field("/name/first", "Paul", ...))
doc.add(new Field("/name/last", "Davis", ...))
doc.add(new Field("/jobs/$0", "hotdog vendor", ...))
doc.add(new Field("/jobs/$1", "signboard guy", ...))

doc.add(new Field("__JSON_PATH__", "/name/first", ...))
doc.add(new Field("__JSON_PATH__", "/name/last", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$0", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$1", ...))

Now, at query time I'm attempting to support expanding searches as such:

alias:Paul
(OOB) alias =  "name.*" => __JSON_PATH__:name.*

Which would get expanded to the obvious "name.first:Paul
name.last:Paul". I was planning on just doing a search for
__JSON_PATH__:name.* collecting the terms and doing an OR like what
the old Prefix/Range queries. (I think I read that RangeQueries are
different now, but right now I'm going for Make It Work)

So anyway, long story short, is there a way to get just the list of
terms that match an arbitrary lucene query?

I tried poking at Query.extractTerms(Set terms) but it appears to fall
down for RangeQueries. I contemplated telling the query parser to use
old range queries, but wasn't sure if a different type of query would
cause the same problems or not. I've started just creating unique
documents for each possible path, but that seems like a Bad Idea (TM).

Any thoughts on ways to completely rework the whole system would also
be welcome. I'm relative new to Lucene so I may be completely screwing
the pooch on something I'm not even aware of.

Thanks,
Paul Davis

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Terms Matching Query

Reply via email to