I'm working on indexing JSON documents via Lucene and I've run into a
bit of a snag. Currently, I'm indexing JSON documents by adding fields
that are path/value pairs. For example, given a JSON document like:
{
  "name":
  {
    "first": "Paul",
    "last": "Davis"
  },
  "jobs": ["hotdog vendor", "signboard guy"]
}
I build the doc with something like:
doc.add(new Field("/name/first", "Paul", ...))
doc.add(new Field("/name/last", "Davis", ...))
doc.add(new Field("/jobs/$0", "hotdog vendor", ...))
doc.add(new Field("/jobs/$1", "signboard guy", ...))
doc.add(new Field("__JSON_PATH__", "/name/first", ...))
doc.add(new Field("__JSON_PATH__", "/name/last", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$0", ...))
doc.add(new Field("__JSON_PATH__", "/jobs/$1", ...))
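To make the path scheme concrete, here is a minimal sketch of the flattening step in plain Java. It assumes the JSON has already been parsed into nested Map/List/scalar objects; the class name JsonPathFlattener and its methods are hypothetical, and no Lucene calls are involved. Each resulting pair would become one value field plus one __JSON_PATH__ field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: turns parsed JSON into path/value pairs.
public class JsonPathFlattener {
    /** Flattens parsed JSON (Maps, Lists, scalars) into path -> value pairs. */
    public static Map<String, String> flatten(Object json) {
        Map<String, String> out = new LinkedHashMap<>();
        walk("", json, out);
        return out;
    }

    private static void walk(String path, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            // Object: append "/key" for each member.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(path + "/" + e.getKey(), e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: append "/$index" for each element.
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(path + "/$" + i, items.get(i), out);
            }
        } else {
            // Scalar leaf: record its string form under the accumulated path.
            out.put(path, String.valueOf(node));
        }
    }
}
```

For the document above this yields the four pairs /name/first, /name/last, /jobs/$0 and /jobs/$1, in that order if the parser preserves member order.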
Now, at query time I'm attempting to support expanding searches as such:
alias:Paul
(OOB) alias = "name.*" => __JSON_PATH__:name.*
Which would get expanded to the obvious "name.first:Paul
name.last:Paul". I was planning on just doing a search for
__JSON_PATH__:name.*, collecting the matching terms, and ORing them
together like the old Prefix/Range queries did. (I think I read that
RangeQueries work differently now, but right now I'm going for Make It
Work.)
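The expansion I have in mind can be sketched without Lucene. In the sketch below a sorted set stands in for the sorted list of terms in the __JSON_PATH__ field; in a real index that set would come from enumerating the field's terms instead. PathAliasExpander and its methods are hypothetical names, the paths use the "/" form from the indexing code, and the returned string is only for illustration (real field names containing "/" would likely need escaping before going through a query parser).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical helper, not part of Lucene: expands a path prefix into an OR of clauses.
public class PathAliasExpander {
    /** Returns every path in sortedPaths that starts with prefix, in sorted order. */
    public static List<String> matchPrefix(NavigableSet<String> sortedPaths, String prefix) {
        List<String> matches = new ArrayList<>();
        // Seek to the first path >= prefix, then scan until the prefix stops matching.
        for (String path : sortedPaths.tailSet(prefix, true)) {
            if (!path.startsWith(prefix)) {
                break;
            }
            matches.add(path);
        }
        return matches;
    }

    /** Builds an OR of path:value clauses, one per matched path. */
    public static String expand(NavigableSet<String> sortedPaths, String prefix, String value) {
        List<String> clauses = new ArrayList<>();
        for (String path : matchPrefix(sortedPaths, prefix)) {
            clauses.add(path + ":" + value);
        }
        return String.join(" OR ", clauses);
    }
}
```

Because the set is sorted, all paths sharing a prefix sit next to each other, so the scan can stop at the first non-matching path rather than visiting every term.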
So anyway, long story short, is there a way to get just the list of
terms that match an arbitrary Lucene query?
I tried poking at Query.extractTerms(Set terms) but it appears to fall
down for RangeQueries. I contemplated telling the query parser to use
old range queries, but wasn't sure if a different type of query would
cause the same problems or not. I've started just creating unique
documents for each possible path, but that seems like a Bad Idea (TM).
Any thoughts on ways to completely rework the whole system would also
be welcome. I'm relatively new to Lucene so I may be completely screwing
the pooch on something I'm not even aware of.
Thanks,
Paul Davis