I’ve been trying to support semi-structured queries and query parsing in
Lucene. In other words, a query could have XML snippets mixed in with plain
terms, e.g.:
christmas tree <store loc="abc" close_hour="2200">
where you’re looking for a document with the terms “christmas” and “tree”, but
also with some structured data about where (practically) you could buy the tree.
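
To make the query shape concrete, here is a minimal sketch of splitting such a query into plain terms and pseudo-XML elements before anything is handed to Lucene. It is plain Java with no Lucene dependency; the class MixedQuerySplitter and its nested types are hypothetical names, and it assumes attribute values are written with straight double quotes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processing step; not part of Lucene. */
final class MixedQuerySplitter {

    /** One pseudo-XML element such as <store loc="abc">. */
    static final class Element {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        Element(String name) { this.name = name; }
    }

    /** Plain terms and structured elements, each in input order. */
    static final class Parsed {
        final List<String> terms = new ArrayList<>();
        final List<Element> elements = new ArrayList<>();
    }

    // Either <name attr="value" ...> or a run of characters that are
    // neither whitespace nor '<'.
    private static final Pattern TOKEN =
        Pattern.compile("<(\\w+)((?:\\s+\\w+=\"[^\"]*\")*)\\s*/?>|([^\\s<]+)");
    private static final Pattern ATTRIBUTE =
        Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Parsed split(String query) {
        Parsed parsed = new Parsed();
        Matcher token = TOKEN.matcher(query);
        while (token.find()) {
            if (token.group(1) != null) {
                // Structured part: collect its attribute/value pairs.
                Element element = new Element(token.group(1));
                Matcher attr = ATTRIBUTE.matcher(token.group(2));
                while (attr.find()) {
                    element.attributes.put(attr.group(1), attr.group(2));
                }
                parsed.elements.add(element);
            } else {
                // Plain term.
                parsed.terms.add(token.group(3));
            }
        }
        return parsed;
    }
}
```

For the example query above, split() would yield the terms christmas and tree plus one store element carrying loc and close_hour. How each element is then turned into a Lucene Query is left open here.
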
Additionally, I’d like to be able to write functions relating multiple items,
sort of like predicate logic or database-like queries:
christmas tree NEARBY( <store close_hour="2200">, <restaurant close_hour="2400"> )
which would only match places to buy a christmas tree where a matching store
and a matching restaurant are in close proximity to each other. Finally, we
would eventually be
interested in doing something similar to
org.apache.lucene.queries.CustomScoreQuery, where you can put in several
different criteria and weight them separately per document.
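
As a rough illustration of what I mean by weighting criteria separately per document, here is a minimal sketch in plain Java. It is not the CustomScoreQuery API; WeightedCriteria and the criterion names are hypothetical, and it assumes the per-criterion scores for a document have already been computed somewhere else.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; plain Java, not the CustomScoreQuery API. */
final class WeightedCriteria {
    private final Map<String, Double> weights = new LinkedHashMap<>();

    /** Registers a weight for a named criterion and returns this for chaining. */
    WeightedCriteria weight(String criterion, double weight) {
        weights.put(criterion, weight);
        return this;
    }

    /**
     * Weighted sum of the per-criterion scores for one document.
     * A criterion with no score for this document contributes 0.
     */
    double score(Map<String, Double> scoresForDocument) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            double s = scoresForDocument.getOrDefault(entry.getKey(), 0.0);
            total += entry.getValue() * s;
        }
        return total;
    }
}
```

For example, with a weight of 1.0 on the text match and 2.0 on the store match, a document scoring 0.5 on text and 0.25 on store would get 1.0 * 0.5 + 2.0 * 0.25 = 1.0.
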
I’ve been poking around in a lot of places and would appreciate some pointers
on what I should extend, or to an existing walkthrough or example. Here’s what
I’ve been considering:
* org/apache/lucene/queryparser/flexible/standard/StandardQueryParser.java
— modifying this to add another group-like QueryNode, modifying the processor
pipeline to include this, modifying the definition of a TERM so it can deal
with attribute="value" pairs in pseudo-XML. I read through the QueryParser
documentation but quickly got lost in the implementation.
* org/apache/lucene/queryparser/xml/CorePlusExtensionsParser.java — this
seems to do a lot of what I want, but I can’t tell. I hadn’t
originally thought of the query coming in as an XML stream. I think I would
still need to define some new Query types... Perhaps a lot? One for each type
of thing (“store”, in the above) I’d search for?
Thanks!
stephen