Erik Hatcher wrote:
On Jun 9, 2004, at 12:21 PM, David Spencer wrote:
show us that most folks query with 1 - 3 words and do not use any of the advanced features.
But with automagic query expansion these things might be done behind the scenes. Nutch, for one, expands simple queries to check against multiple fields, with different boosts, and even gives a bonus for terms that are near each other.
Ah yes! Don't worry, I hadn't forgotten about Nutch. I'm tinkering with its query parsing and analysis as we speak in fact. Very clever indeed.
The elegance of the query syntax is quite important, and QueryParser has gotten a bit hairy. I would enjoy discussions on creating new query parsers (one size doesn't fit all, I don't think) and what syntax
I suggested in some email a while ago making the QueryParser extensible at runtime or startup time, so you can add other types of queries that it doesn't support - so you have a way of registering these other query types (SpanQuery, SubstringQuery etc) and then some syntax like "span:foo" to invoke the query expander registered w/ "span" on "foo"...
I would be curious to see how an implementation of this played out. For example, could I add my own syntax such that
"some phrase" <-3-> "another phrase"
could be parsed into a SpanNearQuery of two SpanNearQuery instances?
I like the idea of a flexible run-time grammar, but it sounds too good to be true in a general purpose kinda way.
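For what it's worth, here is a rough sketch of how that one proximity syntax might be recognized. It is only an illustration: the class name ProximitySyntax, the regex, and the near([...], slop=N) output notation are all made up for this example, and the result is a plain string standing in for a nested SpanNearQuery rather than a real Lucene object. Each quoted phrase is treated as an exact (slop 0) group of terms, and the number between <- and -> becomes the slop of the outer query.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: recognizes "phrase one" <-N-> "phrase two". */
public class ProximitySyntax {

    // Two quoted phrases separated by <-N->, where N is the allowed distance.
    private static final Pattern NEAR =
        Pattern.compile("\"([^\"]+)\"\\s*<-(\\d+)->\\s*\"([^\"]+)\"");

    /** Renders a phrase as an exact (slop 0) group of its terms. */
    static String phrase(String text) {
        return "near([" + String.join(", ", text.trim().split("\\s+")) + "], slop=0)";
    }

    /** Returns the nested stand-in query, or null if the input does not use this syntax. */
    public static String parse(String input) {
        Matcher m = NEAR.matcher(input.trim());
        if (!m.matches()) {
            return null;
        }
        int slop = Integer.parseInt(m.group(2));
        return "near([" + phrase(m.group(1)) + ", " + phrase(m.group(3)) + "], slop=" + slop + ")";
    }
}
```

With a real index, the two inner groups would presumably become SpanNearQuery objects built from SpanTermQuery clauses, and the outer one a SpanNearQuery over those two; whether the outer query should require the phrases in order is left open here.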
My idea isn't perfect for humans, but it at least lets you use queries that aren't hard coded.
You have something like
[1] how you register, could be in existing QueryParser
void register( String name, SubQueryParser qp);
[2] what you register
interface SubQueryParser
{
Query parse( String s); // parses string user enters, forms a Query...
}
[3] example of registration
register( "substring", new SubstringQP()); // instead of prefix matches allows term anywhere
register( "span", new SurroundQP());
register( "syn", new SynonymExpanderQP()); // expands a word to include synonyms
[4] syntax
normal query parser syntax but add something else like "NAME::TEXT" (note 2 colons) so
this: "black syn::bird"
expands to calls in the new extensible query parser, something like
BooleanQuery bq = ...
bq.add( new TermQuery( new Term( "contents", "black")));
bq.add( new SynonymExpanderQP().parse( "bird")); // the parser registered under "syn"
return bq;
behind the scenes SynonymExpanderQP expanded "bird" to the query equivalent of, um, "bird avian^.5 wingedanimal^.5" or whatnot.
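To make steps [1] through [4] concrete, here is a minimal self-contained sketch of the dispatch logic. It is an assumption-laden illustration, not Lucene code: the class name ExtensibleQueryParser is invented, queries are modelled as plain strings instead of Lucene Query objects so the example runs without the library, and tokens are split on whitespace only (no phrases, boosts or boolean operators).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; clauses are plain strings, not Lucene Query objects. */
public class ExtensibleQueryParser {

    /** [2] What you register: turns the TEXT after "NAME::" into a stand-in query. */
    public interface SubQueryParser {
        String parse(String text);
    }

    private final Map<String, SubQueryParser> registry = new HashMap<>();
    private final String defaultField;

    public ExtensibleQueryParser(String defaultField) {
        this.defaultField = defaultField;
    }

    /** [1] How you register a sub-parser under a name. */
    public void register(String name, SubQueryParser parser) {
        registry.put(name, parser);
    }

    /**
     * [4] Splits the query on whitespace. A "NAME::TEXT" token whose NAME is
     * registered is handed to that sub-parser; every other token (including
     * one with an unregistered NAME) becomes a term on the default field.
     */
    public List<String> parse(String query) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            int sep = token.indexOf("::");
            SubQueryParser sub = sep > 0 ? registry.get(token.substring(0, sep)) : null;
            if (sub != null) {
                clauses.add(sub.parse(token.substring(sep + 2)));
            } else {
                clauses.add(defaultField + ":" + token);
            }
        }
        return clauses;
    }
}
```

Registering a toy synonym expander with register( "syn", text -> "(" + text + " avian^.5 wingedanimal^.5)") and parsing "black syn::bird" against the default field "contents" yields two clauses: contents:black and (bird avian^.5 wingedanimal^.5). A real version would return Query objects and combine them in a BooleanQuery.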
[5] the point
Be backward compatible and "natural" for existing query syntax, but leave a hook so that if you innovate and define new query expansion code there's some hope of someone using it, as they can in theory drop it in and use it w/o coding. Right now if you create some code in this area I suspect there's little chance people will try it out, as there's too much friction.
Erik
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
