On Dec 2, 2005, at 10:03 AM, mark harwood wrote:
There seems to be a growing gap between Lucene
functionality and the query language offered by
QueryParser (eg no support for regex queries, span
queries, "more like this", filter queries,
minNumShouldMatch etc etc).

At least with a couple of these it would be sensible to subclass QueryParser and override some getters to create other types of queries. For example, if you need ordered sloppy phrase queries you could create a SpanNearQuery instead of a PhraseQuery. Likewise with RegexQuery instead of WildcardQuery.

Question - since when is "more like this" a Query?  Should it be?

Your points below are well taken though....

Closing this gap is hard when:
a) The availability of Javacc+Lucene skills is a
bottleneck

job security?!  :)

I've been doing a lot of JavaCC work this year, and it has been a humbling learning curve, and I barely feel capable with it.

One interesting project I just came across is JParsec: http://jparsec.codehaus.org - perhaps this could be a much simpler way than using JavaCC.

b) The syntax of the query language makes it difficult
to add new features eg rapidly running out of "special
characters"

This is the biggest issue of all. What do humans want to type in order to express sophisticated queries?

Apple has it pretty nicely implemented with additive builders (such as with Finder, Mail rules, and smart playlists in iTunes), but they don't support nested expressions, only "all" or "any" of the criteria.

I don't think extending the existing query
parser/language is necessarily useful and I see it
being used purely to support the classic "simple
search engine" syntax.

I concur. Tacking more onto QueryParser is not going to make most users happy. I think there may be too many bells and whistles in it already.

Unfortunately the fall-back position for applications
which require more complex queries is to "just write
some Java code to instantiate the Query objects
programmatically."

I've not found a generalization of how queries are entered into the system across the applications I've worked on, though. Every query interface has been custom.

This is OK but I think there is
value in having an advanced search syntax capable of
supporting the latest Lucene features and expressed in
XML. It's worth considering why it's useful to have a
String-representable form for queries:
1) Queries can be stored eg in audit logs or "saved
queries" used for tasks like auto-categorization
2) Clients built in languages other than Java can
issue queries to a Lucene server
3) I can decouple a request from the code that
implements the query when distributing software, e.g. my
applet may not want Lucene dragged down to the client

This is an interesting proposal, and one that has a lot of merit in how you've explained it.

We can potentially use XML in the same way ANT does
i.e. a declarative way of invoking an extensible list
of Java-implemented features.

I've told many developers that the answer to almost all Java questions lies within the source code to Ant :)

A query interpreter is
used to instantiate the configured Java Query objects
and populates them with settings from the XML in a
generic fashion (using reflection) eg:
....
   <MoreLikeThis minNumberShouldMatch="3"
maxQueryTerms="30">

We're back to MoreLikeThis - it's not currently a Query subclass. How do you envision this sort of thing fitting in if it's not a Query?
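
For illustration only, the reflection-driven interpreter described in the quoted proposal might look roughly like the sketch below. None of it comes from the thread: XmlQueryInterpreter, MoreLikeThisSpec, and the element registry are hypothetical names, the stand-in class is a plain bean rather than a Lucene Query subclass, and the element is written self-closed because the XML fragment quoted above is truncated.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/** Hypothetical stand-in for a configurable query type; NOT a Lucene class. */
class MoreLikeThisSpec {
    private int minNumberShouldMatch;
    private int maxQueryTerms;

    public void setMinNumberShouldMatch(int n) { minNumberShouldMatch = n; }
    public void setMaxQueryTerms(int n) { maxQueryTerms = n; }
    public int getMinNumberShouldMatch() { return minNumberShouldMatch; }
    public int getMaxQueryTerms() { return maxQueryTerms; }
}

/** Hypothetical interpreter: maps an element name to a class, attributes to setters. */
class XmlQueryInterpreter {
    private final Map<String, Class<?>> registry = new HashMap<>();

    /** Makes a new element name available, much as Ant registers a task. */
    void register(String elementName, Class<?> type) {
        registry.put(elementName, type);
    }

    /** Parses a single-element document and returns the configured object. */
    Object interpret(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        Class<?> type = registry.get(root.getTagName());
        if (type == null) {
            throw new IllegalArgumentException("Unknown element: " + root.getTagName());
        }

        Object target = type.getDeclaredConstructor().newInstance();
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            applySetter(target, attr.getNodeName(), attr.getNodeValue());
        }
        return target;
    }

    /** Finds a one-argument setter named after the attribute and invokes it. */
    private static void applySetter(Object target, String name, String value) throws Exception {
        String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                m.invoke(target, convert(value, m.getParameterTypes()[0]));
                return;
            }
        }
        throw new IllegalArgumentException("No setter for attribute: " + name);
    }

    /** Converts the attribute text to the handful of types this sketch supports. */
    private static Object convert(String value, Class<?> type) {
        if (type == int.class || type == Integer.class) return Integer.valueOf(value);
        if (type == boolean.class || type == Boolean.class) return Boolean.valueOf(value);
        if (type == float.class || type == Float.class) return Float.valueOf(value);
        return value; // fall back to the raw string
    }
}
```

Registering "MoreLikeThis" against MoreLikeThisSpec and interpreting the element with minNumberShouldMatch="3" and maxQueryTerms="30" would yield a bean carrying those two values. Nested elements, and the step that turns such a bean into an actual Query, are left out of the sketch.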

Do people feel this would be a worthwhile endeavour?

I think a way to get a query to/from XML is a good idea. Perhaps the XML serialization feature of JDK 1.4 (or is it 1.5?) is sufficient for this? Maybe not though - and there are plenty of handy helpers, from just doing raw reflection tricks like Ant to using something like Digester or Castor. I wouldn't recommend reinventing the XML de/serialization aspect of this.
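
As a rough illustration of the JDK route: java.beans.XMLEncoder and XMLDecoder (available since JDK 1.4) can round-trip a JavaBean through XML. The sketch below is not from the thread. SavedQuery is a hypothetical bean, not a Lucene class, and the code uses try-with-resources, so it assumes Java 7 or later. It says nothing about whether Lucene's own Query classes, which are not generally written as beans, would serialize this way.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class SavedQueryXml {

    /** Hypothetical bean describing a stored query; NOT a Lucene class. */
    public static class SavedQuery {
        private String field;
        private String text;
        private int slop;

        public SavedQuery() {} // XMLEncoder/XMLDecoder need a public no-arg constructor

        public String getField() { return field; }
        public void setField(String field) { this.field = field; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public int getSlop() { return slop; }
        public void setSlop(int slop) { this.slop = slop; }
    }

    /** Serializes the bean's properties to an XML document held in memory. */
    static byte[] toXml(SavedQuery query) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(query);
        } // closing the encoder flushes and finishes the document
        return out.toByteArray();
    }

    /** Rebuilds the bean from XML produced by toXml. */
    static SavedQuery fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return (SavedQuery) decoder.readObject();
        }
    }
}
```

A bean written with toXml and read back with fromXml should come back with the same field, text, and slop values, which would cover the "saved queries" and audit-log cases in the quoted list as long as the stored form is a bean like this one.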

I'm not sure if enough people feel pain around the
points 1-3 outlined above to make it worth pursuing.

I don't see where I would use this capability just yet, but I do see it as useful in the contexts you provided.

I'd also be interested in effort towards an Apple-like query builder.

        Erik

