On Sun, Aug 26, 2012 at 10:01 AM, Aaron McCurry <[email protected]> wrote:
> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>> Aaron,
>>
>> Just for a little clarification on your example, when you say JOIN, are you
>> actually just talking about a union of two sets or are you actually
>> referring to the relational type of join where the intent is to merge them
>> into a single record? If it's the former, wouldn't a simple OR suffice?
>
> Well it's a little different in the Lucene world, but in essence it
> would be the latter. However the result is not a single Record but
> rather a Row that contains the 2 Records.
>
> Take a look at this link:
> http://lucene.apache.org/core/3_6_1/api/contrib-join/org/apache/lucene/search/join/package-summary.html
>
> Blur uses the Index-time joins, but it's an internal piece of code.
> Blur doesn't actually use this contrib although maybe it should.
>
>>
>> Provided that I am in fact missing something, here are my thoughts on the
>> query language:
>>
>> A common theme that I have seen across the board with commercial
>> search/discovery products is the creation of a query language modeled after
>> SQL with varying limitations. This tends to be fairly effective as the
>> learning curve is not too steep for users who have experience writing SQL
>> queries and dealing with relational databases. Additionally, these users
>> normally find a way to live with the limitations of the language and find
>> ways around the problems they are trying to solve as the language is
>> typically advanced enough to be creative.
>>
>> Such a language, however, does not lend itself well to the less advanced
>> end users of your product. Perhaps in certain cases this is acceptable as
>> you will always have some advanced user available, but in the cases where
>> these advanced users are in limited supply the learning curve becomes
>> steeper as the technical ability and know-how decreases.
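To make the Record/Row distinction concrete, here is a minimal in-memory sketch. It assumes a simplified model (the `Rec` and `Row` types and the `joinMatches` and `columnEquals` helpers are hypothetical, not Blur classes) in which a Row groups Records, and a "join" of two clauses matches the Row when some Record satisfies one clause and some Record (possibly the same one) satisfies the other. It only illustrates the result shape described above, the whole Row rather than one merged Record; it is not Blur's index-time join implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class RowJoinSketch {
    // Hypothetical model: a Record is a flat set of column -> value pairs,
    // and a Row groups the Records that share a row id.
    record Rec(String recordId, Map<String, String> columns) {}
    record Row(String rowId, List<Rec> records) {}

    // True when some Record in the Row satisfies `left` and some Record
    // (possibly the same one) satisfies `right`. The caller gets the whole
    // Row back as the hit, not a single merged Record.
    static boolean joinMatches(Row row, Predicate<Rec> left, Predicate<Rec> right) {
        boolean leftHit = row.records().stream().anyMatch(left);
        boolean rightHit = row.records().stream().anyMatch(right);
        return leftHit && rightHit;
    }

    // Exact column/value test, standing in for a per-Record term query.
    static Predicate<Rec> columnEquals(String column, String value) {
        return rec -> value.equals(rec.columns().get(column));
    }

    public static void main(String[] args) {
        Row row = new Row("row-1", List.of(
            new Rec("r1", Map.of("person.name", "alice")),
            new Rec("r2", Map.of("address.city", "boston"))));

        // Both clauses are satisfied, each by a different Record in the Row.
        System.out.println(joinMatches(row,
            columnEquals("person.name", "alice"),
            columnEquals("address.city", "boston"))); // prints "true"

        // No Record in the Row satisfies the second clause.
        System.out.println(joinMatches(row,
            columnEquals("person.name", "alice"),
            columnEquals("address.city", "paris"))); // prints "false"
    }
}
```

A plain OR would return a hit if either clause matched; the Row-level join requires both, which is why it is closer to the relational case even though nothing is merged.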
>
> I agree with your assessment of a SQL-like language, my fear in making
> this the standard for all queries in Blur is the extra syntax the
> language would require. For example:
>
> "select * from test_table where super = 'test';"
>
> But this really isn't correct because in sql this would mean an exact
> match and you would have to index the data in several different ways
> to make super = 'test' work. Instead it should be something like:
>
> "select * from test_table where super like 'test';"
>
> However in Lucene syntax and CQL it's just:
>
> "test"
>
> Also I like the separation of what to result from the query, as well
> as where to start, how many to fetch, etc.
>
> Blur has a JDBC project, perhaps both can be used. We could use SQL
> as a control language for passing what to select, sort by, etc and let
> CQL be the query language.
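The gap between `super = 'test'` and the bare query `test` can be shown with a toy comparison. This sketch assumes a crude tokenizer (lowercase, split on non-alphanumeric characters) as a stand-in for a Lucene analyzer; `exactMatch` and `termMatch` are hypothetical helpers. Note that in standard SQL, `LIKE 'test'` without wildcards is also a whole-value match; it would need something like `'%test%'`, which matches substrings rather than tokens.

```java
import java.util.Arrays;
import java.util.Locale;

public class ExactVsTokenMatch {
    // SQL-style `super = 'test'`: the whole stored value must equal the literal.
    static boolean exactMatch(String storedValue, String literal) {
        return storedValue.equals(literal);
    }

    // Rough stand-in for a full-text term query such as the Lucene query `test`:
    // lowercase the value, split it on non-alphanumeric characters, and look for
    // the term among the resulting tokens. Real Lucene analyzers do much more.
    static boolean termMatch(String storedValue, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return Arrays.stream(storedValue.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .anyMatch(needle::equals);
    }

    public static void main(String[] args) {
        String value = "This is a Test document";
        System.out.println(exactMatch(value, "test")); // prints "false"
        System.out.println(termMatch(value, "test"));  // prints "true"
    }
}
```

Supporting both behaviors on one field is what forces indexing the data "in several different ways": one untokenized copy for equality and one analyzed copy for term search.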
While once a fan, I'd hope CQL isn't the answer. We'd lose field/index projections over boolean clauses and be limited to treating prox (proximity) as a boolean operator; neither is fixable without straying from the spec. The CQL spec peeps also seem disconnected from any implementation, such that none of the latter strictly resembles the former, and there appears to be little opportunity for implementations in the wild to actually inform the specification.

So I like your Option 1 :) If we just extend Lucene's syntax, it addresses your biggest concern, though it does leave a *lot* of work to be done :(

    blurQuery    ::= luceneQuery (havingClause)? (sortClause)?
    havingClause ::= 'HAVING' luceneQuery   // not sure if this is a subset or not?
    sortClause   ::= 'sortby' field

Thanks,
--tim
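The blurQuery grammar could be prototyped as a thin layer that splits off the two proposed clauses and passes each remaining piece to an ordinary Lucene query parser. This is a hypothetical sketch (the `BlurQuerySketch` class, `Parsed` record and `parse` method are invented for illustration). It recognizes the `HAVING` and `sortby` keywords only as whitespace-delimited words. It does not decide what HAVING should mean, which is the open question in the grammar comment, and it does not skip those keywords when they appear inside quoted phrases.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlurQuerySketch {
    // Parsed form of: blurQuery ::= luceneQuery (havingClause)? (sortClause)?
    record Parsed(String luceneQuery, Optional<String> having, Optional<String> sortField) {}

    // Group 1: the leading Lucene query (shortest match that lets the rest parse).
    // Group 2: optional query text after the 'HAVING' keyword.
    // Group 3: optional single field name after the 'sortby' keyword.
    // Naive: keywords inside quoted phrases are not protected.
    private static final Pattern QUERY = Pattern.compile(
        "^(.+?)(?:\\s+HAVING\\s+(.+?))?(?:\\s+sortby\\s+(\\S+))?\\s*$");

    static Parsed parse(String input) {
        Matcher m = QUERY.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a blurQuery: " + input);
        }
        return new Parsed(m.group(1),
            Optional.ofNullable(m.group(2)),
            Optional.ofNullable(m.group(3)));
    }

    public static void main(String[] args) {
        Parsed p = parse("title:blur HAVING author:tim sortby date");
        System.out.println(p.luceneQuery());           // prints "title:blur"
        System.out.println(p.having().orElse("-"));    // prints "author:tim"
        System.out.println(p.sortField().orElse("-")); // prints "date"
    }
}
```

Keeping the Lucene portion untouched is the appeal of extending Lucene's syntax instead of adopting CQL; a fuller version would probably need to extend the query parser's own grammar so that quoting and escaping rules apply to the new keywords too.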
