Re: Future of Blur Query Language

Tim Williams Tue, 28 Aug 2012 19:45:31 -0700

On Sun, Aug 26, 2012 at 10:01 AM, Aaron McCurry <[email protected]> wrote:
> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>> Aaron,
>>
>> Just for a little clarification on your example, when you say JOIN, are you
>> actually just talking about a union of two sets or are you actually
>> referring to the relational type of join where the intent is to merge them
>> into a single record? If it's the former, wouldn't a simple OR suffice?
>
> Well it's a little different in the Lucene world, but in essence it
> would be the latter.  However the result is not a single Record but
> rather a Row that contains the 2 Records.
>
> Take a look at this link:
> http://lucene.apache.org/core/3_6_1/api/contrib-join/org/apache/lucene/search/join/package-summary.html
>
> Blur uses the Index-time joins, but it's an internal piece of code.
> Blur doesn't actually use this contrib although maybe it should.
>
>>
>> Provided that I am in fact missing something, here are my thoughts on the
>> query language:
>>
>> A common theme that I have seen across the board with commercial
>> search/discovery products is the creation of a query language modeled after
>> SQL with varying limitations. This tends to be fairly effective as the
>> learning curve is not too steep for users who have experience writing SQL
>> queries and dealing with relational databases. Additionally, these users
>> normally find a way to live with the limitations of the language and find
>> ways around the problems they are trying to solve as the language is
>> typically advanced enough to be creative.
>>
>> Such a language, however, does not lend itself well to the less advanced
>> end users of your product. Perhaps in certain cases this is acceptable as
>> you will always have some advanced user available, but in the cases where
>> these advanced users are in limited supply the learning curve becomes
>> steeper as the technical ability and know-how decreases.
>
> I agree with your assessment of a SQL-like language; my fear in making
> this the standard for all queries in Blur is the extra syntax the
> language would require.  For example:
>
> "select * from test_table where super = 'test';"
>
> But this really isn't correct because in SQL this would mean an exact
> match and you would have to index the data in several different ways
> to make super = 'test' work.  Instead it should be something like:
>
> "select * from test_table where super like 'test';"
>
> However in Lucene syntax and CQL it's just:
>
> "test"
>
> Also I like the separation of what to return from the query, as well
> as where to start, how many to fetch, etc.
>
> Blur has a JDBC project, perhaps both can be used.  We could use SQL
> as a control language for passing what to select, sort by, etc and let
> CQL be the query language.


While once a fan, I'd hope CQL isn't the answer.  We'd lose
field/index projections over boolean clauses and be limited to prox
being a boolean operator - those aren't fixable without straying from
the spec.  The CQL spec peeps also seem disconnected from any
implementation, such that none of the latter strictly resemble the
former - and there appears to be little opportunity for implementations
in the wild to actually inform the specification.

So I like your Option1:)  If we just extend Lucene's syntax it gets
over your biggest concern - though it does leave a *lot* of work to be
done:(

blurQuery ::= luceneQuery (havingClause)? (sortClause)?

havingClause ::= 'HAVING' luceneQuery //not sure if this is a subset or not?

sortClause ::= 'sortby' field
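
To make the shape of that grammar concrete, here is a rough Python sketch that
only splits a query string into its three parts. It is an illustration, not
anything that exists in Blur: the function name, the BlurQuery tuple, and the
field names in the example are all made up, and it assumes the HAVING and
sortby keywords never appear inside a quoted phrase. A real implementation
would extend Lucene's query parser rather than use regexes.

```python
import re
from typing import NamedTuple, Optional


class BlurQuery(NamedTuple):
    """Hypothetical container for the three parts of the grammar."""
    lucene_query: str
    having: Optional[str]
    sort_field: Optional[str]


# Optional trailing "sortby <field>" clause (keyword casing as in the grammar).
_SORT_RE = re.compile(r"\s+sortby\s+(\S+)\s*$")
# First standalone HAVING keyword; assumes it is not inside a quoted phrase.
_HAVING_RE = re.compile(r"\s+HAVING\s+")


def split_blur_query(text: str) -> BlurQuery:
    """Split text into luceneQuery, havingClause and sortClause.

    The Lucene parts are returned as raw strings; parsing them is left to
    Lucene's own query parser.
    """
    text = text.strip()

    # Peel off the sort clause first, since it is the last element.
    sort_field = None
    match = _SORT_RE.search(text)
    if match:
        sort_field = match.group(1)
        text = text[:match.start()]

    # Whatever follows the first HAVING keyword is the having clause.
    having = None
    parts = _HAVING_RE.split(text, maxsplit=1)
    if len(parts) == 2:
        text, having = parts[0], parts[1].strip()

    return BlurQuery(text.strip(), having, sort_field)
```

With this, split_blur_query('super:test HAVING count:[2 TO *] sortby name')
yields the main query 'super:test', the having query 'count:[2 TO *]' and the
sort field 'name', while a plain 'test' comes back with both optional parts
set to None.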

Thanks,
--tim
