Re: Future of Blur Query Language

Aaron McCurry Sun, 26 Aug 2012 06:40:51 -0700

I'm trying to create a real working example of the problem, and I have
found a bug.  Basically the query never finishes, which is very strange.
I'm not sure whether it's Lucene or my query at this point.  Once I have
an example I will post it and add it as a unit test in the IndexManagerTest.


Aaron

On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton
<[email protected]> wrote:
> Can we get this test case working to show the problem?
>
>         private static void testJoin(Iface client, String table)
>                         throws BlurException, TException {
>                 RowMutation mutation = new RowMutation();
>                 mutation.table = table;
>                 mutation.waitToBeVisible = true;
>                 mutation.rowId = "row1";
>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>                                 "recordid1", newColumn("col1","value1")));
>                 mutation.addToRecordMutations(newRecordMutation("cf1",
>                                 "recordid2", newColumn("col2","value2")));
>                 mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>                 client.mutate(mutation);
>
>                 List<String> joinTest = new ArrayList<String>();
>                 joinTest.add("+cf1.col1:value1");
>                 joinTest.add("+cf1.col2:value2");
>                 joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>                 joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) "
>                                 + "+(+cf1.col2.value2 nocf.nofield:somevalue)");
>                 joinTest.add("+(+cf1.col1:value1) "
>                                 + "+(cf1.bla:bla +cf1.col2.value2)");
>
>                 for (String q : joinTest)
>                         System.out.println(q + " hits: "
>                                         + hits(client, table, q, true));
>         }
>
>         private static long hits(Iface client, String table, String queryStr,
>                         boolean superQuery) throws BlurException, TException {
>                 BlurQuery bq = new BlurQuery();
>                 SimpleQuery sq = new SimpleQuery();
>                 sq.queryStr = queryStr;
>                 sq.superQueryOn = superQuery;
>                 bq.simpleQuery = sq;
>                 BlurResults query = client.query(table, bq);
>                 return query.totalResults;
>         }
>
>
> Running it, I get:
> +cf1.col1:value1 hits: 1
> +cf1.col2:value2 hits: 1
> +cf1.col1:value1 +cf1.col2:value2 hits: 0
> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2
> nocf.nofield:somevalue) hits: 0
> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>
> What's the trick to get the join to work?
>
> Honestly, my first instinct is to turn the record joins into a list
> passed in to the simple query if one wants to move into record joining
> vs default inter record joining of the same cf.  Will ponder the other
> options some more. :)
>
> ~Garrett
>
> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>> Aaron,
>>
>> Just for a little clarification on your example, when you say JOIN, are you
>> actually just talking about a union of two sets or are you actually
>> referring to the relational type of join where the intent is to merge them
>> into a single record? If it's the former, wouldn't a simple OR suffice?
>>
>> Provided that I am in fact missing something, here are my thoughts on the
>> query language:
>>
>> A common theme that I have seen across the board with commercial
>> search/discovery products is the creation of a query language modeled after
>> SQL with varying limitations. This tends to be fairly effective as the
>> learning curve is not too steep for users who have experience writing SQL
>> queries and dealing with relational databases. Additionally, these users
>> normally find a way to live with the limitations of the language and find
>> ways around the problems they are trying to solve as the language is
>> typically advanced enough to be creative.
>>
>> Such a language, however, does not lend itself well to the less advanced
>> end users of your product. Perhaps in certain cases this is acceptable as
>> you will always have some advanced user available, but in the cases where
>> these advanced users are in limited supply the learning curve becomes
>> steeper as the technical ability and know-how decreases.
>>
>> In taking a brief look at the spec for CQL, I tend to agree with your
>> assessment that it is the best option as it looks like it has the ability
>> to be flexible enough to fit both cases. It is possible that you will run
>> into limitations with the queries that your more advanced users are
>> interested in, but perhaps those are the cases where Blur is not a fit.
>>
>>
>> Tim
>>
>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <[email protected]> wrote:
>>
>>> I would like to start a thread on the topic of the future of Blur's
>>> query language.  Currently the "simpleQuery" is just a normal Lucene
>>> based syntax with a little magic to figure out the joins (via the
>>> SuperQuery) that the user probably intended.  Of course this
>>> guesswork gets it wrong sometimes.  Let me explain with an example:
>>>
>>> Given the query with superOn:
>>>
>>> +cf1.field1:value1 +cf1.field2:value2
>>>
>>> The current implementation will ASSUME that you want to find where
>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>> "value2" in the same Record because the column family is the same.
>>> i.e. NO JOIN across records
>>>
>>> But perhaps the user really does want a join, meaning that the user
>>> wants to find any Row that contains one or more Records that have a
>>> field "cf1.field1" that contains "value1" and one or more Records in
>>> the same Row (but not necessarily in the same Record) that contains a
>>> field "cf1.field2" that contains "value2".  i.e. JOIN
>>>
>>> Given the current implementation, the only way to force the JOIN is
>>> to do something like:
>>>
>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2
>>> nocf.nofield:somevalue)
>>>
>>> This will trick the parser into creating 2 separate join query
>>> (SuperQuery) objects and performing the JOIN.
>>>
>>>
>>> THIS IS UGLY.
>>>
>>> Here are the current criteria for a query language:
>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>> Span, etc.)
>>> - User-defined query types should be supported (extensible)
>>> - The query language should be compatible with any programming
>>> language so that the current thrift RPC can continue to be utilized
>>>
>>> Here are options that I have been thinking about:
>>>
>>> Option 1:
>>> Somehow extend the current Lucene Query syntax to support these "new"
>>> features.  The biggest issue I have with this is that we would be
>>> creating yet another query language that users would have to learn.
>>> Also I think that allowing users to extend the query language by
>>> adding their own types would require a rewrite of the Lucene
>>> implemented query parser.  So even starting with the Lucene query
>>> language would be a lot of work.
>>>
>>> Option 2:
>>> Some limited version of SQL or SQL like syntax, basically supporting
>>> normal SQL with limited join support (probably only natural joins).
>>> This would be nice, because most users understand SQL.  But because
>>> Blur cannot support all the various operations that SQL can provide
>>> this will probably be frustrating to users.  And they will need to
>>> learn what Blur SQL will provide and any special Blur only syntax.  So
>>> this would again be like inventing another query language.
>>>
>>> Option 3:
>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>> confused with Cassandra Query Language.  Currently I like this option
>>> the best, because it has built-in extensibility as well as the normal
>>> options needed for a search engine.  Boolean, fuzzy, wildcard, etc.
>>>
>>> I really would like to get others' opinions here and any other options.
>>>  Thanks!
>>>
>>> Aaron
>>>
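
For anyone following along, the two semantics discussed in this thread
(all terms matching inside one Record, versus a JOIN where each term may
match a different Record of the same Row) can be sketched in plain Java.
This is only an illustration under stated assumptions: JoinSketch, Rec,
Term, matchesInOneRecord and matchesAcrossRecords are hypothetical names
made up for this sketch.  Nothing here uses Blur's or Lucene's classes,
and it does not claim to show how SuperQuery is actually implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; these are NOT Blur or Lucene classes.
final class JoinSketch {

    /** A Record: one column family plus its column name/value pairs. */
    static final class Rec {
        final String family;
        final Map<String, String> columns;

        Rec(String family, Map<String, String> columns) {
            this.family = family;
            this.columns = columns;
        }

        boolean has(Term t) {
            return family.equals(t.family) && t.value.equals(columns.get(t.column));
        }
    }

    /** A required term such as +cf1.col1:value1. */
    static final class Term {
        final String family;
        final String column;
        final String value;

        Term(String family, String column, String value) {
            this.family = family;
            this.column = column;
            this.value = value;
        }
    }

    /** Same-Record semantics: a single Record must satisfy every term. */
    static boolean matchesInOneRecord(List<Rec> row, List<Term> terms) {
        return row.stream().anyMatch(r -> terms.stream().allMatch(r::has));
    }

    /** Row-level JOIN: each term may be satisfied by a different Record. */
    static boolean matchesAcrossRecords(List<Rec> row, List<Term> terms) {
        return terms.stream().allMatch(t -> row.stream().anyMatch(r -> r.has(t)));
    }

    public static void main(String[] args) {
        // Mirrors the Row from the test case: two Records in cf1, one column each.
        List<Rec> row1 = Arrays.asList(
                new Rec("cf1", Collections.singletonMap("col1", "value1")),
                new Rec("cf1", Collections.singletonMap("col2", "value2")));
        List<Term> terms = Arrays.asList(
                new Term("cf1", "col1", "value1"),
                new Term("cf1", "col2", "value2"));

        System.out.println("same record: " + matchesInOneRecord(row1, terms));
        System.out.println("row join:    " + matchesAcrossRecords(row1, terms));
    }
}
```

With the Row from the test case, the same-Record check is false and the
row-level check is true, which is the gap between the 0 hits reported for
"+cf1.col1:value1 +cf1.col2:value2" and what a JOIN would be expected to
return.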
