Ahh, the Lucene search-time joins from contrib
(http://lucene.apache.org/core/3_6_1/api/contrib-join/org/apache/lucene/search/join/package-summary.html)
are what I was thinking of. I like that style as a good interim upgrade
to support these record joins. How hard would it be to do something
like:
BlurQuery blurQuery = new BlurQuery();
blurQuery.simpleQuery = new SimpleQuery();
//blank since there's nothing other than the joins
blurQuery.simpleQuery.queryStr = "";
// proposed API (does not exist yet): one call per record-level clause to join
blurQuery.addJoin("+test-family.testcol1:value1");
blurQuery.addJoin("+test-family.testcol3:value234123");
blurQuery.simpleQuery.superQueryOn = true;
blurQuery.simpleQuery.type = ScoreType.SUPER;
blurQuery.fetch = 10;
blurQuery.minimumNumberOfResults = Long.MAX_VALUE;
blurQuery.maxQueryTime = Long.MAX_VALUE;
blurQuery.uuid = 1;
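As a stopgap, a client-side `addJoin` could simply translate its list of clauses into the "trick" query Aaron describes further down the thread (each clause wrapped in its own required group alongside the `nocf.nofield:somevalue` filler term, so the parser builds one SuperQuery per group). The class and method names below are hypothetical, not part of Blur. The sketch only builds the string; whether that query actually behaves is exactly what is being debugged in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Blur API): turns the clauses a proposed
// addJoin(...) would collect into the workaround query string from the thread.
final class JoinQueryBuilder {
    // Filler term from the thread; assumed to match nothing in the index.
    static final String FILLER = "nocf.nofield:somevalue";

    private JoinQueryBuilder() {}

    static String toWorkaroundQuery(List<String> joins) {
        List<String> groups = new ArrayList<>();
        for (String clause : joins) {
            String trimmed = clause.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank clauses
            }
            // Required group: the clause plus the optional filler term, which
            // nudges the parser into a separate SuperQuery for this group.
            groups.add("+(" + trimmed + " " + FILLER + ")");
        }
        return String.join(" ", groups);
    }

    public static void main(String[] args) {
        List<String> joins = List.of("+cf1.field1:value1", "+cf1.field2:value2");
        // Prints: +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2 nocf.nofield:somevalue)
        System.out.println(toWorkaroundQuery(joins));
    }
}
```

The nice part of hiding it behind a list is that callers never see the filler term, so the server side could later switch to a real join implementation without changing client code.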
This way one would not have to guess at the user's intent. I think this is
a cleaner workaround than the ugly query until something nicer comes
along.
If one were to go down either of the ?QL approaches, I think the
default BlurResult should be the one from blur-jdbc and actually be
what one would expect from a database returning the same query. I'm not
sold on it, though; I always find SQL very limiting.
On Sun, Aug 26, 2012 at 10:28 AM, Aaron McCurry <[email protected]> wrote:
> Look at IndexManagerTest.testQueryWithJoin
>
> I have attached an easier to read version of the data.
>
> And here is the query to "trick" it in to doing the join:
>
> +(+test-family.testcol1:value1 nojoin) +(+test-family.testcol3:value234123)
>
> On Sun, Aug 26, 2012 at 9:40 AM, Aaron McCurry <[email protected]> wrote:
>> I'm trying to create a real working example of the problem, and I have
>> found a bug. Basically, the query never finishes; very strange. I'm not
>> sure yet whether it's Lucene or my query. Once I have an example,
>> I will post it and add it as a unit test in IndexManagerTest.
>>
>> Aaron
>>
>> On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton
>> <[email protected]> wrote:
>>> Can we get this test case working to show the problem?
>>>
>>> import java.util.ArrayList;
>>> import java.util.List;
>>> // Blur thrift types (Iface, BlurQuery, SimpleQuery, RowMutation, ...) and
>>> // the static helpers newRecordMutation/newColumn come from Blur's client jar.
>>>
>>> private static void testJoin(Iface client, String table)
>>>     throws BlurException, TException {
>>>   RowMutation mutation = new RowMutation();
>>>   mutation.table = table;
>>>   mutation.waitToBeVisible = true;
>>>   mutation.rowId = "row1";
>>>   // Two records in the same row, each holding one of the two columns.
>>>   mutation.addToRecordMutations(newRecordMutation("cf1", "recordid1",
>>>       newColumn("col1", "value1")));
>>>   mutation.addToRecordMutations(newRecordMutation("cf1", "recordid2",
>>>       newColumn("col2", "value2")));
>>>   mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>>>   client.mutate(mutation);
>>>
>>>   List<String> joinTest = new ArrayList<String>();
>>>   joinTest.add("+cf1.col1:value1");
>>>   joinTest.add("+cf1.col2:value2");
>>>   joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>>>   // Note: the next two queries write cf1.col2.value2 ('.' where ':' is
>>>   // expected), so as written that clause is not a cf1.col2 field query.
>>>   joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) "
>>>       + "+(+cf1.col2.value2 nocf.nofield:somevalue)");
>>>   joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2)");
>>>
>>>   for (String q : joinTest) {
>>>     System.out.println(q + " hits: " + hits(client, table, q, true));
>>>   }
>>> }
>>>
>>> private static long hits(Iface client, String table, String queryStr,
>>>     boolean superQuery) throws BlurException, TException {
>>>   BlurQuery bq = new BlurQuery();
>>>   SimpleQuery sq = new SimpleQuery();
>>>   sq.queryStr = queryStr;
>>>   sq.superQueryOn = superQuery;
>>>   bq.simpleQuery = sq;
>>>   BlurResults query = client.query(table, bq);
>>>   return query.totalResults;
>>> }
>>>
>>>
>>> Running it, I get:
>>> +cf1.col1:value1 hits: 1
>>> +cf1.col2:value2 hits: 1
>>> +cf1.col1:value1 +cf1.col2:value2 hits: 0
>>> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue) hits: 0
>>> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>>>
>>> What's the trick to get the join to work?
>>>
>>> Honestly, my first instinct is to turn the record joins into a list
>>> passed into the simple query, for when one wants to move to record joining
>>> rather than the default intra-record matching of the same cf. Will ponder
>>> the other options some more. :)
>>>
>>> ~Garrett
>>>
>>> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>>>> Aaron,
>>>>
>>>> Just for a little clarification on your example, when you say JOIN, are you
>>>> actually just talking about a union of two sets or are you actually
>>>> referring to the relational type of join where the intent is to merge them
>>>> into a single record? If it's the former, wouldn't a simple OR suffice?
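Going by Aaron's own definition of JOIN (rows that have some Record matching each clause, not necessarily the same Record), the distinction Tim asks about can be sketched on sets of matching row IDs: a plain OR is the union, while the row-level JOIN is the intersection. The row IDs below are made-up data, and this ignores scoring entirely.

```java
import java.util.Set;
import java.util.TreeSet;

// Sketch: given the row IDs matched by each clause, compare OR (union)
// with the row-level JOIN (intersection) described in this thread.
final class RowSetSemantics {
    private RowSetSemantics() {}

    static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a); // TreeSet keeps output ordered
        out.addAll(b);
        return out;
    }

    static Set<String> rowJoin(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.retainAll(b); // keep only rows matched by both clauses
        return out;
    }

    public static void main(String[] args) {
        Set<String> field1Rows = Set.of("row1", "row2"); // hypothetical matches
        Set<String> field2Rows = Set.of("row1", "row3");
        System.out.println("OR:   " + union(field1Rows, field2Rows));   // [row1, row2, row3]
        System.out.println("JOIN: " + rowJoin(field1Rows, field2Rows)); // [row1]
    }
}
```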
>>>>
>>>> Provided that I am in fact missing something, here are my thoughts on the
>>>> query language:
>>>>
>>>> A common theme that I have seen across the board with commercial
>>>> search/discovery products is the creation of a query language modeled after
>>>> SQL with varying limitations. This tends to be fairly effective as the
>>>> learning curve is not too steep for users who have experience writing SQL
>>>> queries and dealing with relational databases. Additionally, these users
>>>> normally find a way to live with the limitations of the language and find
>>>> ways around the problems they are trying to solve as the language is
>>>> typically advanced enough to be creative.
>>>>
>>>> Such a language, however, does not lend itself well to the less advanced
>>>> end users of your product. Perhaps in certain cases this is acceptable as
>>>> you will always have some advanced user available, but in the cases where
>>>> these advanced users are in limited supply the learning curve becomes
>>>> steeper as the technical ability and know-how decreases.
>>>>
>>>> In taking a brief look at the spec for CQL, I tend to agree with your
>>>> assessment that it is the best option as it looks like it has the ability
>>>> to be flexible enough to fit both cases. It is possible that you will run
>>>> into limitations with the queries that your more advanced users are
>>>> interested in, but perhaps those are the cases where Blur is not a fit.
>>>>
>>>>
>>>> Tim
>>>>
>>>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <[email protected]> wrote:
>>>>
>>>>> I would like to start a thread on the topic of the future of Blur's query
>>>>> language. Currently the "simpleQuery" is just a normal Lucene based
>>>>> syntax with a little magic to figure out the joins (via the
>>>>> SuperQuery) that the user probably intended. Of course, this guesswork
>>>>> gets it wrong sometimes. Let me explain with an example:
>>>>>
>>>>> Given the query with superOn:
>>>>>
>>>>> +cf1.field1:value1 +cf1.field2:value2
>>>>>
>>>>> The current implementation will ASSUME that you want to find where
>>>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>>>> "value2" in the same Record because the column family is the same.
>>>>> i.e. NO JOIN across records
>>>>>
>>>>> But perhaps the user really does want a join, meaning that the user
>>>>> wants to find any Row that contains one or more Records that have a
>>>>> field "cf1.field1" that contains "value1" and one or more Records in
>>>>> the same Row (but not necessarily in the same Record) that contain a
>>>>> field "cf1.field2" that contains "value2". i.e. JOIN
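To make the two readings concrete, here is a small in-memory sketch: a Row is a list of Records, and each Record maps "family.column" to a value. It uses exact string equality instead of Lucene analysis and term matching, so it only illustrates the semantics, not how Blur evaluates a SuperQuery. The data mirrors Garrett's test row (two Records in row1, one per column).

```java
import java.util.List;
import java.util.Map;

// Sketch only: exact-value matching stands in for Lucene term queries.
final class RowMatchSketch {
    private RowMatchSketch() {}

    // NO JOIN: a single Record must satisfy every clause.
    static boolean sameRecord(List<Map<String, String>> row, Map<String, String> clauses) {
        for (Map<String, String> record : row) {
            boolean allMatch = true;
            for (Map.Entry<String, String> c : clauses.entrySet()) {
                if (!c.getValue().equals(record.get(c.getKey()))) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    // JOIN: each clause must be satisfied by some Record in the Row,
    // not necessarily the same one.
    static boolean joinAcrossRecords(List<Map<String, String>> row, Map<String, String> clauses) {
        for (Map.Entry<String, String> c : clauses.entrySet()) {
            boolean found = false;
            for (Map<String, String> record : row) {
                if (c.getValue().equals(record.get(c.getKey()))) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, String>> row1 = List.of(
                Map.of("cf1.col1", "value1"),
                Map.of("cf1.col2", "value2"));
        Map<String, String> clauses = Map.of("cf1.col1", "value1", "cf1.col2", "value2");
        System.out.println(sameRecord(row1, clauses));        // false
        System.out.println(joinAcrossRecords(row1, clauses)); // true
    }
}
```

The only difference is loop order: NO JOIN asks "is there one Record that passes every clause?", while JOIN asks "does every clause find some Record?".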
>>>>>
>>>>> Given the current implementation, the only way to force the JOIN is
>>>>> to do something like:
>>>>>
>>>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2 nocf.nofield:somevalue)
>>>>>
>>>>> This will trick the parser into creating two separate join query
>>>>> (SuperQuery) objects and performing the JOIN.
>>>>>
>>>>>
>>>>> THIS IS UGLY.
>>>>>
>>>>> Here are the current criteria for a query language:
>>>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>>>> Span, etc.)
>>>>> - User-defined query types should be supported (extensible)
>>>>> - The query language should be compatible with any programming
>>>>> language so that the current thrift RPC can continue to be utilized
>>>>>
>>>>> Here are options that I have been thinking about:
>>>>>
>>>>> Option 1:
>>>>> Somehow extend the current Lucene Query syntax to support these "new"
>>>>> features. The biggest issue I have with this is that we would be
>>>>> creating yet another query language that users would have to learn.
>>>>> Also I think that allowing users to extend the query language by
>>>>> adding their own types would require a rewrite of Lucene's
>>>>> query parser implementation. So even starting with the Lucene query
>>>>> language would be a lot of work.
>>>>>
>>>>> Option 2:
>>>>> Some limited version of SQL or SQL like syntax, basically supporting
>>>>> normal SQL with limited join support (probably only natural joins).
>>>>> This would be nice, because most users understand SQL. But because
>>>>> Blur cannot support all the various operations that SQL can provide,
>>>>> this will probably be frustrating to users. And they will need to
>>>>> learn what Blur SQL will provide and any special Blur only syntax. So
>>>>> this would again be like inventing another query language.
>>>>>
>>>>> Option 3:
>>>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>>>> confused with Cassandra Query Language. Currently I like this option
>>>>> the best, because it has built-in extensibility as well as the normal
>>>>> options needed for a search engine: Boolean, fuzzy, wildcard, etc.
>>>>>
>>>>> I really would like to get others' opinions here and any other options.
>>>>> Thanks!
>>>>>
>>>>> Aaron
>>>>>