Look at IndexManagerTest.testQueryWithJoin. I have attached an easier-to-read version of the data.
And here is the query to "trick" it into doing the join:

+(+test-family.testcol1:value1 nojoin) +(+test-family.testcol3:value234123)

On Sun, Aug 26, 2012 at 9:40 AM, Aaron McCurry <[email protected]> wrote:
> I'm trying to create a real working example of the problem, and I have
> found a bug. Basically the query never finishes, very strange. Not
> sure if it's Lucene or my query at this point. Once I have an example
> I will post it and add it as a unit test in the IndexManagerTest.
>
> Aaron
>
> On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton
> <[email protected]> wrote:
>> Can we get this test case working to show the problem?
>>
>> private static void testJoin(Iface client, String table) throws
>> BlurException, TException {
>>   RowMutation mutation = new RowMutation();
>>   mutation.table = table;
>>   mutation.waitToBeVisible = true;
>>   mutation.rowId = "row1";
>>   mutation.addToRecordMutations(newRecordMutation("cf1",
>>       "recordid1", newColumn("col1","value1")));
>>   mutation.addToRecordMutations(newRecordMutation("cf1",
>>       "recordid2", newColumn("col2","value2")));
>>   mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>>   client.mutate(mutation);
>>
>>   List<String> joinTest = new ArrayList<String>();
>>   joinTest.add("+cf1.col1:value1");
>>   joinTest.add("+cf1.col2:value2");
>>   joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>>   joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue)
>> +(+cf1.col2.value2 nocf.nofield:somevalue)");
>>   joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla
>> +cf1.col2.value2)");
>>
>>   for(String q : joinTest)
>>     System.out.println(q + " hits: " +
>>         hits(client,table, q, true));
>> }
>>
>> private static long hits(Iface client, String table, String queryStr,
>> boolean superQuery) throws BlurException, TException {
>>   BlurQuery bq = new BlurQuery();
>>   SimpleQuery sq = new SimpleQuery();
>>   sq.queryStr = queryStr;
>>   sq.superQueryOn = superQuery;
>>   bq.simpleQuery = sq;
>>   BlurResults query = client.query(table, bq);
>>   return query.totalResults;
>> }
>>
>>
>> Running I get:
>> +cf1.col1:value1 hits: 1
>> +cf1.col2:value2 hits: 1
>> +cf1.col1:value1 +cf1.col2:value2 hits: 0
>> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2
>> nocf.nofield:somevalue) hits: 0
>> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>>
>> What's the trick to get the join to work?
>>
>> Honestly my first instinct is to turn the record joins into a list
>> passed in to the simple query if one wants to move into record joining
>> vs default inter record joining of the same cf. Will ponder the other
>> options some more. :)
>>
>> ~Garrett
>>
>> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>>> Aaron,
>>>
>>> Just for a little clarification on your example, when you say JOIN, are you
>>> actually just talking about a union of two sets or are you actually
>>> referring to the relational type of join where the intent is to merge them
>>> into a single record? If it's the former, wouldn't a simple OR suffice?
>>>
>>> Provided that I am in fact missing something, here are my thoughts on the
>>> query language:
>>>
>>> A common theme that I have seen across the board with commercial
>>> search/discovery products is the creation of a query language modeled after
>>> SQL with varying limitations. This tends to be fairly effective as the
>>> learning curve is not too steep for users who have experience writing SQL
>>> queries and dealing with relational databases. Additionally, these users
>>> normally find a way to live with the limitations of the language and find
>>> ways around the problems they are trying to solve as the language is
>>> typically advanced enough to be creative.
>>>
>>> Such a language, however, does not lend itself well to the less advanced
>>> end users of your product. Perhaps in certain cases this is acceptable as
>>> you will always have some advanced user available, but in the cases where
>>> these advanced users are in limited supply the learning curve becomes
>>> steeper as the technical ability and know-how decreases.
>>>
>>> In taking a brief look at the spec for CQL, I tend to agree with your
>>> assessment that it is the best option as it looks like it has the ability
>>> to be flexible enough to fit both cases. It is possible that you will run
>>> into limitations with the queries that your more advanced users are
>>> interested in, but perhaps those are the cases where Blur is not a fit.
>>>
>>>
>>> Tim
>>>
>>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <[email protected]> wrote:
>>>
>>>> I would like to start a thread on the topic of the future of Blur's query
>>>> language. Currently the "simpleQuery" is just a normal Lucene based
>>>> syntax with a little magic to figure out the joins (via the
>>>> SuperQuery) that the user probably intended. Of course this guess
>>>> work gets it wrong sometimes. Let me explain with an example:
>>>>
>>>> Given the query with superOn:
>>>>
>>>> +cf1.field1:value1 +cf1.field2.value2
>>>>
>>>> The current implementation will ASSUME that you want to find where
>>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>>> "value2" in the same Record because the column family is the same.
>>>> i.e. NO JOIN across records
>>>>
>>>> But perhaps the user really does want a join, meaning that the user
>>>> wants to find any Row that contains one or more Records that have a
>>>> field "cf1.field1" that contains "value1" and one or more Records in
>>>> the same Row (but not necessarily in the same Record) that contains a
>>>> field "cf1.field2" that contains "value2". i.e. JOIN
>>>>
>>>> Given that current implementation, the only way to force the JOIN is
>>>> to do something like:
>>>>
>>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2.value2
>>>> nocf.nofield:somevalue)
>>>>
>>>> This will trick the parser into creating 2 separate join query
>>>> (SuperQuery) objects and perform the JOIN.
>>>>
>>>>
>>>> THIS IS UGLY.
>>>>
>>>> Here are the current criteria for a query language:
>>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>>> Span, etc.)
>>>> - User defined query type should be supported, extensible
>>>> - The query language should be compatible with any programming
>>>> language so that the current thrift RPC can continue to be utilized
>>>>
>>>> Here are options that I have been thinking about:
>>>>
>>>> Option 1:
>>>> Somehow extend the current Lucene Query syntax to support these "new"
>>>> features. The biggest issue I have with this is that we would be
>>>> creating yet another query language that users would have to learn.
>>>> Also I think that allowing users to extend the query language by
>>>> adding their own types would require a rewrite of the Lucene
>>>> implemented query parser. So even starting with the Lucene query
>>>> language would be a lot of work.
>>>>
>>>> Option 2:
>>>> Some limited version of SQL or SQL like syntax, basically supporting
>>>> normal SQL with limited join support (probably only natural joins).
>>>> This would be nice, because most users understand SQL. But because
>>>> Blur cannot support all the various operations that SQL can provide
>>>> this will probably be frustrating to users. And they will need to
>>>> learn what Blur SQL will provide and any special Blur only syntax. So
>>>> this would again be like inventing another query language.
>>>>
>>>> Option 3:
>>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>>> confused with Cassandra Query Language. Currently I like this option
>>>> the best, because it has built-in extensibility as well as the normal
>>>> options needed for a search engine. Boolean, fuzzy, wildcard, etc.
>>>>
>>>> I really would like to get others' opinions here and any other options.
>>>> Thanks!
>>>>
>>>> Aaron
>>>>
row-1,test-family,record-1, testcol1:value1, testcol2:value2, testcol3:value3
row-2,test-family,record-2, testcol1:value4, testcol2:value5, testcol3:value6
row-2,test-family,record-2B, testcol2:value234123, testcol3:value234123
row-3,test-family,record-3, testcol1:value7, testcol2:value8, testcol3:value9
row-4,test-family,record-4, testcol1:value1, testcol2:value5, testcol3:value9
row-4,test-family,record-4B, testcol2:value234123, testcol3:value234123
row-5,test-family,record-5A,testcol1:value13, testcol2:value14, testcol3:value15
row-4,test-family,record-5B,testcol1:value16, testcol2:value17, testcol3:value18,value19
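
To make the two readings of the query concrete against the data above, here is a minimal in-memory sketch. It does not use Blur or Lucene at all; the class JoinSemanticsSketch, the Rec type, and both hit methods are hypothetical names made up for illustration, and only the records from row-1, row-2, and row-4 that matter for testcol1:value1 and testcol3:value234123 are modeled. It only shows what "same Record" versus "same Row" matching should return, not how the SuperQuery actually does it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model; none of these names come from Blur.
class JoinSemanticsSketch {

  // One Record: the Row it belongs to plus its column -> value pairs.
  static final class Rec {
    final String rowId;
    final Map<String, String> columns = new LinkedHashMap<>();

    Rec(String rowId, String... columnValuePairs) {
      this.rowId = rowId;
      for (int i = 0; i < columnValuePairs.length; i += 2) {
        columns.put(columnValuePairs[i], columnValuePairs[i + 1]);
      }
    }

    boolean has(String column, String value) {
      return value.equals(columns.get(column));
    }
  }

  // Subset of the attached data: the records in row-1, row-2 and row-4.
  static List<Rec> sampleData() {
    List<Rec> recs = new ArrayList<>();
    recs.add(new Rec("row-1", "testcol1", "value1", "testcol2", "value2", "testcol3", "value3"));
    recs.add(new Rec("row-2", "testcol1", "value4", "testcol2", "value5", "testcol3", "value6"));
    recs.add(new Rec("row-2", "testcol2", "value234123", "testcol3", "value234123"));
    recs.add(new Rec("row-4", "testcol1", "value1", "testcol2", "value5", "testcol3", "value9"));
    recs.add(new Rec("row-4", "testcol2", "value234123", "testcol3", "value234123"));
    return recs;
  }

  // No join: both terms must match inside the same Record.
  static Set<String> sameRecordHits(List<Rec> recs, String c1, String v1, String c2, String v2) {
    Set<String> rows = new LinkedHashSet<>();
    for (Rec r : recs) {
      if (r.has(c1, v1) && r.has(c2, v2)) {
        rows.add(r.rowId);
      }
    }
    return rows;
  }

  // Join: each term may match in a different Record of the same Row.
  static Set<String> joinHits(List<Rec> recs, String c1, String v1, String c2, String v2) {
    Set<String> first = new LinkedHashSet<>();
    Set<String> second = new LinkedHashSet<>();
    for (Rec r : recs) {
      if (r.has(c1, v1)) first.add(r.rowId);
      if (r.has(c2, v2)) second.add(r.rowId);
    }
    first.retainAll(second); // keep only Rows where both terms matched somewhere
    return first;
  }

  public static void main(String[] args) {
    List<Rec> recs = sampleData();
    System.out.println("same record: "
        + sameRecordHits(recs, "testcol1", "value1", "testcol3", "value234123"));
    System.out.println("join: "
        + joinHits(recs, "testcol1", "value1", "testcol3", "value234123"));
  }
}
```

In this model no single Record carries both values, so the same-Record reading finds nothing, while the Row-level reading finds row-4 (record-4 has testcol1:value1 and record-4B has testcol3:value234123).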
