I'm trying to create a real working example of the problem, and I have found a bug. Basically the query never finishes, which is very strange. I'm not sure at this point whether it's Lucene or my query. Once I have an example I will post it and add it as a unit test in the IndexManagerTest.
Aaron

On Sat, Aug 25, 2012 at 7:04 PM, Garrett Barton <[email protected]> wrote:
> Can we get this test case working to show the problem?
>
> private static void testJoin(Iface client, String table)
>         throws BlurException, TException {
>     RowMutation mutation = new RowMutation();
>     mutation.table = table;
>     mutation.waitToBeVisible = true;
>     mutation.rowId = "row1";
>     mutation.addToRecordMutations(newRecordMutation("cf1",
>             "recordid1", newColumn("col1","value1")));
>     mutation.addToRecordMutations(newRecordMutation("cf1",
>             "recordid2", newColumn("col2","value2")));
>     mutation.rowMutationType = RowMutationType.REPLACE_ROW;
>     client.mutate(mutation);
>
>     List<String> joinTest = new ArrayList<String>();
>     joinTest.add("+cf1.col1:value1");
>     joinTest.add("+cf1.col2:value2");
>     joinTest.add("+cf1.col1:value1 +cf1.col2:value2");
>     joinTest.add("+(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue)");
>     joinTest.add("+(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2)");
>
>     for(String q : joinTest)
>         System.out.println(q + " hits: " + hits(client,table, q, true));
> }
>
> private static long hits(Iface client, String table, String queryStr,
>         boolean superQuery) throws BlurException, TException {
>     BlurQuery bq = new BlurQuery();
>     SimpleQuery sq = new SimpleQuery();
>     sq.queryStr = queryStr;
>     sq.superQueryOn = superQuery;
>     bq.simpleQuery = sq;
>     BlurResults query = client.query(table, bq);
>     return query.totalResults;
> }
>
>
> Running I get:
> +cf1.col1:value1 hits: 1
> +cf1.col2:value2 hits: 1
> +cf1.col1:value1 +cf1.col2:value2 hits: 0
> +(+cf1.col1:value1 nocf.nofield:somevalue) +(+cf1.col2.value2 nocf.nofield:somevalue) hits: 0
> +(+cf1.col1:value1) +(cf1.bla:bla +cf1.col2.value2) hits: 0
>
> What's the trick to get the join to work?
>
> Honestly my first instinct is to turn the record joins into a list
> passed in to the simple query if one wants to move into record joining
> vs default inter record joining of the same cf.
> Will ponder the other options some more. :)
>
> ~Garrett
>
> On Sat, Aug 25, 2012 at 4:48 PM, Tim Tutt <[email protected]> wrote:
>> Aaron,
>>
>> Just for a little clarification on your example, when you say JOIN, are you
>> actually just talking about a union of two sets or are you actually
>> referring to the relational type of join where the intent is to merge them
>> into a single record? If it's the former, wouldn't a simple OR suffice?
>>
>> Provided that I am in fact missing something, here are my thoughts on the
>> query language:
>>
>> A common theme that I have seen across the board with commercial
>> search/discovery products is the creation of a query language modeled after
>> SQL with varying limitations. This tends to be fairly effective as the
>> learning curve is not too steep for users who have experience writing SQL
>> queries and dealing with relational databases. Additionally, these users
>> normally find a way to live with the limitations of the language and find
>> ways around the problems they are trying to solve as the language is
>> typically advanced enough to be creative.
>>
>> Such a language, however, does not lend itself well to the less advanced
>> end users of your product. Perhaps in certain cases this is acceptable as
>> you will always have some advanced user available, but in the cases where
>> these advanced users are in limited supply the learning curve becomes
>> steeper as the technical ability and know-how decreases.
>>
>> In taking a brief look at the spec for CQL, I tend to agree with your
>> assessment that it is the best option as it looks like it has the ability
>> to be flexible enough to fit both cases. It is possible that you will run
>> into limitations with the queries that your more advanced users are
>> interested in, but perhaps those are the cases where Blur is not a fit.
>>
>>
>> Tim
>>
>> On Sat, Aug 25, 2012 at 2:49 PM, Aaron McCurry <[email protected]> wrote:
>>
>>> I would like to start a thread on the topic of the future of Blur's query
>>> language. Currently the "simpleQuery" is just a normal Lucene based
>>> syntax with a little magic to figure out the joins (via the
>>> SuperQuery) that the user probably intended. Of course this guesswork
>>> gets it wrong sometimes. Let me explain with an example:
>>>
>>> Given the query with superOn:
>>>
>>> +cf1.field1:value1 +cf1.field2:value2
>>>
>>> The current implementation will ASSUME that you want to find where
>>> "cf1.field1" contains "value1" and where "cf1.field2" contains
>>> "value2" in the same Record because the column family is the same.
>>> i.e. NO JOIN across records
>>>
>>> But perhaps the user really does want a join, meaning that the user
>>> wants to find any Row that contains one or more Records that have a
>>> field "cf1.field1" that contains "value1" and one or more Records in
>>> the same Row (but not necessarily in the same Record) that contain a
>>> field "cf1.field2" that contains "value2". i.e. JOIN
>>>
>>> Given the current implementation, the only way to force the JOIN is
>>> to do something like:
>>>
>>> +(+cf1.field1:value1 nocf.nofield:somevalue) +(+cf1.field2:value2
>>> nocf.nofield:somevalue)
>>>
>>> This will trick the parser into creating 2 separate join query
>>> (SuperQuery) objects and performing the JOIN.
>>>
>>>
>>> THIS IS UGLY.
>>>
>>> Here are the current criteria for a query language:
>>> - The ability to support any Lucene query type (Boolean, Term, Fuzzy,
>>> Span, etc.)
>>> - User-defined query types should be supported (extensible)
>>> - The query language should be compatible with any programming
>>> language so that the current thrift RPC can continue to be utilized
>>>
>>> Here are options that I have been thinking about:
>>>
>>> Option 1:
>>> Somehow extend the current Lucene Query syntax to support these "new"
>>> features.
>>> The biggest issue I have with this is that we would be
>>> creating yet another query language that users would have to learn.
>>> Also I think that allowing users to extend the query language by
>>> adding their own types would require a rewrite of the Lucene
>>> implemented query parser. So even starting with the Lucene query
>>> language would be a lot of work.
>>>
>>> Option 2:
>>> Some limited version of SQL or SQL-like syntax, basically supporting
>>> normal SQL with limited join support (probably only natural joins).
>>> This would be nice, because most users understand SQL. But because
>>> Blur cannot support all the various operations that SQL can provide
>>> this will probably be frustrating to users. And they will need to
>>> learn what Blur SQL will provide and any special Blur-only syntax. So
>>> this would again be like inventing another query language.
>>>
>>> Option 3:
>>> CQL (http://en.wikipedia.org/wiki/Contextual_Query_Language) not to be
>>> confused with Cassandra Query Language. Currently I like this option
>>> the best, because it has built-in extensibility as well as the normal
>>> options needed for a search engine. Boolean, fuzzy, wildcard, etc.
>>>
>>> I really would like to get others' opinions here and any other options.
>>> Thanks!
>>>
>>> Aaron
>>>
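
For anyone following the thread, here is a minimal sketch of the two semantics being discussed: matching both terms in the same Record (no join) versus matching each term in any Record of the same Row (join). It is only an in-memory illustration using plain Java collections and does not call Blur or Lucene. The class JoinSketch and all of its method names are hypothetical and made up for this example. The sample row mirrors the "row1" data from the test case quoted above: two records in cf1, one holding col1=value1 and the other col2=value2.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model, NOT the Blur API: a Row is a list of Records,
// and each Record is a map from "family.column" to a single value.
class JoinSketch {

    // Builds one record holding a single column, e.g. record("cf1.col1", "value1").
    static Map<String, String> record(String field, String value) {
        Map<String, String> r = new LinkedHashMap<String, String>();
        r.put(field, value);
        return r;
    }

    // True if this one record has the given value in the given field.
    static boolean recordMatches(Map<String, String> rec, String field, String value) {
        return value.equals(rec.get(field));
    }

    // No join: both terms must match inside the SAME record.
    static boolean matchesInSameRecord(List<Map<String, String>> row,
                                       String f1, String v1, String f2, String v2) {
        for (Map<String, String> rec : row) {
            if (recordMatches(rec, f1, v1) && recordMatches(rec, f2, v2)) {
                return true;
            }
        }
        return false;
    }

    // Join: each term may be satisfied by a different record of the same row.
    static boolean matchesAcrossRecords(List<Map<String, String>> row,
                                        String f1, String v1, String f2, String v2) {
        boolean first = false;
        boolean second = false;
        for (Map<String, String> rec : row) {
            first |= recordMatches(rec, f1, v1);
            second |= recordMatches(rec, f2, v2);
        }
        return first && second;
    }

    // Two records with one column each, mirroring "row1" in the quoted test case.
    static List<Map<String, String>> sampleRow() {
        return Arrays.asList(record("cf1.col1", "value1"), record("cf1.col2", "value2"));
    }

    public static void main(String[] args) {
        List<Map<String, String>> row1 = sampleRow();
        System.out.println("same record:    "
            + matchesInSameRecord(row1, "cf1.col1", "value1", "cf1.col2", "value2"));
        System.out.println("across records: "
            + matchesAcrossRecords(row1, "cf1.col1", "value1", "cf1.col2", "value2"));
    }
}
```

With this data the same-record check is false and the across-records check is true, which lines up with the 0 hits reported for "+cf1.col1:value1 +cf1.col2:value2" when no join is performed. Whatever syntax ends up being chosen, the user needs a way to say which of these two behaviors is wanted.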
