[jira] [Commented] (SOLR-3076) Solr should support block joins

Hoss Man (Commented) (JIRA) Thu, 09 Feb 2012 18:10:25 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205156#comment-13205156
 ]


Hoss Man commented on SOLR-3076:
--------------------------------

bq. Maybe there can be field aliases? Eg, book_page_count:[0 to 1000] and 
chapter_page_count[10:40], and the QP is told to map book_page_count -> 
parent:size and chapter_page_count -> child:size? Or maybe we let the user 
explicitly scope the field, eg chapter:size, book:size, book:title, etc. Not 
sure...

Hmmm... i kind of understand what you're saying; but the part i'm not 
understanding is even if you had field aliasing like that, given some query 
string like... 
{code}
  book_page_count:[0 TO 1000] AND chapter_page_count:[10 TO 40]
{code}
..how would the parser know whether the user was asking for the results to be 
"book documents" matching those criteria (0-1000 pages and containing at least 
one chapter child containing 10-40 pages), or "chapter documents" matching 
those criteria (10-40 pages, contained in a book of 0-1000 pages), or "page 
documents" (all pages contained in a chapter of 10-40 total pages, itself 
contained in a book of 0-1000 total pages)?

I mean: it seems possible, and a QParser like that could totally support 
configuring those types of field mappings / hierarchy definitions in init 
params, but perhaps we should focus for now on the more explicit, direct 
mapping style of QParser that Mikhail has already started on, and consider 
that as an enhancement later?  (especially since it's not clear how the 
indexing side will be managed/enforced -- depending on how that shapes up, it 
might be possible for a QParser like you're describing, or perhaps _all_ 
QParsers, to infer the field rules from the schema or some other configuration)

I think the syntax in Mikhail's BlockJoinParentQParserPlugin looks great as a 
straightforward baseline implementation.  The one straw man suggestion i might 
toss out there for consideration would be to invert the use of the "filter" and 
"v" local params, so instead of...

{code}
{!parent filter="parent:true"}child_name:b
{!parent filter="parent:true"}
{code}

...it might be...

{code}
{!parent of="child_names:b"}parent:true
{!parent}parent:true
{code}

...people may find that easier to read as a way to understand that the final 
query will return "parent documents" constrained such that those parent 
documents have children matching the "of" query.  The one thing i don't like 
about this "of" idea is that (compared to the "filter" param Mikhail uses) it 
might be more tempting for people to use something like...

{code}
// WRONG! (i think)
q={!parent of="child_names:b"}some_parent_field:foo
{code}

...when they mean to write something like this...

{code}
q={!parent of="child_names:b"}some_query_that_identifies_the_set_of_all_parents
fq=some_parent_field:foo
{code}

...because as i understand it, it's important for the "parentFilter" to 
identify *all* of the parent documents, even ones you may not want returned, so 
that the ToParentBlockJoinQuery knows how to identify the parent of each 
document (correct?)
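
To make that concern concrete, here is a toy simulation in plain Python (not Solr or Lucene code). It assumes the usual block layout where children are indexed immediately before their parent, and it assumes a matching child is attributed to the next document accepted by the parent filter. The `to_parent_join` helper and the sample docs are hypothetical and exist only for illustration.

```python
from bisect import bisect_left

# Toy index: each block lists its children first and its parent last.
# The list position of each doc doubles as its doc id.
DOCS = [
    {"id": 0, "child_names": "a"},
    {"id": 1, "child_names": "b"},
    {"id": 2, "parent": True, "some_parent_field": "bar"},
    {"id": 3, "child_names": "a"},
    {"id": 4, "parent": True, "some_parent_field": "foo"},
]

def to_parent_join(docs, parent_filter, child_query):
    """Attribute each matching child to the next doc accepted by parent_filter."""
    parent_ids = [d["id"] for d in docs if parent_filter(d)]
    hits = set()
    for d in docs:
        if child_query(d):
            i = bisect_left(parent_ids, d["id"])
            if i < len(parent_ids):
                hits.add(parent_ids[i])
    return sorted(hits)

child_b = lambda d: d.get("child_names") == "b"
all_parents = lambda d: d.get("parent") is True
only_foo = lambda d: d.get("some_parent_field") == "foo"

# Filter identifies every parent: child 1 is attributed to its real parent, doc 2.
correct = to_parent_join(DOCS, all_parents, child_b)

# Filter identifies only the "foo" parents: child 1 skips past doc 2 and is
# attributed to doc 4, which has no child matching child_names:b at all.
wrong = to_parent_join(DOCS, only_foo, child_b)

# The intended query: join against all parents, then restrict (like an fq).
restricted = [p for p in correct if only_foo(DOCS[p])]
```

In this toy model the narrowed filter returns doc 4 even though none of its children match, while the join-then-filter form correctly returns nothing.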

This type of user confusion is still possible with the syntax Mikhail's got, 
but i suspect it will be less likely --- In any case, i wanted to put the idea 
out there.

Given McCandless's supposition that the parent/child relationships are likely 
to be very consistent, not very deep, and not vary from query to query, one 
thing we could do to help mitigate this possible confusion would be:
 * make the "filter" param name much longer and verbose, ie: 
{{setOfAllParentsQuery}}
 * make the param optional, and have it default to something specified as an 
init param, ie: {{defaultSetOfAllParentsQuery}}
 * make the init param mandatory

That way, in the common case people will configure things like...

{code}
<queryParser name="parent" class="solr.BlockJoinParentQParserPlugin">
  <str name="defaultSetOfAllParentsQuery">type:parent</str>
</queryParser>
{code}

..and their queries will be simple...

{code}
q={!parent}              (all parent docs)
q={!parent}foo:bar       (all parent docs that contain kid docs matching foo:bar)
{code}

...but it will still be possible for people with more complex usecases to do 
more complex things.
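
The defaulting rule proposed above could behave roughly like the following. This is a minimal Python sketch, not code from the patch: `resolve_parent_filter` is a hypothetical helper, and the two param names are just the ones suggested in this comment.

```python
def resolve_parent_filter(init_params, local_params):
    """Pick the query that identifies the set of all parent docs.

    init_params: plugin configuration (the init param is mandatory).
    local_params: per-query local params (the override is optional).
    """
    default = init_params.get("defaultSetOfAllParentsQuery")
    if not default:
        raise ValueError("init param defaultSetOfAllParentsQuery is mandatory")
    return local_params.get("setOfAllParentsQuery", default)

init = {"defaultSetOfAllParentsQuery": "type:parent"}

# Common case: q={!parent}foo:bar falls back to the configured default.
common = resolve_parent_filter(init, {})

# Complex case: the query overrides the default explicitly.
override = resolve_parent_filter(init, {"setOfAllParentsQuery": "type:chapter"})
```

Making the init param mandatory means a misconfigured plugin fails up front instead of silently joining against the wrong set of parents.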

Mikhail: some other minor feedback on the parts of your patch that i 
understood (note: my lack of understanding is not a fault of your patch, it's 
just that most of the block join stuff is very foreign to me)...

* please prune down "solrconfig-bjqparser.xml" so it contains only the absolute 
minimum things you need for the test case, it makes it a lot easier for people 
to review the patch, and for users to understand what is necessary to utilize 
features demoed in the test (we have a lot of old bloated solrconfig files in 
the test dir, but we're trying to stop doing that)
* the test would be a bit easier to follow if you used different letters for 
the parent fields vs the child fields (abcdef, vs xyz for example)
* it would be good to have tests verifying that nested parent queries work as 
expected, ie: that something like this works...
{code}
q={!parent filter="type:book" v=$chapters}
chapters=+chapter_title:Solr +_query_:{!parent filter="type:chapter" v=$pages}
pages=page_body:BlockJoin
{code} 
* it would be good to have your tests introspect the cache after doing the 
query to make sure the number of inserts, lookups, and hits match what you 
expect.
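
As a rough idea of what the nested parent query test in the list above should assert, here is a toy three-level simulation in plain Python. It is not Solr test code: the `join_to_parent` helper and the sample docs are hypothetical, and it assumes pages are indexed before their chapter and chapters before their book.

```python
# Toy index: pages precede their chapter, chapters precede their book.
DOCS = [
    {"type": "page", "page_body": "BlockJoin"},      # 0
    {"type": "chapter", "chapter_title": "Solr"},    # 1
    {"type": "page", "page_body": "other"},          # 2
    {"type": "chapter", "chapter_title": "Lucene"},  # 3
    {"type": "book"},                                # 4 (book A)
    {"type": "page", "page_body": "BlockJoin"},      # 5
    {"type": "chapter", "chapter_title": "Lucene"},  # 6
    {"type": "book"},                                # 7 (book B)
]

def join_to_parent(docs, is_parent, child_ids):
    """Map each child position to the next position accepted by is_parent."""
    hits = set()
    for c in child_ids:
        for p in range(c + 1, len(docs)):
            if is_parent(docs[p]):
                hits.add(p)
                break
    return hits

# pages=page_body:BlockJoin
pages = {i for i, d in enumerate(DOCS) if d.get("page_body") == "BlockJoin"}

# {!parent filter="type:chapter" v=$pages}
chapters_with_pages = join_to_parent(DOCS, lambda d: d["type"] == "chapter", pages)

# chapters=+chapter_title:Solr +_query_:{...}
chapters = {i for i in chapters_with_pages if DOCS[i].get("chapter_title") == "Solr"}

# q={!parent filter="type:book" v=$chapters}
books = join_to_parent(DOCS, lambda d: d["type"] == "book", chapters)
```

Only book A (position 4) should come back: book B also has a page mentioning BlockJoin, but its chapter title does not match, which is the kind of near miss a nested test should cover.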

...but like i said: all in all i think it's really good.
                
> Solr should support block joins
> -------------------------------
>
>                 Key: SOLR-3076
>                 URL: https://issues.apache.org/jira/browse/SOLR-3076
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>         Attachments: SOLR-3076.patch, bjq-vs-filters-backward-disi.patch, 
> bjq-vs-filters-illegal-state.patch, parent-bjq-qparser.patch, 
> parent-bjq-qparser.patch, solrconf-bjq-erschema-snippet.xml
>
>
> Lucene has the ability to do block joins, we should add it to Solr.

