[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

David Smiley (JIRA) Tue, 28 Aug 2018 12:08:15 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595456#comment-16595456
 ]


David Smiley commented on SOLR-12519:
-------------------------------------

As I start to write out the notes on the change in semantics of "limit", and 
look back at the test, I think the limit interpretation is actually worse now.  
My bad (palm to face!). 

A document's children come first and are left-to-right (low to high).  It's 
the intermediate parents that get placed after, and so it's not quite as simple 
as strictly left-right or right-left when wanting an ideal "limit".  I don't 
think the semantics of "limit" should be changed for existing users; there is 
no path metadata and we might as well start at the lowest.  For a simple flat 
list of child docs, it's the right thing to do.

 (made up syntax of a nested docA with some nested children)
{noformat}
docA:{ docB, docC:{ docC.1, docC.2}, docD}
{noformat}
Will get serialized/flattened like so:
{noformat}
docB, docC.1, docC.2, docC, docD, docA
{noformat}
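
To make the ordering concrete, here is a minimal sketch of that flattening as a post-order walk (children left to right, then the parent). The dict-of-dicts representation and the `flatten` helper are made up for illustration; this is not Solr code.

```python
def flatten(name, children):
    """Post-order walk: emit each child's subtree left to right, then the doc itself."""
    out = []
    for child_name, grandchildren in children.items():
        out.extend(flatten(child_name, grandchildren))
    out.append(name)
    return out


# docA:{ docB, docC:{ docC.1, docC.2}, docD} as nested dicts (hypothetical model)
doc_a_children = {"docB": {}, "docC": {"docC.1": {}, "docC.2": {}}, "docD": {}}

flat = flatten("docA", doc_a_children)
print(", ".join(flat))  # docB, docC.1, docC.2, docC, docD, docA
```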

Let's say we match all child docs (not filtered).
Consider a limit of 1.  Arguably, docB ought to be the sole child added.  
That's what happens currently, but soon will be docD.  :-/
Consider a limit of 2.  Arguably, docB then docC ought to be added. That's 
_not_ what happens currently (docB & docC.1), and soon won't do that either 
(docC & docD).  But since we have the metadata, we are in a position to do it 
right.

Disclaimer: I didn't test out the above; it's all from intuition.
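
A rough model of the two behaviours described above, assuming "current" means taking the first N children in docID order and "soon" means taking the last N. Both are plain list slices over the flattened children; the function names are hypothetical and nothing here was run against Solr.

```python
# Children of docA in docID order (docA itself is the parent, so it is excluded).
children = ["docB", "docC.1", "docC.2", "docC", "docD"]


def limit_from_lowest(docs, limit):
    """Assumed current behaviour: first `limit` docs, lowest docID first."""
    return docs[:limit]


def limit_from_highest(docs, limit):
    """Assumed upcoming behaviour: last `limit` docs, those nearest the parent."""
    return docs[-limit:] if limit > 0 else []


for n in (1, 2):
    print(n, limit_from_lowest(children, n), limit_from_highest(children, n))
# 1 ['docB'] ['docD']
# 2 ['docB', 'docC.1'] ['docC', 'docD']
```

Neither slice yields docB then docC for a limit of 2, which is the point being made.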

It's kinda embarrassing we didn't see this after discussing it a bit and 
"correcting" tests.  Maybe the testing methodology doesn't make this 
in-your-face enough?  I've advocated before about the virtues of testing an 
entire document structure as a string because all is laid bare to see -- it's 
very _direct_; less to think about.  This goes hand-in-hand with indexing a 
simple document literally in the same test method as the test, instead of 
algorithmically generating documents (perhaps complex ones) in some other 
method.  There are certainly pros/cons both ways.
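
As an illustration of the "whole structure as a string" style, a test could render the nested result and compare it against one literal. The `render` helper and the made-up syntax below are assumptions for the sketch, not an existing Solr test utility.

```python
def render(name, children):
    """Render a nested doc in the made-up syntax used in this comment."""
    if not children:
        return name
    inner = ", ".join(render(c, sub) for c, sub in children.items())
    return name + ":{ " + inner + "}"


# The document is written out literally, right next to the assertion.
doc_a_children = {"docB": {}, "docC": {"docC.1": {}, "docC.2": {}}, "docD": {}}

rendered = render("docA", doc_a_children)
assert rendered == "docA:{ docB, docC:{ docC.1, docC.2}, docD}"
```

A missing or misplaced child changes the string, so a failure shows the entire structure at a glance.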

What might the fix be?  I think we should loop from the lowest docID underneath 
the parent (as it was before).  As we go, we keep a counter of how many docs 
have been added.  Once that counter reaches the limit, from that point forward 
we only want intermediate parents of already-accumulated docs (i.e. only 
collect ancestors).  The actual number of docs returned could be more than the 
limit, but should not exceed it by more than the number of intermediate 
parents.  In the example above with limit 2, we'd get docB and docC with child 
docC.1.  WDYT [~moshebla]?
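
A minimal sketch of that proposal, assuming each child's parent is known from the path metadata (modelled here as a hypothetical `parent_of` map). It only illustrates the counting idea and is not the transformer's actual code.

```python
def collect_with_limit(children_in_doc_id_order, parent_of, root, limit):
    """Collect up to `limit` docs from the lowest docID, then only ancestors."""
    collected = []
    wanted_ancestors = set()  # intermediate parents of docs already collected
    count = 0
    for doc in children_in_doc_id_order:
        if count < limit:
            collected.append(doc)
            count += 1
            # Remember every intermediate parent between this doc and the root.
            ancestor = parent_of[doc]
            while ancestor != root:
                wanted_ancestors.add(ancestor)
                ancestor = parent_of[ancestor]
        elif doc in wanted_ancestors:
            # Limit reached: only keep ancestors of what we already have.
            collected.append(doc)
    return collected


order = ["docB", "docC.1", "docC.2", "docC", "docD"]
parent_of = {"docB": "docA", "docC.1": "docC", "docC.2": "docC",
             "docC": "docA", "docD": "docA"}

print(collect_with_limit(order, parent_of, "docA", 2))
# ['docB', 'docC.1', 'docC']
```

With limit 2 this gives docB plus docC holding docC.1: three docs, i.e. the limit plus one intermediate parent.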

> Support Deeply Nested Docs In Child Documents Transformer
> ---------------------------------------------------------
>
>                 Key: SOLR-12519
>                 URL: https://issues.apache.org/jira/browse/SOLR-12519
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>         Attachments: SOLR-12519-fix-solrj-tests.patch, 
> SOLR-12519-no-commit.patch, SOLR-12519.patch
>
>          Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> bring back only some of the original hierarchy, to prevent unnecessary block 
> join queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" that contain the key "e", 
> the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag is not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
