[jira] [Commented] (SOLR-12519) Support Deeply Nested Docs In Child Documents Transformer

David Smiley (JIRA) Fri, 13 Jul 2018 20:21:19 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-12519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16543976#comment-16543976
 ]


David Smiley commented on SOLR-12519:
-------------------------------------

Oh I understand now.  My suggestion to use PathHierarchyTokenizerFactory was 
centered around use-cases of querying for child docs purely by this path (e.g. 
all paths that look like this, etc.).  If the query is to find all child docs 
that match some arbitrary query (which is what "childFilter" is), and 
furthermore _their_ ancestors, then PathHierarchyTokenizerFactory may not be so 
useful for that.  Sorry for the wild goose chase, though I suspect we'll 
revisit the use of PathHierarchyTokenizerFactory in the near future.

I think we can do this with DocValues to store the nest path, and with 
modifications to ChildDocTransformer's loop over matching child documents.  
Recognize first how Lucene/Solr actually sequence the arrangement of nested 
child documents.  Any given child document always comes _before_ its parent 
(and thus recursively so).  Therefore, what can be done is to look at all 
documents _after_ a matching child document to see which of those is an 
ancestor of that matching child document.  Detecting whether doc X + N is an 
ancestor of child doc X is a matter of checking whether the path at X + N is a 
prefix of the path at X.  You stop looping forward once you reach the root 
document, which is tracked in the parentsFilter bits.  If that's not enough 
information for you to implement this, I can post a patch modification to 
ChildDocTransformer that will do this, and maybe you could take it further from 
there (e.g. restructure the ancestors into a nice hierarchy).
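
To make the forward scan concrete, here is a minimal, standalone sketch in 
plain Java.  It is not the ChildDocTransformer code and uses no Lucene/Solr 
APIs.  Everything in it is an assumption for illustration: the class and method 
names are hypothetical, nest paths are modeled as "/"-separated strings held in 
an array indexed by doc id, the root document has an empty path, and a 
java.util.BitSet stands in for the parentsFilter bits.  One detail goes beyond 
the description above: after each ancestor is found, the comparison path is 
narrowed to that ancestor's path, so that a later sibling of the ancestor with 
the same path is not mistaken for another ancestor.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class AncestorScan {

    // True if 'candidate' is a strict ancestor path of 'path'.
    // The comparison is made on '/' boundaries so "/a" does not match "/ab/c".
    static boolean isAncestorPath(String candidate, String path) {
        return path.length() > candidate.length()
            && path.startsWith(candidate)
            && path.charAt(candidate.length()) == '/';
    }

    // Scans forward from a matching child doc and collects the doc ids of its
    // ancestors, ending with the root doc of the block.
    // nestPaths[i] is the (assumed) nest path of doc i; rootDocs marks root docs.
    static List<Integer> findAncestors(String[] nestPaths, BitSet rootDocs, int childDoc) {
        List<Integer> ancestors = new ArrayList<>();
        if (rootDocs.get(childDoc)) {
            return ancestors; // a root doc has no ancestors
        }
        String path = nestPaths[childDoc];
        for (int doc = childDoc + 1; doc < nestPaths.length; doc++) {
            if (rootDocs.get(doc)) {
                ancestors.add(doc); // reached the root: stop looping forward
                break;
            }
            if (isAncestorPath(nestPaths[doc], path)) {
                ancestors.add(doc);
                path = nestPaths[doc]; // later ancestors must be prefixes of this one
            }
        }
        return ancestors;
    }

    public static void main(String[] args) {
        // Block order: children always come before their parents.
        // doc 0: /a/b (child of doc 1), doc 1: /a, doc 2: /a (sibling of doc 1), doc 3: root
        String[] nestPaths = {"/a/b", "/a", "/a", ""};
        BitSet rootDocs = new BitSet();
        rootDocs.set(3);
        System.out.println(findAncestors(nestPaths, rootDocs, 0)); // prints [1, 3]
    }
}
```

In a real implementation the paths would presumably be read per document from 
DocValues rather than from an array, and the collected ancestors would still 
need to be restructured into a hierarchy.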

> Support Deeply Nested Docs In Child Documents Transformer
> ---------------------------------------------------------
>
>                 Key: SOLR-12519
>                 URL: https://issues.apache.org/jira/browse/SOLR-12519
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>         Attachments: SOLR-12519-no-commit.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As discussed in SOLR-12298, to make use of the meta-data fields in 
> SOLR-12441, there needs to be a smarter child document transformer, which 
> provides the ability to rebuild the original nested documents' structure.
>  In addition, I propose that the transformer also have the ability to 
> return only part of the original hierarchy, to prevent unnecessary block join 
> queries. e.g.
> {code}  {"a": "b", "c": [ {"e": "f"}, {"e": "g"} , {"h": "i"} ]} {code}
>  In case my query is for all the children of "a:b" which contain the key "e", 
> the query will be broken into two parts:
>  1. The parent query "a:b"
>  2. The child query "e:*".
> If the only children flag is on, the transformer will return the following 
> documents:
>  {code}[ {"e": "f"}, {"e": "g"} ]{code}
> In case the flag is not turned on (perhaps the default state), the whole 
> document hierarchy will be returned, containing only the matching children:
> {code}{"a": "b", "c": [ {"e": "f"}, {"e": "g"} ]}{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
