[jira] [Comment Edited] (SOLR-12298) Index Full nested document Hierarchy For Queries (umbrella issue)

mosh (JIRA) Tue, 01 May 2018 07:22:42 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459631#comment-16459631
 ]


mosh edited comment on SOLR-12298 at 5/1/18 2:21 PM:
-----------------------------------------------------

Approach: I see [~janhoy]'s 
[proposal|http://lucene.472066.n3.nabble.com/nesting-Any-way-to-return-the-whole-hierarchical-structure-when-doing-Block-Join-queries-td4265933.html#a4380320]
 and [this|https://www.youtube.com/watch?v=qV0fIg-LGBE] talk from Solr 
Revolution 2016, "Working with Deeply Nested Documents in Apache Solr", as the 
starting points for this issue. The proposal addresses most of the problems.

Firstly, the way a nested document is indexed has to be changed.
 I propose we add the following fields:
 # __parent__
 # __level__
 # __path__

__parent__: This field will store the document's parent docId, to be used 
for building the whole hierarchy with a new document transformer, as 
suggested by Jan on the mailing list.

__level__: This field will store, as an int value, the level at which the 
specified field sits in the document. It can be used for the parentFilter, so 
users no longer need to provide one explicitly; the parentFilter will default 
to "__level__:queriedFieldLevel".

__path__: This field will contain the full path, separated by a specific 
reserved char such as '.', for example: "first.second.third".
 This will enable users to search for a specific path, or provide a regular 
expression to search for fields sharing the same name in different levels of 
the document, filtering by the __level__ field if needed.
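To make the three fields concrete, here is a minimal Python sketch of what the 
flattened documents and a path search might look like. The sample ids, the 
level numbering (0 for the top-level document) and the helper name match_path 
are illustrative assumptions, not part of the proposal or of Solr itself.

```python
import re

# Hypothetical flattened form of one nested document. The values are
# assumptions for illustration: level 0 is the top-level document and
# __path__ joins the nested key names with the reserved '.' separator.
docs = [
    {"id": "1", "__level__": 0, "__path__": ""},
    {"id": "2", "__parent__": "1", "__level__": 1, "__path__": "first"},
    {"id": "3", "__parent__": "2", "__level__": 2, "__path__": "first.second"},
    {"id": "4", "__parent__": "3", "__level__": 3, "__path__": "first.second.third"},
    {"id": "5", "__parent__": "1", "__level__": 1, "__path__": "second"},
]


def match_path(docs, pattern, level=None):
    """Return ids of docs whose __path__ fully matches the regex,
    optionally restricted to a single __level__."""
    regex = re.compile(pattern)
    return [
        d["id"]
        for d in docs
        if regex.fullmatch(d["__path__"])
        and (level is None or d["__level__"] == level)
    ]


# Paths ending in a field named "second", at any depth and then at level 2 only.
any_level = match_path(docs, r"(.*\.)?second")
level_two = match_path(docs, r"(.*\.)?second", level=2)
```

In a real index the regular expression and the level restriction would be 
expressed as queries against the indexed fields rather than filtered in client 
code; the sketch only shows the intended semantics.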

To make this happen at index time, changes have to be made to the JSON loader, 
which will add the above fields, as well as the __root__ field, which holds the 
document's topmost-level docId. This will only happen when a specific 
parameter is added to the update request, e.g. "nested=true".
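The index-time step could be sketched roughly as follows. This is not the 
Solr JSON loader; it is a standalone Python illustration under several 
assumptions: the function name flatten, the generated child ids, level 0 for 
the top-level document, and treating any dict (or list of dicts) value as a 
child document are all hypothetical choices made for the example.

```python
import itertools


def flatten(doc, nested=False):
    """Flatten a nested dict into a list of flat docs, children before
    their parent. With nested=False the document is passed through as-is."""
    if not nested:
        return [doc]
    counter = itertools.count(1)
    root_id = doc.get("id", "root")  # assumption: the top-level doc has an id
    out = []

    def visit(node, doc_id, parent_id, level, path):
        flat = {"id": doc_id, "__root__": root_id,
                "__level__": level, "__path__": path}
        if parent_id is not None:
            flat["__parent__"] = parent_id
        for key, value in node.items():
            children = value if isinstance(value, list) else [value]
            if children and all(isinstance(c, dict) for c in children):
                for child in children:
                    # Generated child ids are an assumption of this sketch.
                    child_id = f"{root_id}.{next(counter)}"
                    child_path = f"{path}.{key}" if path else key
                    visit(child, child_id, doc_id, level + 1, child_path)
            elif key != "id":
                flat[key] = value  # ordinary field, copied unchanged
        out.append(flat)

    visit(doc, root_id, None, 0, "")
    return out


sample = {"id": "1", "title": "t",
          "first": {"name": "a", "second": {"name": "b"}}}
flat_docs = flatten(sample, nested=True)
```

Without the flag the document is left untouched, mirroring the idea that the 
extra fields are only added when the request asks for them.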

The new child doc transformer will be able to either reassemble the whole 
document structure, or do so from a specific level, if specified.
 Full hierarchy reconstruction can be done relatively cheaply, using the 
__root__ field to get to the highest level document, and querying the block for 
its children, ordering the query by the __level__ field.
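The reconstruction described above might look something like this in outline. 
It is a client-side Python sketch, not the proposed transformer: the function 
name rebuild, the sample block, and the choice to always represent children 
as lists keyed by the last segment of __path__ are assumptions made for the 
illustration.

```python
def rebuild(block, root_id, from_id=None):
    """Reassemble the nested structure of the block whose top-level
    document is root_id; from_id optionally selects a sub-tree."""
    members = [d for d in block if d["__root__"] == root_id]
    members.sort(key=lambda d: d["__level__"])  # parents before children
    nodes = {}
    for d in members:
        # Drop the bookkeeping fields from the rebuilt document.
        node = {k: v for k, v in d.items() if not k.startswith("__")}
        nodes[d["id"]] = node
        parent = nodes.get(d.get("__parent__"))
        if parent is not None:
            key = d["__path__"].rsplit(".", 1)[-1]  # last path segment
            parent.setdefault(key, []).append(node)
    return nodes[from_id or root_id]


# Hypothetical flat block, in arbitrary order.
block = [
    {"id": "1.2", "__root__": "1", "__parent__": "1.1",
     "__level__": 2, "__path__": "first.second", "name": "b"},
    {"id": "1.1", "__root__": "1", "__parent__": "1",
     "__level__": 1, "__path__": "first", "name": "a"},
    {"id": "1", "__root__": "1", "__level__": 0, "__path__": "", "title": "t"},
]
tree = rebuild(block, "1")
subtree = rebuild(block, "1", from_id="1.1")
```

Sorting by __level__ guarantees each parent node exists before its children 
are attached, so a single pass over the block is enough.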


> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
>                 Key: SOLR-12298
>                 URL: https://issues.apache.org/jira/browse/SOLR-12298
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing 
> the original document hierarchy.
> Currently the client has to index the child document's full path and level to 
> manually reconstruct the original document structure, since the children are 
> flattened and returned in the reserved "_childDocuments_" key.
> Ideally you could index a nested document, having Solr transparently add the 
> required fields while providing a document transformer to rebuild the 
> original document's hierarchy.
>  
> This issue is an umbrella issue for the particular tasks that will make it 
> all happen – either subtasks or issue linking.



