[
https://issues.apache.org/jira/browse/SOLR-12298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459631#comment-16459631
]
mosh edited comment on SOLR-12298 at 5/3/18 8:02 AM:
-----------------------------------------------------
Approach: I see [~janhoy]'s
[proposal|http://lucene.472066.n3.nabble.com/nesting-Any-way-to-return-the-whole-hierarchical-structure-when-doing-Block-Join-queries-td4265933.html#a4380320]
and [this|https://www.youtube.com/watch?v=qV0fIg-LGBE] talk from Solr
Revolution 2016, "Working with Deeply Nested Documents in Apache Solr", as the
starting points for this issue, since the proposal addresses most of the
problems.
Firstly, the way a nested document is indexed has to be changed.
I propose we add the following fields:
# __parent__
# __level__
# __path__
__parent__: This field will store the docId of the document's parent, to be
used for building the whole hierarchy with a new document transformer, as
suggested by Jan on the mailing list.
__level__: This field will store the level of the specified field in the
document, as an int value. This field can serve as the parentFilter, so users
no longer need to provide one explicitly; it will be set by default to
"__level__:queriedFieldLevel".
__path__: This field will contain the full path, with its segments separated
by a specific reserved char, e.g. '.', for example: "first.second.third".
This will enable users to search for a specific path, or to provide a regular
expression that matches fields sharing the same name at different levels of
the document, filtering by the __level__ field if needed.
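As a rough illustration of how __path__ and __level__ could be combined, here
is a small Python sketch over plain dicts. It is not Solr code: the sample
documents and the helper name find_by_path are made up for this example, '.'
is assumed to be the reserved separator, and levels are assumed to start at 1
for first-level children.

```python
import re

# Hypothetical flattened child documents carrying the proposed fields.
DOCS = [
    {"id": "a", "__path__": "first", "__level__": 1},
    {"id": "b", "__path__": "first.second", "__level__": 2},
    {"id": "c", "__path__": "first.second.third", "__level__": 3},
    {"id": "d", "__path__": "other.second", "__level__": 2},
    {"id": "e", "__path__": "other.x.second", "__level__": 3},
]


def find_by_path(docs, pattern, level=None):
    """Return ids of docs whose __path__ fully matches the regex,
    optionally restricted to a single __level__."""
    regex = re.compile(pattern)
    return [
        d["id"]
        for d in docs
        if regex.fullmatch(d["__path__"])
        and (level is None or d["__level__"] == level)
    ]


# One specific path.
exact = find_by_path(DOCS, r"first\.second\.third")
# Every field named "second", at any depth.
same_name = find_by_path(DOCS, r"(.*\.)?second")
# The same search, narrowed to level 2 only.
level_two = find_by_path(DOCS, r"(.*\.)?second", level=2)
```

Here exact selects only "c", same_name selects "b", "d" and "e", and level_two
narrows that to "b" and "d".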
To make this happen at index time, changes have to be made to the JSON loader,
which will add the above fields, as well as the __root__ field, which holds the
docId of the document's topmost level. This will only happen when a specific
parameter is added to the update request, e.g. "nested=true".
The new child doc transformer will be able to either reassemble the whole
document structure, or do so from a specific level, if specified.
Full hierarchy reconstruction can be done relatively cheaply, using the
__root__ field to get to the highest level document, and querying the block
for its children, ordering the query by the __level__ field.
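The reconstruction described above could be sketched roughly as follows, in
Python over plain dicts. This is only an illustration of the idea, not the
transformer: the function name rebuild and the sample block are hypothetical,
the root is assumed to be at level 0 with no __parent__, and each child is
assumed to be attached under the last segment of its __path__.

```python
def rebuild(block, root_id):
    """Reassemble a nested dict from flat docs that share one __root__."""
    # Ordering by __level__ guarantees a parent is seen before its children.
    members = sorted(
        (d for d in block if d["__root__"] == root_id),
        key=lambda d: d["__level__"],
    )
    by_id = {}
    for d in members:
        # Drop the bookkeeping fields from the rebuilt document.
        node = {k: v for k, v in d.items() if not k.startswith("__")}
        by_id[d["id"]] = node
        if "__parent__" in d:
            key = d["__path__"].split(".")[-1]
            by_id[d["__parent__"]][key] = node
    return by_id[root_id]


# Flat docs in arbitrary order, including one from an unrelated block.
BLOCK = [
    {"id": "3", "name": "leaf", "__root__": "1", "__level__": 2,
     "__parent__": "2", "__path__": "first.second"},
    {"id": "1", "title": "root", "__root__": "1", "__level__": 0},
    {"id": "2", "__root__": "1", "__level__": 1,
     "__parent__": "1", "__path__": "first"},
    {"id": "9", "__root__": "9", "__level__": 0},
]
tree = rebuild(BLOCK, "1")
```

A single pass over the level-ordered block is enough here, since every
__parent__ lookup hits a node that has already been created.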
> Index Full nested document Hierarchy For Queries (umbrella issue)
> -----------------------------------------------------------------
>
> Key: SOLR-12298
> URL: https://issues.apache.org/jira/browse/SOLR-12298
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: mosh
> Priority: Major
>
> Solr ought to have the ability to index deeply nested objects, while storing
> the original document hierarchy.
> Currently the client has to index the child document's full path and level to
> manually reconstruct the original document structure, since the children are
> flattened and returned in the reserved "_childDocuments_" key.
> Ideally you could index a nested document, having Solr transparently add the
> required fields while providing a document transformer to rebuild the
> original document's hierarchy.
>
> This issue is an umbrella issue for the particular tasks that will make it
> all happen – either subtasks or issue linking.