[ https://issues.apache.org/jira/browse/SOLR-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202006#comment-14202006 ]
Bogdan Marinescu commented on SOLR-6700:
----------------------------------------
I have some follow-up questions:
1. Can you tell me what exactly the optimiser is doing? Everything works
well until I call optimise. Afterwards, the ChildDocTransformer returns
the wrong children, including other "parents".
I've seen that it reduces the segment count (to 1 by default) and that the
"Deleted Docs" count goes back to 0. Is it really worth it? Can skipping the
optimise step lead to performance problems?
2. Is there any way other than ChildDocTransformer to get the child
documents for a parent (an expand query, perhaps)? How exactly does the [child]
transformer work? Apparently it's not using the _root_ field, because if it
were, it would group parents and children correctly.
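To make the second question concrete: one alternative that relies only on the _root_ field from the schema in the report would be a plain query for everything in a block, filtered down to the child type. Below is a minimal sketch, assuming the collection1 core and the entityType:2 children from the report. The helper name children_query_url is hypothetical, and whether this approach avoids the problem after an optimise has not been verified here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: base URL and core name are copied from the query in the report.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def children_query_url(parent_id, child_filter="entityType:2"):
    """Build a select URL that fetches the children of one block via _root_.

    Hypothetical helper: it does not use the [child] transformer at all. It
    matches documents whose _root_ equals the parent id and keeps only the
    documents that match child_filter. Ids containing double quotes would
    need escaping, which this sketch does not handle.
    """
    params = {
        "q": '_root_:"%s"' % parent_id,
        "fq": child_filter,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# Decode the generated URL again to show the parameters Solr would receive.
url = children_query_url("1")
print(parse_qs(urlsplit(url).query))
```

The filter query is what keeps the parent out of the result, since the parent with id 1 also carries _root_ 1 in the responses shown in the report.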
Thanks
> ChildDocTransformer doesn't return correct children after updating and
> optimising Solr index
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-6700
> URL: https://issues.apache.org/jira/browse/SOLR-6700
> Project: Solr
> Issue Type: Bug
> Reporter: Bogdan Marinescu
> Priority: Blocker
> Fix For: 4.10.3, 5.0
>
>
> I have an index with nested documents.
> {code:title=schema.xml snippet|borderStyle=solid}
> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
> <field name="entityType" type="int" indexed="true" stored="true" required="true"/>
> <field name="pName" type="string" indexed="true" stored="true"/>
> <field name="cAlbum" type="string" indexed="true" stored="true"/>
> <field name="cSong" type="string" indexed="true" stored="true"/>
> <field name="_root_" type="string" indexed="true" stored="true"/>
> <field name="_version_" type="long" indexed="true" stored="true"/>
> {code}
> Afterwards I add the following documents:
> {code}
> <add>
> <doc>
> <field name="id">1</field>
> <field name="pName">Test Artist 1</field>
> <field name="entityType">1</field>
> <doc>
> <field name="id">11</field>
> <field name="cAlbum">Test Album 1</field>
> <field name="cSong">Test Song 1</field>
> <field name="entityType">2</field>
> </doc>
> </doc>
> <doc>
> <field name="id">2</field>
> <field name="pName">Test Artist 2</field>
> <field name="entityType">1</field>
> <doc>
> <field name="id">22</field>
> <field name="cAlbum">Test Album 2</field>
> <field name="cSong">Test Song 2</field>
> <field name="entityType">2</field>
> </doc>
> </doc>
> </add>
> {code}
> After performing the following query
> {quote}
> http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true
> {quote}
> I get a correct answer (child matches parent, check _root_ field)
> {code:title=query result|borderStyle=solid}
> {
> "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
> "fl":"*,score,[child parentFilter=entityType:1]",
> "indent":"true",
> "q":"{!parent which=entityType:1}",
> "wt":"json"}},
> "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
> {
> "id":"1",
> "pName":"Test Artist 1",
> "entityType":1,
> "_version_":1483832661048819712,
> "_root_":"1",
> "score":1.0,
> "_childDocuments_":[
> {
> "id":"11",
> "cAlbum":"Test Album 1",
> "cSong":"Test Song 1",
> "entityType":2,
> "_root_":"1"}]},
> {
> "id":"2",
> "pName":"Test Artist 2",
> "entityType":1,
> "_version_":1483832661050916864,
> "_root_":"2",
> "score":1.0,
> "_childDocuments_":[
> {
> "id":"22",
> "cAlbum":"Test Album 2",
> "cSong":"Test Song 2",
> "entityType":2,
> "_root_":"2"}]}]
> }}
> {code}
> Afterwards I try to update one document:
> {code:title=update doc|borderStyle=solid}
> <add>
> <doc>
> <field name="id">1</field>
> <field name="pName" update="set">INIT</field>
> </doc>
> </add>
> {code}
> After performing the previous query I get the right result (like the previous
> one but with the pName field updated).
> The problem only comes after performing an *optimize*.
> Now, the same query yields the following result:
> {code}
> {
> "responseHeader":{
> "status":0,
> "QTime":1,
> "params":{
> "fl":"*,score,[child parentFilter=entityType:1]",
> "indent":"true",
> "q":"{!parent which=entityType:1}",
> "wt":"json"}},
> "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
> {
> "id":"2",
> "pName":"Test Artist 2",
> "entityType":1,
> "_version_":1483832661050916864,
> "_root_":"2",
> "score":1.0,
> "_childDocuments_":[
> {
> "id":"11",
> "cAlbum":"Test Album 1",
> "cSong":"Test Song 1",
> "entityType":2,
> "_root_":"1"},
> {
> "id":"22",
> "cAlbum":"Test Album 2",
> "cSong":"Test Song 2",
> "entityType":2,
> "_root_":"2"}]},
> {
> "id":"1",
> "pName":"INIT",
> "entityType":1,
> "_root_":"1",
> "_version_":1483832916867809280,
> "score":1.0}]
> }}
> {code}
> As can be seen, the document with id:2 now contains the child with id:11 that
> belongs to the document with id:1.
> I haven't found any references on the web about this except
> http://blog.griddynamics.com/2013/09/solr-block-join-support.html
> Similar issue: SOLR-6096
> Is this problem known? Is there a workaround for this?
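The mismatch described in the quoted report (the child with id 11 and _root_ 1 returned under the parent with id 2) can be detected mechanically in a response. Below is a minimal sketch, assuming that parents are top-level documents, so a correctly grouped child carries a _root_ equal to its parent's id, as in the report. The function name mismatched_children is hypothetical, and the sample data is a trimmed copy of the post-optimize response.

```python
def mismatched_children(docs):
    """Return (parent_id, child_id, child_root) for every child whose
    _root_ differs from the id of the parent it was returned under."""
    bad = []
    for parent in docs:
        # Parents without a _childDocuments_ key simply have no children.
        for child in parent.get("_childDocuments_", []):
            if child.get("_root_") != parent["id"]:
                bad.append((parent["id"], child["id"], child.get("_root_")))
    return bad


# Trimmed copy of the "docs" array from the post-optimize response.
docs_after_optimize = [
    {"id": "2", "_root_": "2", "_childDocuments_": [
        {"id": "11", "_root_": "1"},
        {"id": "22", "_root_": "2"},
    ]},
    {"id": "1", "_root_": "1"},
]

print(mismatched_children(docs_after_optimize))  # [('2', '11', '1')]
```

A check like this only flags children attached to the wrong parent. It does not notice that the parent with id 1 has lost its children entirely, so an empty result alone does not prove the response is correct.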