Bogdan created SOLR-6700:
----------------------------

             Summary: ChildDocTransformer doesn't return correct children after 
updating and optimising Solr index
                 Key: SOLR-6700
                 URL: https://issues.apache.org/jira/browse/SOLR-6700
             Project: Solr
          Issue Type: Bug
            Reporter: Bogdan
            Priority: Blocker
             Fix For: 4.10.3, 5.0


I have an index with nested documents. 
{code:title=schema.xml snippet|borderStyle=solid}
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="entityType" type="int" indexed="true" stored="true" required="true"/>
<field name="pName" type="string" indexed="true" stored="true"/>
<field name="cAlbum" type="string" indexed="true" stored="true"/>
<field name="cSong" type="string" indexed="true" stored="true"/>
<field name="_root_" type="string" indexed="true" stored="true"/>
<field name="_version_" type="long" indexed="true" stored="true"/>
{code}

Afterwards I add the following documents:
{code}
<add>
  <doc>
    <field name="id">1</field>
    <field name="pName">Test Artist 1</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">11</field>
        <field name="cAlbum">Test Album 1</field>
        <field name="cSong">Test Song 1</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="pName">Test Artist 2</field>
    <field name="entityType">1</field>
    <doc>
        <field name="id">22</field>
        <field name="cAlbum">Test Album 2</field>
        <field name="cSong">Test Song 2</field>
        <field name="entityType">2</field>
    </doc>
  </doc>
</add>
{code}

After performing the following query 
{quote}
http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true
{quote}
I get the correct answer (each child is returned under its own parent; compare the _root_ fields):
{code:title=query response|borderStyle=solid}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"1",
        "pName":"Test Artist 1",
        "entityType":1,
        "_version_":1483832661048819712,
        "_root_":"1",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"}]},
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]}]
  }}
{code}
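For readability, the percent-encoded request URL above can be decoded with Python's standard library. This is only a convenience sketch: it shows that the request sends the block-join parent query {!parent which=entityType:1} as q, and asks the [child] transformer, using the same parent filter, for the nested documents through fl. The echoed params in the response header carry the same values.

```python
from urllib.parse import urlsplit, parse_qs

# The request URL from the report, split over two string literals only for line length.
url = ("http://localhost:8983/solr/collection1/select?q=%7B!parent+which%3DentityType%3A1%7D"
       "&fl=*%2Cscore%2C%5Bchild+parentFilter%3DentityType%3A1%5D&wt=json&indent=true")

# parse_qs undoes the percent-encoding and turns '+' back into spaces.
params = parse_qs(urlsplit(url).query)
for name, values in params.items():
    print(f"{name} = {values[0]}")
# q = {!parent which=entityType:1}
# fl = *,score,[child parentFilter=entityType:1]
# wt = json
# indent = true
```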

Next, I update one document with an atomic update:
{code:title=update doc|borderStyle=solid}
<add>
<doc>
<field name="id">1</field>
<field name="pName" update="set">INIT</field>
</doc>
</add>
{code}

Running the same query again returns the correct result (identical to the previous 
one, except that the pName field is updated).

The problem appears only after an optimize. 
Afterwards, the same query returns the following result:
{code}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"*,score,[child parentFilter=entityType:1]",
      "indent":"true",
      "q":"{!parent which=entityType:1}",
      "wt":"json"}},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"2",
        "pName":"Test Artist 2",
        "entityType":1,
        "_version_":1483832661050916864,
        "_root_":"2",
        "score":1.0,
        "_childDocuments_":[
        {
          "id":"11",
          "cAlbum":"Test Album 1",
          "cSong":"Test Song 1",
          "entityType":2,
          "_root_":"1"},
        {
          "id":"22",
          "cAlbum":"Test Album 2",
          "cSong":"Test Song 2",
          "entityType":2,
          "_root_":"2"}]},
      {
        "id":"1",
        "pName":"INIT",
        "entityType":1,
        "_root_":"1",
        "_version_":1483832916867809280,
        "score":1.0}]
  }}
{code}

As can be seen, the document with id:2 now contains the child with id:11, which 
belongs to the document with id:1 (its _root_ is still 1), while the document 
with id:1 no longer has any children.
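The following Python sketch is a simplified model of why this can happen; it is not Solr's or Lucene's code. It assumes that a parent's children are the documents stored between it and the previous parent in docid order, that the atomic update deletes the old parent and appends a new parent without its children, and that a deleted parent keeps acting as a block boundary until a merge purges it (consistent with the blog post quoted below). It also flattens all segments into one docid sequence. Under these assumptions, before the merge parent 2 still owns only child 22; once the simulated optimize drops the deleted parent, child 11 falls into parent 2's block, matching the response above.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    id: str
    is_parent: bool        # matches the parent filter entityType:1
    deleted: bool = False  # marked deleted but not yet purged by a merge


def children_by_parent(docs):
    """Map each live parent id to the live docs between it and the previous parent slot."""
    result = {}
    prev = -1
    for pos, doc in enumerate(docs):
        if doc.is_parent:
            if not doc.deleted:
                result[doc.id] = [d.id for d in docs[prev + 1:pos] if not d.deleted]
            prev = pos  # assumption: a deleted parent still closes its block
    return result


def optimize(docs):
    """Model a merge: physically drop deleted docs, keeping docid order."""
    return [d for d in docs if not d.deleted]


# Blocks as indexed: each child precedes its parent.
index = [Doc("11", False), Doc("1", True), Doc("22", False), Doc("2", True)]

# Modelled atomic update of id:1: the old parent is marked deleted and a new
# parent without children is appended; child 11 stays where it was.
index[1].deleted = True
index.append(Doc("1", True))

print(children_by_parent(index)["2"])        # ['22']
print(children_by_parent(optimize(index)))   # {'2': ['11', '22'], '1': []}
```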

The only reference to this behaviour I have found on the web is 
http://blog.griddynamics.com/2013/09/solr-block-join-support.html, which says:
{quote}
Let me show you one unlucky example. Let’s remove parent and left children in 
the index.
<update><delete><query>id:10</query></delete><commit/></update>  
At first, It seems like everything still works. Children 11 and 12 are left in 
the index, but ToParentBlockJoinQuery somehow detects it and q={!parent 
which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL  correctly returns parent 30. 
However after <optimize/> is executed, deleted parent document is purged from 
the index and all of the sudden children 11 and 12 start to be considered as if 
they belong to parent 20! The same query q={!parent 
which='type_s:parent'}+COLOR_s:Red +SIZE_s:XL now returns 20 and 30 which is 
wrong! I’m afraid there are few other similar cases of wrong behavior. As a 
reliable workaround I suggest to send explicit deletes by query with implicit 
field _root_. I hope this caveat will be fixed in future.
{quote}
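Following the blog's suggestion, one possible workaround (an assumption, not a fix verified in this issue) is to avoid atomic updates on block documents: delete the whole block by query on the implicit _root_ field and re-add the parent together with all of its children. The Python sketch below only builds such an XML update message with the standard library; the helper names add_field and reindex_block are hypothetical, and it assumes Solr processes the delete, add and commit commands wrapped in a single update element in order, as in the blog's own <update> example.

```python
import xml.etree.ElementTree as ET


def add_field(parent, name, value):
    """Append <field name="...">value</field> to a <doc> element."""
    ET.SubElement(parent, "field", name=name).text = value


def reindex_block(root_id, parent_fields, children):
    """Build one update message that deletes a block by _root_ and re-adds it whole.

    Assumption: the commands inside <update> are applied in document order.
    """
    update = ET.Element("update")
    delete = ET.SubElement(update, "delete")
    ET.SubElement(delete, "query").text = f"_root_:{root_id}"

    add = ET.SubElement(update, "add")
    doc = ET.SubElement(add, "doc")
    for name, value in parent_fields.items():
        add_field(doc, name, value)
    for child in children:  # nested <doc> elements, as in the original <add>
        child_doc = ET.SubElement(doc, "doc")
        for name, value in child.items():
            add_field(child_doc, name, value)

    ET.SubElement(update, "commit")
    return ET.tostring(update, encoding="unicode")


msg = reindex_block(
    "1",
    {"id": "1", "pName": "INIT", "entityType": "1"},
    [{"id": "11", "cAlbum": "Test Album 1", "cSong": "Test Song 1", "entityType": "2"}],
)
print(msg)
```

Re-sending the full block keeps the parent and its children contiguous, which the positional model of block joins relies on.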



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)