[
https://issues.apache.org/jira/browse/SOLR-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940522#comment-13940522
]
J.B. Langston edited comment on SOLR-5878 at 3/19/14 2:47 PM:
--------------------------------------------------------------
Sorry for not following protocol. Do you want me to move to the list now or
continue here since it's already open?
I may have misstated the problem here. The duplicates aren't the problem;
rather, Solr ignores the rows parameter when sharding and group.format=simple
are used together. You'll notice that the URL has a rows=5 param, but the
output contains 16 documents. This prevents the use of the rows and start
params to page through the data.
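To make the mismatch easy to check, here is a rough Python sketch that compares the rows value in a request URL against the number of documents in a wt=xml response. The function names are made up for illustration, and the XPath assumes the response has the same shape as the output quoted in the description (a result element named "doclist"). It falls back to rows=10, Solr's default, when the URL has no rows param.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs


def requested_rows(url):
    """Return the rows parameter from a Solr select URL (falls back to 10)."""
    params = parse_qs(urlparse(url).query)
    return int(params.get("rows", ["10"])[0])


def returned_docs(response_xml):
    """Count <doc> elements under the grouped doclist of a wt=xml response."""
    root = ET.fromstring(response_xml)
    return len(root.findall(".//result[@name='doclist']/doc"))


def rows_honored(url, response_xml):
    """True if the response holds no more documents than rows asked for."""
    return returned_docs(response_xml) <= requested_rows(url)
```

Fed the URL and response from the description, rows_honored should come back False, since 16 documents are returned for rows=5.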
You're right about the cont_stub field not being the unique key. id is the
unique key and indeed there are multiple documents with the same value for
cont_stub and different values for the unique key. I was filing this on behalf
of a customer and as I was reproducing it, I noticed the duplicates and got
distracted by those. Sorry for the confusion; I can update the description to
reflect the true problem if you like, or I can ask on the mailing list before
continuing here.
> Solr returns duplicates when using distributed search with group.format=simple
> ------------------------------------------------------------------------------
>
> Key: SOLR-5878
> URL: https://issues.apache.org/jira/browse/SOLR-5878
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.6
> Reporter: J.B. Langston
>
> Solr returns duplicate documents when group.format=simple is supplied on a
> distributed search. This does not happen on the standard group format or when
> not using distributed search.
> For example:
> {code}
> http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=*%3A*&fq=evt_stub%3A(452deed8-c3a2-49a8-878d-8356da315e6a)&start=0&rows=5&fl=cont_stub&wt=xml&indent=true&group=true&group.field=cont_stub&group.format=simple&group.limit=1000
> {code}
> Returns:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">253</int>
> </lst>
> <lst name="grouped">
> <lst name="cont_stub">
> <int name="matches">56</int>
> <result name="doclist" numFound="56" start="0" maxScore="1.0">
> <doc>
> <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
> <doc>
> <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
> <doc>
> <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
> <doc>
> <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
> <doc>
> <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
> <doc>
> <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
> <doc>
> <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
> <doc>
> <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
> <doc>
> <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
> <doc>
> <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
> <doc>
> <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
> <doc>
> <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
> <doc>
> <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
> <doc>
> <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
> <doc>
> <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
> <doc>
> <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
> </result>
> </lst>
> </lst>
> </response>
> {code}
> It should only return 5 documents. Removing the distributed search and
> searching on either core will return the requested number of rows. Removing
> group.format=simple will also return the requested number of rows.
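
For anyone who wants to compare the three cases side by side, the request variants can be generated with a small Python sketch like the one below. The build_url helper and its flags are hypothetical; the host, core names, and parameters are copied from the example URL above. Note that urlencode percent-encodes the query values slightly differently from the original URL (for example, `*` becomes `%2A`), which Solr should decode to the same values.

```python
from urllib.parse import urlencode

# Values copied from the example request in the issue description.
BASE = "http://localhost:8983/solr/core0/select"
SHARDS = "localhost:8983/solr/core0,localhost:8983/solr/core1"


def build_url(distributed=True, simple_format=True, rows=5):
    """Build the example query, optionally dropping shards or group.format."""
    params = [
        ("q", "*:*"),
        ("fq", "evt_stub:(452deed8-c3a2-49a8-878d-8356da315e6a)"),
        ("start", "0"),
        ("rows", str(rows)),
        ("fl", "cont_stub"),
        ("wt", "xml"),
        ("indent", "true"),
        ("group", "true"),
        ("group.field", "cont_stub"),
        ("group.limit", "1000"),
    ]
    if distributed:
        params.append(("shards", SHARDS))
    if simple_format:
        params.append(("group.format", "simple"))
    return BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url())                     # reported case: rows is ignored
    print(build_url(distributed=False))    # single core
    print(build_url(simple_format=False))  # standard group format
```

Per the description, only the first URL should return more documents than rows asks for.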