[ 
https://issues.apache.org/jira/browse/SOLR-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940522#comment-13940522
 ] 

J.B. Langston edited comment on SOLR-5878 at 3/19/14 2:47 PM:
--------------------------------------------------------------

Sorry for not following protocol. Do you want me to move to the list now or 
continue here since it's already open?

I may have misstated the problem here. The duplicates aren't the problem; 
rather, the problem is that Solr ignores the rows parameter when sharding and 
group.format=simple are used together.  You'll notice that there is a rows=5 
param in the URL, but the output contains 16 documents.  This prevents the use 
of the rows and start params to page through the data.
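
For anyone who wants to check the symptom against their own setup, here is a 
rough Python sketch that compares the rows value in a request URL with the 
number of docs in a group.format=simple response. The function names are made 
up for illustration, and it assumes wt=xml output shaped like the response 
quoted below; it only parses text you hand it and does not talk to Solr.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs


def requested_rows(url: str) -> int:
    """Read the rows parameter from a select URL (Solr defaults to 10 if absent)."""
    params = parse_qs(urlparse(url).query)
    return int(params.get("rows", ["10"])[0])


def returned_docs(response_xml: str, group_field: str) -> int:
    """Count <doc> elements in the flat doclist of a group.format=simple response."""
    root = ET.fromstring(response_xml)
    doclist = root.find(
        f"./lst[@name='grouped']/lst[@name='{group_field}']/result[@name='doclist']"
    )
    if doclist is None:
        raise ValueError(f"no simple-format doclist found for field {group_field!r}")
    return len(doclist.findall("doc"))


def rows_respected(url: str, response_xml: str, group_field: str) -> bool:
    """True when the response holds no more docs than rows asked for."""
    return returned_docs(response_xml, group_field) <= requested_rows(url)
```

With the request and response from the description, requested_rows gives 5 
and returned_docs gives 16, so rows_respected is False.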

You're right about the cont_stub field not being the unique key. id is the 
unique key and indeed there are multiple documents with the same value for 
cont_stub and different values for the unique key.  I was filing this on behalf 
of a customer and as I was reproducing it, I noticed the duplicates and got 
distracted by those. Sorry for the confusion; I can update the description to 
reflect the true problem if you like, or I can ask on the mailing list before 
continuing here.


> Solr returns duplicates when using distributed search with group.format=simple
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-5878
>                 URL: https://issues.apache.org/jira/browse/SOLR-5878
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.6
>            Reporter: J.B. Langston
>
> Solr returns duplicate documents when group.format=simple is supplied on a 
> distributed search. This does not happen on the standard group format or when 
> not using distributed search. 
> For example:
> {code}
> http://localhost:8983/solr/core0/select?shards=localhost:8983/solr/core0,localhost:8983/solr/core1&q=*%3A*&fq=evt_stub%3A(452deed8-c3a2-49a8-878d-8356da315e6a)&start=0&rows=5&fl=cont_stub&wt=xml&indent=true&group=true&group.field=cont_stub&group.format=simple&group.limit=1000
> {code}
> Returns:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
>   <int name="status">0</int>
>   <int name="QTime">253</int>
> </lst>
> <lst name="grouped">
>   <lst name="cont_stub">
>     <int name="matches">56</int>
>     <result name="doclist" numFound="56" start="0" maxScore="1.0">
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">e60eb0f9-bce7-4da9-819c-b356dfc1c4f7</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">faf0a7ea-4252-4eda-990a-4fcc6b5e63e3</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">dd94ec0b-f171-441d-8fb8-af6a22ebf168</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">feede138-2fe4-4742-ac63-e7cecfd86c81</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>       <doc>
>         <str name="cont_stub">86944a90-033d-4676-9ac3-b59744fc52a5</str></doc>
>     </result>
>   </lst>
> </lst>
> </response>
> {code}
> It should only return 5 documents.  Removing the distributed search and 
> searching on either core will return the requested number of rows. Removing 
> group.format=simple will also return the requested number of rows.


