[
https://issues.apache.org/jira/browse/SOLR-6314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088833#comment-14088833
]
Vamsee Yarlagadda commented on SOLR-6314:
-----------------------------------------
Thanks [~erickerickson] for looking into this.
Yes, you are right. It makes perfect sense to return one count per unique
facet request rather than repeating the same facet over and over. It may well
be that the facet result returned in the multi-shard case (which goes through
the aggregating code) is the correct behavior. Perhaps we should fix the
behavior of the single-shard system to match and update the unit tests
accordingly.
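To make "one count per unique facet request" concrete for the query in this issue, here is a minimal Python sketch. It is only an illustration of the idea, not Solr code: the helper name `unique_facet_fields` is hypothetical, and it assumes duplicates should be dropped while keeping the order in which fields were first requested.

```python
def unique_facet_fields(fields):
    """Drop repeated facet.field values, keeping first-seen order.

    Hypothetical helper for illustration; this is not how Solr's
    aggregating code is implemented.
    """
    # dict preserves insertion order (Python 3.7+), so fromkeys()
    # gives an order-preserving de-duplication.
    return list(dict.fromkeys(fields))


# The request from this issue: f0_ws .. f9_ws, each listed five times.
requested = [f"f{i}_ws" for i in range(10) for _ in range(5)]
unique = unique_facet_fields(requested)

print(len(requested))  # 50 facet.field parameters sent
print(len(unique))     # 10 distinct fields
```

Under that assumption, the 50 requested fields collapse to 10, which is the number of entries the 2-shard collection returns.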
I can't think of any particular reason why the initial implementation of
multithreaded faceting included a test that checks for duplicate facet
counts. Could it be a test bug too?
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654
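For anyone who wants to compare the two configurations without counting entries by hand, the following Python sketch counts the entries under `facet_fields` in an XML response. It is a rough sketch under stated assumptions: the function name `facet_field_names` and the trimmed `SAMPLE` document are made up for illustration, and the element layout is assumed to match the `wt=xml` responses quoted below.

```python
import xml.etree.ElementTree as ET


def facet_field_names(response_xml):
    """Return the name of every <lst> under facet_fields, duplicates included."""
    root = ET.fromstring(response_xml)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    if fields is None:
        return []
    return [child.get("name") for child in fields.findall("lst")]


# Trimmed, hand-written sample shaped like the single-shard response.
SAMPLE = """<response>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="f0_ws"><int name="zero_1">25</int><int name="zero_2">25</int></lst>
      <lst name="f0_ws"><int name="zero_1">25</int><int name="zero_2">25</int></lst>
      <lst name="f1_ws"><int name="one_1">33</int><int name="one_3">17</int></lst>
    </lst>
  </lst>
</response>"""

names = facet_field_names(SAMPLE)
print(len(names))       # 3 entries returned
print(len(set(names)))  # 2 distinct field names
```

Running the same function over the saved single-shard and 2-shard responses should make the difference in the number of returned entries easy to see.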
Thoughts?
> Multi-threaded facet counts differ when SolrCloud has >1 shard
> --------------------------------------------------------------
>
> Key: SOLR-6314
> URL: https://issues.apache.org/jira/browse/SOLR-6314
> Project: Solr
> Issue Type: Bug
> Components: SearchComponents - other, SolrCloud
> Affects Versions: 5.0
> Reporter: Vamsee Yarlagadda
> Assignee: Erick Erickson
>
> I am trying to work with multi-threaded faceting on SolrCloud and, in the
> process, I ran into some issues.
> I am currently running the upstream test below on different SolrCloud
> configurations, and I get a different result set per configuration.
> https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test/org/apache/solr/request/TestFaceting.java#L654
> Setup:
> - *Indexed 50 docs into SolrCloud.*
> - *If the SolrCloud has only 1 shard, the facet field query returns the
> output below (which matches the expected upstream test output - # facet
> fields ~ 50).*
> {code}
> $ curl
> "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml"
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">21</int>
> <lst name="params">
> <str name="facet">true</str>
> <str name="fl">id</str>
> <str name="indent">true</str>
> <str name="q">id:*</str>
> <str name="facet.limit">-1</str>
> <str name="facet.threads">1000</str>
> <arr name="facet.field">
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> </arr>
> <str name="wt">xml</str>
> <str name="rows">1</str>
> </lst>
> </lst>
> <result name="response" numFound="50" start="0">
> <doc>
> <float name="id">0.0</float></doc>
> </result>
> <lst name="facet_counts">
> <lst name="facet_queries"/>
> <lst name="facet_fields">
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> </lst>
> <lst name="facet_dates"/>
> <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> - *Now, if I create a new collection with 2 shards (>1 shard SolrCloud), the
> same query as above results in a different output. (# facet fields ~ 10 ;
> Expected 50)*
> {code}
> $ curl
> "http://localhost:8983/solr/collection1/select?facet=true&fl=id&indent=true&q=id%3A*&facet.limit=-1&facet.threads=1000&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f0_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f1_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f2_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f3_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f4_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f5_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f6_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f7_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f8_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&facet.field=f9_ws&rows=1&wt=xml"
>
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">31</int>
> <lst name="params">
> <str name="facet">true</str>
> <str name="fl">id</str>
> <str name="indent">true</str>
> <str name="q">id:*</str>
> <str name="facet.limit">-1</str>
> <str name="facet.threads">1000</str>
> <arr name="facet.field">
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f0_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f1_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f2_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f3_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f4_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f5_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f6_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f7_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f8_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> <str>f9_ws</str>
> </arr>
> <str name="wt">xml</str>
> <str name="rows">1</str>
> </lst>
> </lst>
> <result name="response" numFound="50" start="0" maxScore="1.0">
> <doc>
> <float name="id">2.0</float></doc>
> </result>
> <lst name="facet_counts">
> <lst name="facet_queries"/>
> <lst name="facet_fields">
> <lst name="f0_ws">
> <int name="zero_1">25</int>
> <int name="zero_2">25</int>
> </lst>
> <lst name="f1_ws">
> <int name="one_1">33</int>
> <int name="one_3">17</int>
> </lst>
> <lst name="f2_ws">
> <int name="two_1">37</int>
> <int name="two_4">13</int>
> </lst>
> <lst name="f3_ws">
> <int name="three_1">40</int>
> <int name="three_5">10</int>
> </lst>
> <lst name="f4_ws">
> <int name="four_1">41</int>
> <int name="four_6">9</int>
> </lst>
> <lst name="f5_ws">
> <int name="five_1">42</int>
> <int name="five_7">8</int>
> </lst>
> <lst name="f6_ws">
> <int name="six_1">43</int>
> <int name="six_8">7</int>
> </lst>
> <lst name="f7_ws">
> <int name="seven_1">44</int>
> <int name="seven_9">6</int>
> </lst>
> <lst name="f8_ws">
> <int name="eight_1">45</int>
> <int name="eight_10">5</int>
> </lst>
> <lst name="f9_ws">
> <int name="nine_1">45</int>
> <int name="nine_11">5</int>
> </lst>
> </lst>
> <lst name="facet_dates"/>
> <lst name="facet_ranges"/>
> </lst>
> </response>
> {code}
> This behavior is quite strange, as it depends on the number of shards in
> SolrCloud. It would be great if someone could shed some light on this.