[ 
https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711783#comment-14711783
 ] 

Hoss Man commented on SOLR-7954:
--------------------------------

bq. Later I indexed 400000 documents on which I could not reproduce it. All the 
shards had around 100000 documents each.
bq. There are 4 shards with no replica on my test environment.

Modassar: as I tried to explain in my earlier comments, the number of shards / 
documents doesn't really affect the issue -- the root problem has to do with 
the number of unique _values_ in a single shard which are added to the 
underlying HyperLogLog data structure and then serialized.  Doing more testing 
where you tweak the routing or doc counts may find _different_ bugs, but for 
this specific bug what matters is reviewing the HLL serialization code 
related to the various precision options (which are set based on the 
"cardinality" local param) and the number of unique (hashed) values in each HLL.
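
For anyone comparing precision settings against the same shard data, the sketch below builds the stats request with the "cardinality" local param set to two different values. It is only an illustration: the host, the collection name ("collection1") and the field name ("foo") are placeholders, and the helper function is hypothetical rather than part of Solr or any client library.

```python
from urllib.parse import urlencode


def build_cardinality_stats_url(base_url, collection, field, cardinality):
    """Build a select URL that asks the stats component for an HLL-based
    cardinality estimate on `field`.

    base_url, collection and field are placeholders chosen by the caller;
    `cardinality` is the value of the "cardinality" local param.
    """
    params = {
        "q": "*:*",      # match all documents
        "rows": 0,       # only the stats section is of interest
        "stats": "true",
        # The local param goes in front of the field name.
        "stats.field": "{!cardinality=%s}%s" % (cardinality, field),
        "wt": "json",
    }
    return "%s/%s/select?%s" % (base_url.rstrip("/"), collection, urlencode(params))


# Hypothetical host and collection; adjust to the test environment.
BASE = "http://localhost:8983/solr"
url_max = build_cardinality_stats_url(BASE, "collection1", "foo", "1.0")
url_lower = build_cardinality_stats_url(BASE, "collection1", "foo", "0.9")
print(url_max)
print(url_lower)
```

Sending both requests to the same collection keeps the set of unique values per shard fixed, so any difference in behavior comes from the precision options alone.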

> ArrayIndexOutOfBoundsException from distributed HLL serialization logic when 
> using stats.field={!cardinality=1.0} in a distributed query
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7954
>                 URL: https://issues.apache.org/jira/browse/SOLR-7954
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.2.1
>         Environment: SolrCloud 4 node cluster.
> Ubuntu 12.04
> OS Type 64 bit
>            Reporter: Modassar Ather
>            Assignee: Hoss Man
>         Attachments: SOLR-7954.patch, SOLR-7954.patch
>
>
> User reports indicate that using {{stats.field=\{!cardinality=1.0\}foo}} on a 
> field that has extremely high cardinality on a single shard (example: 150K 
> unique values) can lead to "ArrayIndexOutOfBoundsException: 3" on the shard 
> during serialization of the HLL values.
> Using "cardinality=0.9" (or lower) doesn't produce the same symptoms, 
> suggesting the problem is specific to large log2m and regwidth values.
