[
https://issues.apache.org/jira/browse/SOLR-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986230#comment-14986230
]
Yonik Seeley commented on SOLR-7802:
------------------------------------
{quote}
if you build 2 HLL instances, with different log2m settings, and add the exact
same set of (raw) values to both, then the HLL with the larger log2m will give
you more accurate results than the HLL with a smaller log2m setting.
{quote}
Is that really true for any given set of raw values, or is it only true on
average?
These are just estimates after all, and the property seemingly claimed here
would be a very difficult (and interesting) one to achieve. At first blush,
it seems false.
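One way to probe the question empirically is a small simulation. The sketch
below is a minimal, hand-rolled HyperLogLog, not the HLL implementation Solr
actually uses. The class names (HllLog2mDemo, TinyHll), the SplitMix64-style
hash, and the trial and set sizes are all assumptions chosen for illustration.
It feeds the same raw values to two sketches with different log2m settings and
counts how often the smaller sketch happens to land closer to the true
cardinality.

```java
public class HllLog2mDemo {

    /** Minimal HyperLogLog sketch for illustration only (hypothetical class). */
    static final class TinyHll {
        private final int log2m;
        private final byte[] registers;

        TinyHll(int log2m) {
            this.log2m = log2m;                  // the alpha constant below assumes log2m >= 7
            this.registers = new byte[1 << log2m];
        }

        /** SplitMix64-style mixer, used here as a stand-in 64-bit hash. */
        static long hash(long x) {
            x += 0x9E3779B97F4A7C15L;
            x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
            x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
            return x ^ (x >>> 31);
        }

        void add(long rawValue) {
            long h = hash(rawValue);
            int idx = (int) (h >>> (64 - log2m));    // top log2m bits select the register
            long rest = h << log2m;                  // remaining bits, left-aligned
            int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - log2m + 1);
            if (rank > registers[idx]) {
                registers[idx] = (byte) rank;
            }
        }

        double estimate() {
            int m = registers.length;
            double sum = 0;
            int zeros = 0;
            for (byte r : registers) {
                sum += Math.pow(2.0, -r);
                if (r == 0) zeros++;
            }
            double alpha = 0.7213 / (1.0 + 1.079 / m);
            double raw = alpha * m * (double) m / sum;
            if (raw <= 2.5 * m && zeros > 0) {
                return m * Math.log((double) m / zeros);   // small-range (linear counting) correction
            }
            return raw;
        }
    }

    /** Returns {mean relative error (small), mean relative error (large), trials where small was closer}. */
    static double[] compare(int smallLog2m, int largeLog2m, int trials, int setSize, long seed) {
        double errSmall = 0, errLarge = 0;
        int smallCloser = 0;
        for (int t = 0; t < trials; t++) {
            TinyHll small = new TinyHll(smallLog2m);
            TinyHll large = new TinyHll(largeLog2m);
            long base = TinyHll.hash(seed + t);      // a different run of distinct values per trial
            for (int i = 0; i < setSize; i++) {
                long v = base + i;                   // exactly setSize distinct raw values
                small.add(v);
                large.add(v);
            }
            double eS = Math.abs(small.estimate() - setSize) / setSize;
            double eL = Math.abs(large.estimate() - setSize) / setSize;
            errSmall += eS;
            errLarge += eL;
            if (eS < eL) smallCloser++;
        }
        return new double[] { errSmall / trials, errLarge / trials, smallCloser };
    }

    public static void main(String[] args) {
        double[] r = compare(8, 12, 200, 50_000, 42L);
        System.out.printf("mean error log2m=8:  %.4f%n", r[0]);
        System.out.printf("mean error log2m=12: %.4f%n", r[1]);
        System.out.printf("trials where log2m=8 was closer: %d of 200%n", (int) r[2]);
    }
}
```

If the smaller sketch comes out closer in even a single trial, the claim cannot
hold for every set of raw values, although the larger log2m should still win on
average. A toy estimator like this says nothing definitive about the real
implementation, so treat the result as suggestive only.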
> TestDistributedStatsComponentCardinality failure
> ------------------------------------------------
>
> Key: SOLR-7802
> URL: https://issues.apache.org/jira/browse/SOLR-7802
> Project: Solr
> Issue Type: Bug
> Affects Versions: 5.3, Trunk
> Reporter: Steve Rowe
> Priority: Minor
> Attachments:
> TestDistributedStatsComponentCardinality.tests-failures.txt
>
>
> Original trunk failure on Linux:
> [http://jenkins.sarowe.net/job/Lucene-Solr-tests-trunk/773/]. Reproduced
> with the repro line on OS X, both with trunk/Java8 and branch_5x/java7:
> {noformat}
> [junit4] 2> NOTE: reproduce with: ant test
> -Dtestcase=TestDistributedStatsComponentCardinality -Dtests.method=test
> -Dtests.seed=87100DE827E75E41 -Dtests.slow=true -Dtests.locale=sr_RS
> -Dtests.timezone=Zulu -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
> {noformat}
> {noformat}
> Stack Trace:
> java.lang.AssertionError: int_i: goodEst=13957, poorEst=13970, real=13980,
> p=q=id%3A%5B88+TO+14067%5D&rows=0&stats=true&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_int_i%7Dint_i&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_int_i_prehashed_l+hllPreHashed%3Dtrue%7Dint_i_prehashed_l&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_int_i%7Dint_i&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_int_i_prehashed_l+hllPreHashed%3Dtrue%7Dint_i_prehashed_l&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_long_l%7Dlong_l&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_long_l_prehashed_l+hllPreHashed%3Dtrue%7Dlong_l_prehashed_l&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_long_l%7Dlong_l&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_long_l_prehashed_l+hllPreHashed%3Dtrue%7Dlong_l_prehashed_l&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_string_s%7Dstring_s&stats.field=%7B%21cardinality%3D0.008936367747461982+key%3Dlow_string_s_prehashed_l+hllPreHashed%3Dtrue%7Dstring_s_prehashed_l&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_string_s%7Dstring_s&stats.field=%7B%21cardinality%3D0.508936367747462+key%3Dhigh_string_s_prehashed_l+hllPreHashed%3Dtrue%7Dstring_s_prehashed_l
> at __randomizedtesting.SeedInfo.seed([87100DE827E75E41:F443232891B33B9]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at org.apache.solr.handler.component.TestDistributedStatsComponentCardinality.test(TestDistributedStatsComponentCardinality.java:216)
> [...]
> {noformat}