janhoy opened a new pull request, #3924:
URL: https://github.com/apache/solr/pull/3924

   …that failed for some seeds, e.g. 
   
   ```
   gradle test --tests DistributedFacetSimpleRefinementLongTailTest.test \
     -Dtests.seed=A747120FD7BE8EB6 -Dtests.locale=ne \
     -Dtests.timezone=Africa/El_Aaiun -Dtests.asserts=true \
     -Dtests.file.encoding=UTF-8
   ```
   
   https://issues.apache.org/jira/browse/SOLR-18012
   
   Done in collaboration with Claude Code. Explanation:
   
   ## Root Cause Analysis and Solution
   
   What causes the occasional large variance?
   
   The test passes with a tight tolerance for more than 90% of seeds but fails for some. Several compounding factors cause this:
   
   1. T-Digest Merging Error: The percentile implementation uses `AVLTreeDigest` with `compression=100`. When the results from 3 shards are merged (`PercentileAgg.java:488`, `digest.add(subDigest)`), approximation errors compound. This is inherent to the algorithm.
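   2. 90th Percentile: The test requests the 90th percentile (line 68: `STAT_FIELD + ",90"`). A t-digest percentile is interpolated from the centroids around that rank, so the estimate shifts whenever merging changes those centroids. Note that t-digest deliberately keeps centroids smallest near the tails (percentiles close to 0 or 100), so its rank accuracy is generally best there, not around the median.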
   3. Random Field Selection: Line 65 randomly chooses between `stat_i` (single-valued) and `stat_is` (multivalued). The two go through different code paths, which have slightly different merging characteristics.
   4. Random Codec/Structure: Randomized test parameters such as `maxPointsInLeafNode=1867` affect the BKD tree structure. That structure influences iteration order and potentially the order of floating-point accumulation.
   5. Data Distribution: With 300 docs for `aaa0` distributed across 3 shards using formulas like `j*13-i`, `j*3+i` and `i*7+j`, the exact centroid placements in the t-digest vary with processing order.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
