cryptoe commented on PR #14334:
URL: https://github.com/apache/druid/pull/14334#issuecomment-1774522282

   DataSketches version 4.x produces incorrect splits for an 
`ItemsSketch<Long>` when `sketch.getPartitionBoundaries(numSplits)` is called.
   When an items sketch is fed monotonically increasing numbers from 0 up to 
(but not including) 10 billion, the boundaries returned for 40 splits are:
   ```
   0
   250025928
   500108450
   750197955
   1000017365
   1250101575
   1500191183
   1750009306
   2000098020
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   2147424894
   9999999999
   ```
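
   To make the problem in the list above easier to quantify, here is a small 
standalone check (plain JDK, no DataSketches dependency). The class and method 
names are made up for illustration; the array is rebuilt from the values 
pasted above. It reports where the boundaries stop increasing, how many 
distinct values there are, and how far the repeated value sits below 
`Integer.MAX_VALUE`. The proximity to the 32-bit limit is only an observation 
and not a confirmed cause.

   ```java
   import java.util.Arrays;

   // Hypothetical helper, not part of Druid or DataSketches.
   final class BoundaryCheck {

     /** Index of the first boundary equal to its predecessor, or -1 if strictly increasing. */
     static int firstRepeatedIndex(long[] boundaries) {
       for (int i = 1; i < boundaries.length; i++) {
         if (boundaries[i] == boundaries[i - 1]) {
           return i;
         }
       }
       return -1;
     }

     /** Number of distinct boundary values. */
     static long distinctCount(long[] boundaries) {
       return Arrays.stream(boundaries).distinct().count();
     }

     /** Rebuilds the 41 boundaries reported above for the 4.x sketch. */
     static long[] observed() {
       long[] head = {
           0L, 250025928L, 500108450L, 750197955L, 1000017365L,
           1250101575L, 1500191183L, 1750009306L, 2000098020L
       };
       long[] all = new long[41];
       System.arraycopy(head, 0, all, 0, head.length);
       // Entries 9..39 all carry the same repeated value.
       Arrays.fill(all, head.length, 40, 2147424894L);
       all[40] = 9999999999L;
       return all;
     }

     public static void main(String[] args) {
       long[] b = observed();
       System.out.println("first repeated index: " + firstRepeatedIndex(b));
       System.out.println("distinct values:      " + distinctCount(b));
       System.out.println("gap to Integer.MAX_VALUE: " + (Integer.MAX_VALUE - 2147424894L));
     }
   }
   ```

   Only 11 of the 41 boundaries are distinct, so most of the 40 splits would 
be empty and nearly all rows above roughly 2.1 billion would land in the last 
one.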
   
   A similar items sketch built with version 3.2.0 generates the correct 
boundaries when `sketch.getQuantiles(40)` is used:
   ```
   0
   256423887
   512799757
   769177971
   1025548880
   1281929124
   1538304632
   1794673747
   2051315187
   2307669277
   2564042244
   2820409807
   3076787762
   3333187311
   3589563105
   3845931306
   4102575213
   4359042434
   4615417711
   4871794533
   5128171055
   5384552162
   5640921934
   5897299914
   6153942953
   6410313978
   6666657948
   6923038164
   7179411194
   7435787409
   7692168813
   7948547259
   8205184939
   8461560834
   8717940855
   8974345598
   9230745974
   9487159248
   9743589767
   9999999999
   ```
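
   For reference, the exact evenly spaced quantiles of this input can be 
computed directly, since the data is just the integers 0 to 9,999,999,999. 
The sketch below assumes 40 points at ranks i/39 (which matches the first and 
last values printed above); the class and method names are hypothetical and 
nothing here calls DataSketches.

   ```java
   // Hypothetical helper: exact evenly spaced quantiles of the integers 0..n-1.
   final class IdealQuantiles {

     /** Returns `points` values at ranks i/(points-1), using floor division. */
     static long[] ideal(long n, int points) {
       long[] out = new long[points];
       for (int i = 0; i < points; i++) {
         // i * (n - 1) stays far below Long.MAX_VALUE for n = 10 billion.
         out[i] = i * (n - 1) / (points - 1);
       }
       return out;
     }

     public static void main(String[] args) {
       for (long v : ideal(10_000_000_000L, 40)) {
         System.out.println(v);
       }
     }
   }
   ```

   The second ideal value is 256410256, against 256423887 from the 3.2.0 
sketch, so the 3.2.0 output tracks the exact quantiles closely.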
   
   Pseudocode used for the experiments (shown with the 3.2.0 call 
`getQuantiles(40)`):
   
   ```
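   import java.util.Comparator;
   import org.apache.datasketches.quantiles.ItemsSketch;

   public class QuantilesExperiment {
     public static void main(String[] args) {
       // k = 32768, natural ordering on Long
       ItemsSketch<Long> sketch = ItemsSketch.getInstance(32768, Comparator.naturalOrder());

       // Feed 0 .. 9,999,999,999 in increasing order, logging every 100 million items
       for (long i = 0; i < 10_000_000_000L; i++) {
         if (i % 100_000_000 == 0) {
           System.out.println("reached " + i);
         }
         sketch.update(i);
       }

       // Print 40 evenly spaced quantiles
       for (Long val : sketch.getQuantiles(40)) {
         System.out.println(val);
       }
     }
   }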
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
