cryptoe commented on PR #14334:
URL: https://github.com/apache/druid/pull/14334#issuecomment-1774522282
DataSketches version 4.xx produces unexpected splits for an
`ItemsSketch<Long>` when the function `sketch.getPartitionBoundaries(numSplits)`
is used. When an items sketch is fed monotonically increasing numbers from 0 to
10 billion, the boundaries returned for 40 splits are:
```
0
250025928
500108450
750197955
1000017365
1250101575
1500191183
1750009306
2000098020
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
2147424894
9999999999
```
A similar items sketch in version 3.2.0 generates the correct boundaries when
the function `sketch.getQuantiles(40)` is used:
```
0
256423887
512799757
769177971
1025548880
1281929124
1538304632
1794673747
2051315187
2307669277
2564042244
2820409807
3076787762
3333187311
3589563105
3845931306
4102575213
4359042434
4615417711
4871794533
5128171055
5384552162
5640921934
5897299914
6153942953
6410313978
6666657948
6923038164
7179411194
7435787409
7692168813
7948547259
8205184939
8461560834
8717940855
8974345598
9230745974
9487159248
9743589767
9999999999
```
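To make the difference between the two outputs easier to check, here is a minimal, self-contained sketch that does not use the DataSketches library at all. It computes the ideal evenly spaced boundaries for a uniform range and finds the first position where a boundary list stops increasing. The class and method names (`BoundaryCheck`, `idealBoundaries`, `firstNonIncreasing`) are hypothetical helpers introduced only for this illustration, and the sample values in `main` are copied from the two listings above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoundaryCheck {
    // Ideal evenly spaced boundaries over [min, max] using `count` points (count >= 2).
    // Double arithmetic is used, which is precise enough for a rough comparison.
    static List<Long> idealBoundaries(long min, long max, int count) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(min + Math.round((double) (max - min) * i / (count - 1)));
        }
        return out;
    }

    // Index of the first boundary that is not strictly greater than its
    // predecessor, or -1 if the list is strictly increasing.
    static int firstNonIncreasing(List<Long> boundaries) {
        for (int i = 1; i < boundaries.size(); i++) {
            if (boundaries.get(i) <= boundaries.get(i - 1)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Excerpt of the 4.xx listing: the value 2147424894 repeats.
        List<Long> v4 = Arrays.asList(0L, 250025928L, 2000098020L,
                2147424894L, 2147424894L, 9999999999L);
        // Excerpt of the 3.2.0 listing: strictly increasing.
        List<Long> v3 = Arrays.asList(0L, 256423887L, 2051315187L,
                2307669277L, 9743589767L, 9999999999L);

        System.out.println("4.xx first non-increasing index: " + firstNonIncreasing(v4));
        System.out.println("3.2.0 first non-increasing index: " + firstNonIncreasing(v3));
        // 41 points correspond to 40 equally sized splits of 0..9999999999.
        System.out.println("ideal second boundary: "
                + idealBoundaries(0L, 9_999_999_999L, 41).get(1));
    }
}
```

Run against the full listings, a check like this would flag the 4.xx output at the first repeated `2147424894`, while the 3.2.0 output passes.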
Pseudo code used for the experiments:
```
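import java.util.Comparator;
import org.apache.datasketches.quantiles.ItemsSketch;

// Written against the 3.2.0 API: k = 32768, natural ordering of Long.
ItemsSketch<Long> sketch = ItemsSketch.getInstance(32768, Comparator.naturalOrder());
for (long i = 0; i < 10_000_000_000L; i++) {
    // Progress marker every 100 million updates.
    if (i % 100_000_000 == 0) {
        System.out.println("reached " + i);
    }
    sketch.update(i);
}
// Print 40 evenly spaced quantiles, min and max included.
for (Long val : sketch.getQuantiles(40)) {
    System.out.println(val);
}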
```