FangYongs opened a new pull request, #4350:
URL: https://github.com/apache/paimon/pull/4350

   ### Purpose
   
   Linked issue: close #xxx
   
   The bloom filter is configured with expected rows (5000000) and fpp (0.01). 
The writer results are as follows:
   
   Records | Value Data Size | Without Bloom Filter (ms) | With Bloom Filter (ms) | Relative (With / Without)
   -- | -- | -- | -- | --
   100000 | 0B | 6 | 9 | 1.5
   100000 | 64B | 11 | 13 | 1.181818182
   100000 | 500B | 37 | 34 | 0.918918919
   100000 | 1000B | 57 | 58 | 1.01754386
   100000 | 2000B | 88 | 95 | 1.079545455
   1000000 | 0B | 32 | 64 | 2
   1000000 | 64B | 74 | 114 | 1.540540541
   1000000 | 500B | 253 | 310 | 1.225296443
   1000000 | 1000B | 384 | 440 | 1.145833333
   1000000 | 2000B | 534 | 625 | 1.170411985
   5000000 | 0B | 164 | 327 | 1.993902439
   5000000 | 64B | 356 | 511 | 1.435393258
   5000000 | 500B | 1072 | 1197 | 1.116604478
   5000000 | 1000B | 1921 | 2338 | 1.21707444
   5000000 | 2000B | 3163 | 3480 | 1.100221309
   10000000 | 0B | 344 | 733 | 2.130813953
   10000000 | 64B | 659 | 1220 | 1.851289833
   10000000 | 500B | 2815 | 2991 | 1.062522202
   10000000 | 1000B | 4914 | 5407 | 1.1003256
   10000000 | 2000B | 8646 | 9170 | 1.060606061
   15000000 | 0B | 517 | 1028 | 1.988394584
   15000000 | 64B | 1169 | 1718 | 1.469632164
   15000000 | 500B | 5435 | 5170 | 0.95124195
   15000000 | 1000B | 9962 | 10255 | 1.029411765
   15000000 | 2000B | 16203 | 18573 | 1.146269209
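
To put the configured parameters in context, here is a minimal sketch of the textbook bloom filter sizing formulas applied to 5000000 expected rows and an fpp of 0.01. The class and method names are hypothetical, and Paimon's own implementation may size its filter differently; this only illustrates the standard arithmetic (m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hash functions).

```java
// Hypothetical helper, not part of Paimon: textbook bloom filter sizing.
final class BloomSizing {

    // Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    static long optimalNumOfBits(long expectedRows, double fpp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedRows * Math.log(fpp) / (ln2 * ln2));
    }

    // Optimal number of hash functions: k = (m / n) * ln 2, at least 1
    static int optimalNumOfHashFunctions(long expectedRows, long numBits) {
        double k = (double) numBits / expectedRows * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long expectedRows = 5_000_000L; // value quoted in this PR
        double fpp = 0.01;              // value quoted in this PR
        long bits = optimalNumOfBits(expectedRows, fpp);
        int hashes = optimalNumOfHashFunctions(expectedRows, bits);
        System.out.printf("bits=%d (%.2f MB), hashes=%d%n",
                bits, bits / 8.0 / 1_000_000, hashes);
    }
}
```

Under these formulas the filter needs roughly 48 million bits (about 6 MB) and 7 hash functions. Each written key then costs a fixed number of hash probes regardless of value size, which is consistent with the relative overhead above being largest for 0B and 64B values.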
   
   The reader results for queries on keys that are definitely stored are as 
follows. They indicate that the bloom filter causes little to no performance 
degradation (the ratios range from about 0.80 to 1.19).
   
   Records | Value Data Size | Without Bloom Filter (ms) | With Bloom Filter (ms) | Relative (With / Without)
   -- | -- | -- | -- | --
   100000 | 0B | 799 | 807 | 1.010012516
   100000 | 64B | 205 | 193 | 0.941463415
   100000 | 500B | 171 | 140 | 0.81871345
   100000 | 1000B | 168 | 187 | 1.113095238
   100000 | 2000B | 170 | 173 | 1.017647059
   1000000 | 0B | 791 | 803 | 1.01517067
   1000000 | 64B | 221 | 211 | 0.954751131
   1000000 | 500B | 145 | 163 | 1.124137931
   1000000 | 1000B | 152 | 181 | 1.190789474
   1000000 | 2000B | 162 | 164 | 1.012345679
   5000000 | 0B | 789 | 778 | 0.986058302
   5000000 | 64B | 221 | 224 | 1.013574661
   5000000 | 500B | 175 | 161 | 0.92
   5000000 | 1000B | 178 | 181 | 1.016853933
   5000000 | 2000B | 186 | 186 | 1
   10000000 | 0B | 776 | 797 | 1.027061856
   10000000 | 64B | 191 | 201 | 1.052356021
   10000000 | 500B | 142 | 151 | 1.063380282
   10000000 | 1000B | 142 | 159 | 1.11971831
   10000000 | 2000B | 159 | 166 | 1.044025157
   15000000 | 0B | 820 | 800 | 0.975609756
   15000000 | 64B | 260 | 209 | 0.803846154
   15000000 | 500B | 139 | 151 | 1.086330935
   15000000 | 1000B | 149 | 155 | 1.040268456
   15000000 | 2000B | 157 | 164 | 1.044585987
   
   The reader results for queries on keys that are definitely not stored are as 
follows. They indicate that the bloom filter can greatly improve performance, 
with lookup time dropping to as little as one third of the unfiltered time.
   
   Records | Value Data Size | Without Bloom Filter (ms) | With Bloom Filter (ms) | Relative (With / Without)
   -- | -- | -- | -- | --
   100000 | 0B | 6 | 3 | 0.5
   100000 | 64B | 3 | 2 | 0.666666667
   100000 | 500B | 3 | 1 | 0.333333333
   100000 | 1000B | 4 | 2 | 0.5
   100000 | 2000B | 4 | 2 | 0.5
   1000000 | 0B | 3 | 2 | 0.666666667
   1000000 | 64B | 3 | 2 | 0.666666667
   1000000 | 500B | 3 | 2 | 0.666666667
   1000000 | 1000B | 4 | 2 | 0.5
   1000000 | 2000B | 6 | 3 | 0.5
   5000000 | 0B | 3 | 3 | 1
   5000000 | 64B | 3 | 2 | 0.666666667
   5000000 | 500B | 5 | 3 | 0.6
   5000000 | 1000B | 6 | 3 | 0.5
   5000000 | 2000B | 10 | 5 | 0.5
   10000000 | 0B | 4 | 3 | 0.75
   10000000 | 64B | 5 | 4 | 0.8
   10000000 | 500B | 6 | 4 | 0.666666667
   10000000 | 1000B | 7 | 5 | 0.714285714
   10000000 | 2000B | 11 | 8 | 0.727272727
   15000000 | 0B | 5 | 4 | 0.8
   15000000 | 64B | 5 | 4 | 0.8
   15000000 | 500B | 7 | 6 | 0.857142857
   15000000 | 1000B | 8 | 8 | 1
   15000000 | 2000B | 14 | 13 | 0.928571429
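
The speedup for missing keys comes from the filter answering "definitely absent" before any data is read. The following is a minimal sketch of that short-circuit, not the implementation in this PR: the class `FilteredLookup`, the in-memory map standing in for the data file, the hash mixing, and the sizes used are all assumptions chosen for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a bloom filter placed in front of a lookup store.
final class FilteredLookup {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    // Stands in for the on-disk data; a real reader would do file I/O here.
    private final Map<Long, String> file = new HashMap<>();
    int fileReads = 0;

    FilteredLookup(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit mixing function (SplitMix64 finalizer) used as the base hash.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Double hashing: derive the i-th bit index from two 32-bit halves.
    private int index(long key, int i) {
        long h = mix(key + 0x9e3779b97f4a7c15L);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void put(long key, String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
        file.put(key, value);
    }

    // False means the key was never added; true means it may have been.
    boolean mightContain(long key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    String get(long key) {
        if (!mightContain(key)) {
            return null; // definitely absent: skip the read entirely
        }
        fileReads++; // present key or false positive: the read still happens
        return file.get(key);
    }

    public static void main(String[] args) {
        // About 9.6 bits per key and 7 hashes, matching an fpp near 0.01.
        FilteredLookup store = new FilteredLookup(95_851, 7);
        for (long k = 0; k < 10_000; k++) {
            store.put(k, "v" + k);
        }
        for (long k = 10_000; k < 20_000; k++) {
            store.get(k); // all of these keys are absent
        }
        System.out.println("reads for 10000 absent keys: " + store.fileReads);
    }
}
```

For stored keys the filter always answers "may be present", so the read happens either way and only the bit probes are added. That matches the near-1.0 ratios in the stored-key table.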
   
   
   

