yinghao-wang opened a new issue #9920:
URL: https://github.com/apache/druid/issues/9920


   ### Affected Version  
   Druid 0.13.0 to Druid 0.18.0
   
   I pulled down Druid 0.13 and Druid 0.18 from the community and, after compiling each, ran its benchmark against RoaringBitmap (command: `java -jar benchmarks.jar BitmapIterationBenchmark -p bitmapAlgo=roaring`). Druid 0.13 finishes within ten minutes, while Druid 0.18 takes more than an hour, and the results are not as expected: there are two cases where Druid 0.18 is much slower than Druid 0.13.
   
   Druid 0.13 uses RoaringBitmap 0.5.18, and Druid 0.18 uses RoaringBitmap 0.8.11.
   All data below comes from a Mac with 6 CPUs and 16 GB of memory.
   I also tried a server with 48 CPUs and 250 GB of memory with similar results, so the slowdown is not related to machine size.
   
   Benchmark test results of druid 0.13 version:
   
![image](https://user-images.githubusercontent.com/64965834/82751063-40e55900-9de7-11ea-84bd-a3799e8fe0a1.png)
   Benchmark test results of druid 0.18 version:
   
![image](https://user-images.githubusercontent.com/64965834/82799217-45277a00-9eac-11ea-944f-952c2575da8a.png)
   
   
![image](https://user-images.githubusercontent.com/64965834/82817382-7c5b5280-9ecf-11ea-9a07-aa7abc926d2f.png)
   
   The benchmarks above were run many times, with similar results each time.
   
   Below is my troubleshooting process:
   **Phenomenon 1: the total benchmark time becomes longer**
   I observed that benchmark execution slowed down from 0.15 to 0.16 (I am not sure why). Running `-p bitmapAlgo=roaring -p n=100 -p prob=0.1` alone, the benchmark time increased from 40 s (0.15) to 6 min (0.16).
   
![image](https://user-images.githubusercontent.com/64965834/82801003-3098b100-9eaf-11ea-8e70-5dca9a072750.png)
   **Phenomenon 2: the average time of two specific cases becomes much larger**
   In the two intersectionAndIter cases (n=100, prob=0.1) and (n=100, prob=0.5), the score rose sharply from 0.13 to 0.14: from roughly 3–5 million (0.13) to 100–200 million (0.14).
   case 1 (n=100, prob=0.1)
   
![image](https://user-images.githubusercontent.com/64965834/82802132-f9c39a80-9eb0-11ea-9870-3ec90caa49a0.png)
   
   case 2 (n=100, prob=0.5)
   
![image](https://user-images.githubusercontent.com/64965834/82802093-ea445180-9eb0-11ea-8098-8d1beb6f6e93.png)
   
   Although phenomenon 1 slows down benchmark execution, I don't think it reflects the performance regression itself. What really matters is the score of the two cases in phenomenon 2, so I will try to analyze why their scores rise.
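   For context on what these two cases exercise, here is a minimal pure-Java sketch (my own illustration, not Druid's or RoaringBitmap's actual code) of an "intersect and iterate" workload over sorted integer arrays, which is essentially what the intersectionAndIter case does at the container level:

```java
import java.util.Arrays;

// Minimal sketch of an "intersect and iterate" workload over sorted int
// arrays -- a simplified stand-in for what the intersectionAndIter
// benchmark case exercises inside RoaringBitmap containers.
public class IntersectAndIter {

    // Classic two-pointer intersection of two sorted arrays.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[k++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, k);
    }

    // Iterate the result and fold it, mirroring the "iter" half of the case.
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};
        int[] b = {3, 4, 5, 9, 10};
        int[] both = intersect(a, b);
        System.out.println(Arrays.toString(both) + " sum=" + sum(both));
    }
}
```

   The benchmark runs this kind of loop over n bitmaps whose container representation depends on the RoaringBitmap version, which is why the container choice matters below.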
   
   **Analyzing the cause of the score rise in phenomenon 2 from the test data:**
   Based on the Druid 0.13 code, I swapped in different RoaringBitmap versions and recorded the test results as follows:
   
![image](https://user-images.githubusercontent.com/64965834/82816795-600ae600-9ece-11ea-9088-fd37d5cc3930.png)
   
   
![image](https://user-images.githubusercontent.com/64965834/82816455-c0e5ee80-9ecd-11ea-9052-21487b2bc856.png)
   
   Observing the data in the table, after the code change of #6764 in the community, the score of the above two scenarios rises significantly.
   **Analyzing the cause of the score rise from the flame graphs:**
   I modified the benchmark code to run only the two slower scenarios above (prob=0.1 and prob=0.5), ran it multiple times on Druid 0.13 and Druid 0.14, and captured the flame graphs below. (Because the benchmark's fork count is set to 1, each case runs in a forked child process, so the flame graphs capture not only the main process but also the child process for each scenario.)
   Druid 0.13 benchmark main process
   
![image](https://user-images.githubusercontent.com/64965834/82813307-5e89ef80-9ec7-11ea-90fc-3029be9ee8b7.png)
   Druid 0.13 child process: intersectionAndIter when prob = 0.1
   
![image](https://user-images.githubusercontent.com/64965834/82813811-6ac27c80-9ec8-11ea-9709-04c16b8b722d.png)
   Druid 0.13 child process: intersectionAndIter when prob = 0.5
   
![image](https://user-images.githubusercontent.com/64965834/82813917-a5c4b000-9ec8-11ea-9a21-0db0361d5e3b.png)
   Druid 0.14 benchmark main process
   
![image](https://user-images.githubusercontent.com/64965834/82814015-d99fd580-9ec8-11ea-9f47-bda89d204a12.png)
   Druid 0.14 child process: intersectionAndIter when prob = 0.1
   
![image](https://user-images.githubusercontent.com/64965834/82814071-f89e6780-9ec8-11ea-8dff-35c99efe6359.png)
   Druid 0.14 child process: intersectionAndIter when prob = 0.5
   
![image](https://user-images.githubusercontent.com/64965834/82814121-0f44be80-9ec9-11ea-9e7d-baf821f1da52.png)
   Druid 0.18 child process: intersectionAndIter when prob = 0.1
   
![image](https://user-images.githubusercontent.com/64965834/82814178-2edbe700-9ec9-11ea-9512-821a50deb4e3.png)
   Druid 0.18 child process: intersectionAndIter when prob = 0.5
   
![image](https://user-images.githubusercontent.com/64965834/82814209-3e5b3000-9ec9-11ea-9e5f-162f5ffc3372.png)
   
   The flame graphs above were captured multiple times; one capture was chosen at random.
   From the flame graphs, I observe that different Druid versions use different RoaringBitmap container types when taking intersections. Druid 0.13 uses ArrayContainer or BitmapContainer, while Druid 0.14 and later use RunContainer (both the captured 0.14 and 0.18 flame graphs show RunContainer, so I speculate that all versions from 0.14 to 0.18 use RunContainer).
   I think RunContainer's compression efficiency is poor under this benchmark, because looking at the fake-data construction code, I found that the generated data is very sparse and the values are not contiguous.
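   To see why RunContainer can be a poor fit for sparse, non-contiguous data, here is a pure-Java sketch (my own estimate, not RoaringBitmap code) that compares the serialized cost of the three container types over one 65536-value chunk, using the sizes from the roaring format: 2 bytes per value for an array container, a fixed 8 KB for a bitmap container, and 2 bytes plus 4 bytes per run for a run container:

```java
import java.util.Random;

// Rough serialized-size estimate for the three RoaringBitmap container
// types over one 65536-value chunk, using the costs from the roaring
// format: 2 bytes/value (array), fixed 8192 bytes (bitmap), and
// 2 + 4 bytes per run (run).
public class ContainerCost {

    // Count maximal runs of consecutive set values.
    static int countRuns(boolean[] bits) {
        int runs = 0;
        for (int i = 0; i < bits.length; i++) {
            if (bits[i] && (i == 0 || !bits[i - 1])) {
                runs++;
            }
        }
        return runs;
    }

    static int cardinality(boolean[] bits) {
        int n = 0;
        for (boolean b : bits) {
            if (b) n++;
        }
        return n;
    }

    static int arrayCost(boolean[] bits)  { return 2 * cardinality(bits); }
    static int bitmapCost(boolean[] bits) { return 8192; }
    static int runCost(boolean[] bits)    { return 2 + 4 * countRuns(bits); }

    // Fill a chunk the way the benchmark's fake data looks: each value
    // present independently with probability prob (sparse, non-contiguous).
    static boolean[] sparseChunk(double prob, long seed) {
        Random r = new Random(seed);
        boolean[] bits = new boolean[1 << 16];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = r.nextDouble() < prob;
        }
        return bits;
    }

    public static void main(String[] args) {
        boolean[] chunk = sparseChunk(0.1, 42);
        System.out.println("array  = " + arrayCost(chunk) + " bytes");
        System.out.println("bitmap = " + bitmapCost(chunk) + " bytes");
        System.out.println("run    = " + runCost(chunk) + " bytes");
    }
}
```

   On this kind of data almost every set value is its own run, so the run representation is the largest of the three, which is consistent with my suspicion that RunContainer is a bad fit here.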
   
   Code to construct fake data:
   
![image](https://user-images.githubusercontent.com/64965834/82814978-ca218c00-9eca-11ea-9ffb-7ae5d5c4a3e8.png)
   
![image](https://user-images.githubusercontent.com/64965834/82815167-33090400-9ecb-11ea-9158-81c55485bf85.png)
   
   Observing the setup code, the initialized bitmaps are very sparse and the values are not contiguous. In this scenario, I don't think RunContainer achieves the compression efficiency of BitmapContainer or ArrayContainer, which may cause performance problems.
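   As a sanity check on that intuition, here is a small pure-Java sketch (an illustration under the assumption that, as in the setup code above, each value is set independently with probability prob) measuring the average run length of the generated data. For prob=0.1 it stays close to 1, so run-length encoding has almost nothing to compress:

```java
import java.util.Random;

// Measure the average length of runs of consecutive set values in data
// generated like the benchmark's fake data: each value is present
// independently with probability prob. For sparse data the average run
// length is about 1/(1-prob), i.e. barely above 1 -- little for
// run-length encoding to exploit.
public class RunLength {

    static double averageRunLength(double prob, int size, long seed) {
        Random r = new Random(seed);
        boolean prev = false;
        int runs = 0;
        int setValues = 0;
        for (int i = 0; i < size; i++) {
            boolean cur = r.nextDouble() < prob;
            if (cur) {
                setValues++;
                if (!prev) {
                    runs++;  // a new run starts at an unset->set transition
                }
            }
            prev = cur;
        }
        return runs == 0 ? 0.0 : (double) setValues / runs;
    }

    public static void main(String[] args) {
        System.out.println("prob=0.1: avg run length = "
                + averageRunLength(0.1, 1 << 20, 7L));
        System.out.println("prob=0.5: avg run length = "
                + averageRunLength(0.5, 1 << 20, 7L));
    }
}
```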
   
   
   **Conclusion: the code change of #6764 in the community may be related to the use of RunContainer on the data this RoaringBitmap benchmark constructs. The impact of the #6764 change needs further observation and analysis; I hope others can help look into these problems. The above is just my personal analysis.**

