To find out which class puts too much pressure on the garbage collector, I captured a 60-second flight recording before and after the upgrade and compared them. In our cluster, each node constantly serves ~30 queries per second, so the workload is the same before and after.
Here are the clues I found in the recordings.

First, the TLAB and non-TLAB allocation rates were 64 MB/s and 284 KB/s before the upgrade, but 107 MB/s and 100 MB/s after it. That is, non-TLAB allocation is about 362 times higher.

<img width="976" alt="allocation_rate_before" src="https://user-images.githubusercontent.com/1198446/44724021-31e35100-ab04-11e8-81d8-04057c744e5f.png">
<img width="976" alt="allocation_rate_after" src="https://user-images.githubusercontent.com/1198446/44724030-37d93200-ab04-11e8-9f3f-cdc6a0ada1eb.png">

Second, nearly 99% of the memory is allocated inside `RowBasedKeySerde`'s constructor, which initializes the forward and reverse dictionaries.

<img width="1076" alt="2018-08-28 9 02 07 pm" src="https://user-images.githubusercontent.com/1198446/44724736-2f81f680-ab06-11e8-898f-2d58848a0111.png">

From [the code](https://github.com/apache/incubator-druid/blob/0.12.0/processing/src/main/java/io/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L993) we can see that the initial dictionary capacity is 10000, so every SpillingGrouper has to allocate at least two 10000-sized dictionaries. We have `processing.numThreads` set to 31 and QPS around 30, so each second we are creating about 1860 (2 × 31 × 30) dictionaries, each pre-sized to 10000 entries. Most of our groupby queries return just a few rows; an initial dictionary size of 10000 is far too big.

This regression was introduced by #4707. Before 0.12, the [initial dictionary was empty](https://github.com/apache/incubator-druid/blob/0.11.0/processing/src/main/java/io/druid/query/groupby/epinephelinae/RowBasedGrouperHelper.java#L907); #4707 changed it to 10000.
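To make the allocation pattern and the arithmetic concrete, here is a minimal sketch. It is not the actual Druid code: the class and constant names are made up, and a plain `HashMap` stands in for whatever map implementation `RowBasedKeySerde` really uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- names and structure are simplified, not Druid's.
public class DictionaryPressureSketch
{
  // Stand-in for the 10000 initial capacity introduced by #4707.
  private static final int INITIAL_DICTIONARY_SIZE = 10_000;

  // Forward dictionary: dictionary id -> dimension value.
  private final List<String> dictionary;
  // Reverse dictionary: dimension value -> dictionary id.
  private final Map<String, Integer> reverseDictionary;

  DictionaryPressureSketch(int initialCapacity)
  {
    // new ArrayList<>(n) eagerly allocates an n-slot backing array, so every
    // instance pays for 10000 entries even if the query groups only a few rows.
    this.dictionary = new ArrayList<>(initialCapacity);
    // Pre-sized reverse map; HashMap is used here only to keep the sketch
    // dependency-free and may size its table differently from the real code.
    this.reverseDictionary = new HashMap<>(initialCapacity);
  }

  public static void main(String[] args)
  {
    // Back-of-the-envelope for our cluster: two pre-sized dictionaries per
    // SpillingGrouper, processing.numThreads = 31, ~30 queries per second.
    int dictionariesPerGrouper = 2;
    int numThreads = 31;
    int queriesPerSecond = 30;
    int dictionariesPerSecond = dictionariesPerGrouper * numThreads * queriesPerSecond;
    System.out.println(dictionariesPerSecond + " pre-sized dictionaries per second"); // 1860

    // Each of them is built with the 10000-entry initial capacity.
    DictionaryPressureSketch serde = new DictionaryPressureSketch(INITIAL_DICTIONARY_SIZE);
    System.out.println("each sized for " + INITIAL_DICTIONARY_SIZE + " entries up front");
  }
}
```

The numbers in `main` are simply the cluster settings quoted above; the sketch only shows why pre-sizing every dictionary to 10000 entries multiplies into significant allocation pressure at that query rate.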
