liran-funaro commented on issue #7900:
URL: https://github.com/apache/druid/issues/7900#issuecomment-643750934


   I realize that I'm a little late to the party since `CliIndexer` is already 
merged, but I just want to raise a possible issue with this design.
   
   Once many concurrent incremental indexes are processed on the same JVM 
heap, the number of long-lived objects will be larger than in any of the 
individual Peons.
   Unfortunately, the JVM does not handle workloads with a huge number of 
long-lived objects well.
   This evidently causes long pause times in each GC cycle, which can add up 
to as much as 50% of the process runtime.
   However, the value of using the `CliIndexer`, IMO, is great.
   
   To solve this, I suggest storing all incremental index data (keys and 
values) off-heap, which will reduce the number of heap objects dramatically.
   Please check out my issue (#9967) and PR (#10001), which address exactly 
this problem.
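   
   To illustrate the general idea only (a minimal sketch, not the 
implementation in the PR above): the hypothetical class `OffHeapRows` below 
packs fixed-width rows into a single direct `ByteBuffer`. The row layout (a 
`long` timestamp plus a `double` metric) and all names are assumptions made 
for the example. However many rows are stored, the GC only has to track the 
one buffer object instead of one heap object per row.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: fixed-width rows stored outside the Java heap.
// The class name, row layout, and methods are assumptions for this sketch.
final class OffHeapRows {
    // Each row is a long timestamp followed by a double metric.
    private static final int ROW_BYTES = Long.BYTES + Double.BYTES;

    private final ByteBuffer buf; // one direct buffer holds every row
    private final int capacityRows;
    private int rows = 0;

    OffHeapRows(int capacityRows) {
        this.capacityRows = capacityRows;
        // allocateDirect places the backing memory outside the GC-managed heap.
        this.buf = ByteBuffer.allocateDirect(capacityRows * ROW_BYTES);
    }

    // Appends a row and returns its index.
    int add(long timestamp, double metric) {
        if (rows >= capacityRows) {
            throw new IllegalStateException("store is full");
        }
        int offset = rows * ROW_BYTES;
        buf.putLong(offset, timestamp);
        buf.putDouble(offset + Long.BYTES, metric);
        return rows++;
    }

    long timestamp(int row) {
        checkRow(row);
        return buf.getLong(row * ROW_BYTES);
    }

    double metric(int row) {
        checkRow(row);
        return buf.getDouble(row * ROW_BYTES + Long.BYTES);
    }

    int size() {
        return rows;
    }

    private void checkRow(int row) {
        if (row < 0 || row >= rows) {
            throw new IndexOutOfBoundsException("no such row: " + row);
        }
    }

    public static void main(String[] args) {
        OffHeapRows store = new OffHeapRows(1000);
        for (int i = 0; i < 1000; i++) {
            store.add(i, i * 0.5);
        }
        System.out.println("rows stored off-heap: " + store.size());
    }
}
```

   A real incremental index also needs variable-length keys, ordering, and 
concurrent updates, all of which this sketch leaves out.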
   
   This solution improves the CPU and RAM utilization of the batch ingestion by 
over 50% in both serial and parallel ingestion modes, and might greatly improve 
the resource utilization and performance of the ingestion using the 
`CliIndexer`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



