gianm opened a new pull request, #13939: URL: https://github.com/apache/druid/pull/13939
The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory, so we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety valve: in case we have a large number of very small rows, we don't want to be overwhelmed by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion; query performance is also important. Because query performance is not as good on the in-memory dataset, it's helpful to keep that dataset from growing too large. 100k seems like a reasonable balance here: for a typical 5 million row segment, this limit won't trigger more than 50 persists, which is a reasonable number of persists.
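As a sketch of where these knobs live, both limits are set in the ingestion spec's tuningConfig (maxRowsInMemory and maxBytesInMemory are documented Druid tuning fields; the surrounding spec is abbreviated and the values shown are illustrative, not a recommendation from this PR):

```json
{
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 100000,
    "maxBytesInMemory": 0
  }
}
```

A persist is triggered when either limit is reached, so with maxRowsInMemory at 100,000, a 5,000,000-row segment hits the row-count limit at most 5,000,000 / 100,000 = 50 times; in practice maxBytesInMemory may fire first for wide rows.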
