gianm opened a new pull request, #13939: URL: https://github.com/apache/druid/pull/13939
The thinking here is that for best ingestion throughput, we want intermediate persists to be as big as possible without using up all available memory, so we rely mainly on maxBytesInMemory. The default maxRowsInMemory (1 million) is really just a safety valve: in case we have a large number of very small rows, we don't want to be overwhelmed by per-row overheads.

However, maximum ingestion throughput isn't necessarily the primary goal for realtime ingestion; query performance is also important. Because query performance is not as good on the in-memory dataset, it's helpful to keep that dataset from growing too large. 100k seems like a reasonable balance here: for a typical 5 million row segment, this limit won't trigger more than 50 persists, which is a reasonable number of persists.
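As a sketch of where these knobs live, both limits are set in the ingestion spec's tuningConfig (maxRowsInMemory and maxBytesInMemory are documented Druid tuning fields; the surrounding spec is abbreviated and the values shown are illustrative, not a recommendation from this PR):

```json
{
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 100000,
    "maxBytesInMemory": 0
  }
}
```

A persist is triggered when either limit is reached, so with maxRowsInMemory at 100,000, a 5,000,000-row segment hits the row-count limit at most 5,000,000 / 100,000 = 50 times; in practice maxBytesInMemory may fire first for wide rows.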
