tarun11Mavani opened a new issue, #16632: URL: https://github.com/apache/pinot/issues/16632
In #16344, we introduced a new feature to enable commit-time compaction for upsert tables. It removes invalid records before committing the segment, which means the pre-commit and post-commit row counts and segment sizes differ.

Currently, all segment flush thresholds operate on the assumption that we commit every row we consume. After this change, we have seen segments being committed much earlier than expected, even though we added initial logic to estimate the pre-commit row count for the next segment from the pre-commit row count, post-commit row count, and post-commit segment size of the previous segment. The current logic is not robust to varying compaction ratios (the amount of repeated records for the same PK in a segment), and as a result the size-based threshold commits segments much sooner than intended.

We should introduce a new config called `realtime.segment.flush.threshold.precommit.segment.size` that lets users set a limit on the pre-commit segment size, instead of restricting them to the current size threshold, which applies specifically to the post-commit segment size.

Note: This is a placeholder issue. I will add more details and the approach once the main PR is merged.
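To make the sensitivity to the compaction ratio concrete, here is a rough sketch of a row-threshold estimate derived from the previous segment. It is not Pinot's actual estimation logic: the function names, the formula, and all numbers are assumptions chosen only to illustrate how an estimate based on one segment's compaction ratio can miss the target size when the next segment's ratio differs.

```python
MB = 1024 * 1024


def estimate_precommit_row_threshold(pre_commit_rows, post_commit_rows,
                                     post_commit_size_bytes,
                                     target_post_commit_size_bytes):
    """Hypothetical estimate of how many rows to consume for the next segment.

    Assumed reasoning (not taken from the Pinot code base):
      bytes_per_row    = post_commit_size / post_commit_rows
      target_post_rows = target_size / bytes_per_row
      compaction_ratio = pre_commit_rows / post_commit_rows
      threshold        = target_post_rows * compaction_ratio
    which simplifies to target_size * pre_commit_rows / post_commit_size.
    """
    if min(pre_commit_rows, post_commit_rows, post_commit_size_bytes) <= 0:
        raise ValueError("previous-segment statistics must be positive")
    # Integer arithmetic avoids floating-point rounding in the simplified form.
    return target_post_commit_size_bytes * pre_commit_rows // post_commit_size_bytes


def projected_post_commit_size(pre_commit_rows, compaction_ratio, bytes_per_row):
    """Post-commit size if the next segment compacts at `compaction_ratio`."""
    surviving_rows = pre_commit_rows / compaction_ratio
    return surviving_rows * bytes_per_row


# Illustrative numbers only: the previous segment consumed 1,000,000 rows,
# 250,000 survived compaction (ratio 4), and the committed segment was 50 MB.
prev_pre_rows, prev_post_rows, prev_size = 1_000_000, 250_000, 50 * MB
target_size = 200 * MB

threshold = estimate_precommit_row_threshold(
    prev_pre_rows, prev_post_rows, prev_size, target_size)
bytes_per_row = prev_size / prev_post_rows

# The same row threshold lands on very different committed sizes
# depending on how many duplicate PKs the next segment contains.
for ratio in (2.0, 4.0, 8.0):
    size_mb = projected_post_commit_size(threshold, ratio, bytes_per_row) / MB
    print(f"compaction ratio {ratio}: about {size_mb:.0f} MB after commit")
```

With a pre-commit size limit, as proposed above, the threshold would be checked against what is actually consumed, so it would not depend on guessing the next segment's compaction ratio.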
