tarun11Mavani opened a new issue, #16632:
URL: https://github.com/apache/pinot/issues/16632

   In #16344, we introduced a new feature to enable commit-time compaction for
upsert tables. It removes invalid records before the segment is committed,
which means the pre-commit and post-commit row counts and segment sizes
differ. Currently, all segment flush thresholds operate on the assumption
that every consumed row is committed.
   Since this change, we have seen segments being committed much earlier, even
though we added initial logic to estimate the pre-commit row count for the
next segment from the pre-commit row count, post-commit row count, and
post-commit segment size of the previous segment. The current logic does not
robustly handle varying compaction ratios (the number of repeated records for
the same primary key in a segment), and as a result the size-based threshold
commits segments much faster than intended.
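   To make the sensitivity to the compaction ratio concrete, here is a minimal
Java sketch of one way such an estimate could be derived. The class, method
names, and exact formula are hypothetical and are not Pinot's actual
implementation: the sketch scales the target post-commit row count by the
previous segment's survival ratio, then projects how the committed size
drifts when the next segment's ratio differs.

```java
/** Hypothetical sketch; not Pinot's actual flush-threshold code. */
public class PreCommitRowEstimator {

    /**
     * Estimates how many rows to consume (pre-commit) for the next segment so that,
     * after commit-time compaction, the committed segment lands near the target size.
     */
    static long estimatePreCommitRows(long prevPreCommitRows, long prevPostCommitRows,
                                      long prevPostCommitSizeBytes, long targetSizeBytes) {
        // Average on-disk size of a row that survived compaction in the previous segment.
        double bytesPerRow = (double) prevPostCommitSizeBytes / prevPostCommitRows;
        // Fraction of consumed rows that survived compaction (1.0 = nothing removed).
        double survivalRatio = (double) prevPostCommitRows / prevPreCommitRows;
        // Rows the committed segment should hold to reach the target size.
        double targetPostCommitRows = targetSizeBytes / bytesPerRow;
        // Inflate by the previous segment's survival ratio to get the rows to consume.
        return Math.round(targetPostCommitRows / survivalRatio);
    }

    /** Committed size if the next segment's actual survival ratio is different. */
    static long projectedPostCommitSize(long preCommitRows, double actualSurvivalRatio,
                                        double bytesPerRow) {
        return Math.round(preCommitRows * actualSurvivalRatio * bytesPerRow);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Previous segment: 1,000,000 rows consumed, 250,000 kept, 50 MB committed.
        long estimate = estimatePreCommitRows(1_000_000, 250_000, 50 * mb, 200 * mb);
        System.out.println("Rows to consume: " + estimate); // Rows to consume: 4000000
        double bytesPerRow = (double) (50 * mb) / 250_000;
        // Same key-repeat pattern as before: the committed segment is ~200 MB.
        System.out.println(projectedPostCommitSize(estimate, 0.25, bytesPerRow) / mb); // 200
        // Fewer repeated keys in the next segment: the committed segment is ~400 MB.
        System.out.println(projectedPostCommitSize(estimate, 0.5, bytesPerRow) / mb); // 400
    }
}
```

   A single observed ratio is carried forward, so any change in how often keys
repeat between consecutive segments translates directly into a size error of
the same factor.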
   
   We should introduce a new config,
`realtime.segment.flush.threshold.precommit.segment.size`, that lets users
limit the pre-commit segment size, instead of restricting them to the current
size config, which applies only to the post-commit segment size.
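   A minimal Java sketch of how the proposed key could take precedence over
the existing post-commit size key (presumably
`realtime.segment.flush.threshold.segment.size`) when deciding whether to
commit. The class, the `shouldCommit` method, the precedence rule, and the
plain-byte-count values are all assumptions for illustration; the actual
semantics are still to be defined in this issue.

```java
import java.util.Map;

/** Hypothetical sketch of the proposed config; not Pinot's actual implementation. */
public class FlushThresholdSelector {

    // Existing size key, which targets the post-commit (compacted) segment size.
    static final String POST_COMMIT_SIZE_KEY =
        "realtime.segment.flush.threshold.segment.size";
    // Key proposed in this issue; not yet part of Pinot.
    static final String PRE_COMMIT_SIZE_KEY =
        "realtime.segment.flush.threshold.precommit.segment.size";

    /**
     * Returns true if the consuming segment should be committed now.
     * Sizes are plain byte counts in this sketch.
     */
    static boolean shouldCommit(Map<String, String> streamConfigs,
                                long consumingSizeBytes, long estimatedPostCommitSizeBytes) {
        String preCommitLimit = streamConfigs.get(PRE_COMMIT_SIZE_KEY);
        if (preCommitLimit != null) {
            // Assumed behavior: cap the uncompacted size directly, independent of
            // how many rows compaction will later remove.
            return consumingSizeBytes >= Long.parseLong(preCommitLimit);
        }
        // Otherwise fall back to a post-commit target, simplified here as a direct
        // comparison against an estimated compacted size.
        String postCommitLimit = streamConfigs.get(POST_COMMIT_SIZE_KEY);
        return postCommitLimit != null
            && estimatedPostCommitSizeBytes >= Long.parseLong(postCommitLimit);
    }

    public static void main(String[] args) {
        Map<String, String> configs = Map.of(
            POST_COMMIT_SIZE_KEY, "209715200",  // 200 MB after compaction
            PRE_COMMIT_SIZE_KEY, "524288000");  // 500 MB before compaction
        // 300 MB consumed, ~75 MB expected after compaction: keep consuming.
        System.out.println(shouldCommit(configs, 300L << 20, 75L << 20)); // false
    }
}
```

   Checking the pre-commit size directly avoids relying on a compaction ratio
observed in an earlier segment, at the cost of less predictable committed
segment sizes.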
   
   
   Note: This is a placeholder issue; I will add more details and a proposed
approach once the main PR is merged.

