wypoon commented on PR #5742: URL: https://github.com/apache/iceberg/pull/5742#issuecomment-1254083288
> Hi @wypoon, thanks for the PR. I don't see a strong reason to expose the threshold to users. Instead, it's better to hide it from users. Here are the reasons:
>
> 1. It is an internal threshold that users don't have to understand, and probably don't want to understand.
> 2. We can potentially remove it in the future if possible. We used to discuss that here [Core: Replace Set with Bitmap to make delete filtering simpler and faster #3535 (comment)](https://github.com/apache/iceberg/pull/3535#issuecomment-996355892), it is not valid at that time though.
> 3. We can adjust the value according to the internal implementation. For example, we can increase the threshold when we use a more efficient data structure to store pos delete rows.
>
> What do you think?

I don't have a strong opinion on whether to expose this threshold to the user. We do expose various optimizations to the user, with sensible defaults, so users who are not interested in tuning them, or have no need to, can leave them alone. So even though this particular setting may not be of interest to most users, I don't see much harm in exposing it.

My main interest, though, is in having an easy way to set this threshold for testing the code path I mentioned. If you have a good suggestion for another way to set the threshold, I'm happy to consider it. A hacky way would be to allow the threshold to be set in `DeleteFilter` by a system property, and to set and unset the property in the test.
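
For illustration, the system-property approach could look roughly like the sketch below. This is only a sketch under stated assumptions, not code from the PR: the class `PosDeleteThreshold`, the property name, and the default value are all hypothetical placeholders, and the real `DeleteFilter` would need to read the value at the point where it chooses between its code paths.

```java
// Hypothetical helper for illustration only; not part of Iceberg's DeleteFilter.
final class PosDeleteThreshold {
  // Assumed property name and default value, chosen only for this sketch.
  static final String PROPERTY = "example.pos-delete.set-filter-threshold";
  static final long DEFAULT_THRESHOLD = 100_000L;

  private PosDeleteThreshold() {}

  // Reads the threshold at call time so a test can change it between calls.
  static long resolve() {
    // Long.getLong falls back to the default when the property is unset
    // or cannot be parsed as a long.
    return Long.getLong(PROPERTY, DEFAULT_THRESHOLD);
  }

  // Runs an action with the property set, then restores the previous state
  // so that other tests in the same JVM are not affected.
  static void withThreshold(long threshold, Runnable action) {
    String previous = System.getProperty(PROPERTY);
    System.setProperty(PROPERTY, Long.toString(threshold));
    try {
      action.run();
    } finally {
      if (previous == null) {
        System.clearProperty(PROPERTY);
      } else {
        System.setProperty(PROPERTY, previous);
      }
    }
  }

  public static void main(String[] args) {
    // Inside the action the lowered threshold is visible; afterwards it is not.
    withThreshold(1L, () -> System.out.println("in test: " + resolve()));
    System.out.println("after test: " + resolve());
  }
}
```

A test would wrap the read that should exercise the other code path in `withThreshold(1L, ...)`. The downside is that system properties are JVM-wide state, so tests running in parallel in the same JVM could interfere with each other, which is part of why this is a hack.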
