rajarshisarkar commented on PR #7194: URL: https://github.com/apache/iceberg/pull/7194#issuecomment-1490261743
> One of the problems with the proposed approach is that optimizations are being triggered as an immediate result of a commit. The implication is that whatever happens in the metric report consumer needs to happen in a way that doesn't affect the commit path. For example, failures in the consumer should not lead to commit failures. Additionally, every single commit triggers additional workload, so I think consuming a metrics report and actually performing some workload should be completely decoupled from one another.

This is an opt-in feature, and it would help in scenarios where users do not want to maintain the different optimisations as separately scheduled pipelines. It would take away the operational overhead of maintaining those extra pipelines.

Yes, the consumer should not affect the commit path (for the incoming commits), which makes it suitable for batch workloads.

Regarding the additional workload: each commit would only run some basic threshold checks on the table history, and only when the user has opted in to auto optimisation. We can arrange the thresholds in a way that the quickest checks run first, so that we exit early where possible. Since the approach is suited to batch workloads, the user shouldn't mind this latency after the commit.

Thoughts?
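
The ordering of threshold checks described above could look roughly like the following sketch. It is illustrative Python rather than Iceberg code: `TableStats`, `Threshold`, `first_exceeded`, and `on_commit` are hypothetical names, the `cost` values are made up, and it assumes that a single exceeded threshold is enough to trigger an optimisation. It also swallows any exception in the hook, as one possible way to keep consumer failures from turning into commit failures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TableStats:
    """Hypothetical summary of table history; not an Iceberg class."""
    snapshots_since_rewrite: int
    small_file_count: int
    delete_file_count: int


@dataclass
class Threshold:
    """A named check plus an assumed relative cost of evaluating it."""
    name: str
    cost: int
    exceeded: Callable[[TableStats], bool]


def first_exceeded(stats: TableStats, thresholds: List[Threshold]) -> Optional[str]:
    """Evaluate checks cheapest first and stop at the first one exceeded."""
    for threshold in sorted(thresholds, key=lambda t: t.cost):
        if threshold.exceeded(stats):
            return threshold.name
    return None


def on_commit(
    stats: TableStats,
    thresholds: List[Threshold],
    enabled: bool,
    trigger: Callable[[str], None],
) -> Optional[str]:
    """Post-commit hook (assumed): does nothing unless the user opted in,
    and never raises, so a failure here cannot fail the commit."""
    if not enabled:
        return None
    try:
        name = first_exceeded(stats, thresholds)
        if name is not None:
            trigger(name)
        return name
    except Exception:
        # Assumption: errors are dropped (or logged) rather than propagated.
        return None
```

Sorting by cost means an expensive check is skipped whenever a cheaper one already decides the outcome; whether the real checks should short-circuit on the first hit or the first miss is a design choice the sketch does not settle.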
