owen6314 commented on PR #14594: URL: https://github.com/apache/iceberg/pull/14594#issuecomment-3550875654
> > This field is derived dynamically within the Flink app (as it depends on the data flowing through the app)
>
> How do you send the dynamically collected data to the `func`? The function is serialized during job creation and then sent to the committer. When the committer is started, the function is deserialized and executed, so it can only use information that was serialized with it, plus information available in the committer.

Great point. In my previous tests, I stored the min/max event time in a static variable and read it within `func`. Since the operator runs with `parallelism=1`, in the testing environment both the operator and the committer were always scheduled in the same JVM, which made the static state accessible. I now realize this is not guaranteed.

Given these constraints, I'd like to take a step back and research the best way to propagate the necessary metadata to the committer, so we can reliably set dynamic snapshot properties at checkpoint time. I will update the PR once I have a more robust solution. Thank you again for your thoughtful feedback; any additional insights are very welcome.

Our use case (#14160) of collecting custom metrics for each snapshot and recording them as snapshot properties seems fairly common, so I'm hoping there is a Flink-Iceberg-native approach we can adopt.
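For concreteness, here is a minimal sketch of the static-variable workaround I described above. The `Event` type, the property names, and the `Supplier`-shaped `func` are purely illustrative assumptions for this comment, not the actual API in the PR; the point is that the supplier only sees meaningful values when it happens to run in the same JVM as the tracking operator.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

import org.apache.flink.api.common.functions.MapFunction;

// Hypothetical event type, used only for this illustration.
class Event implements Serializable {
  long eventTimeMillis;

  Event(long eventTimeMillis) {
    this.eventTimeMillis = eventTimeMillis;
  }
}

// Operator that records min/max event time in JVM-local static state
// as records flow through it.
class EventTimeTracker implements MapFunction<Event, Event> {
  // Static state is visible only within a single JVM / classloader.
  static final AtomicLong MIN_EVENT_TIME = new AtomicLong(Long.MAX_VALUE);
  static final AtomicLong MAX_EVENT_TIME = new AtomicLong(Long.MIN_VALUE);

  @Override
  public Event map(Event event) {
    MIN_EVENT_TIME.accumulateAndGet(event.eventTimeMillis, Math::min);
    MAX_EVENT_TIME.accumulateAndGet(event.eventTimeMillis, Math::max);
    return event;
  }
}

// Hypothetical stand-in for `func`: serialized at job-creation time and
// deserialized inside the committer. The static reads below only work if
// the committer is scheduled in the same JVM as EventTimeTracker, which is
// exactly the assumption that is not guaranteed in a real deployment.
class SnapshotPropertiesFunc implements Supplier<Map<String, String>>, Serializable {
  @Override
  public Map<String, String> get() {
    return Map.of(
        "custom.min-event-time", String.valueOf(EventTimeTracker.MIN_EVENT_TIME.get()),
        "custom.max-event-time", String.valueOf(EventTimeTracker.MAX_EVENT_TIME.get()));
  }
}
```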
