owen6314 commented on PR #14594:
URL: https://github.com/apache/iceberg/pull/14594#issuecomment-3550875654

   > > This field is derived dynamically within the Flink app (as it depends on 
the data flowing through the app)
   > 
   > How do you send the dynamically collected data to the `func`? The function is serialized during job creation and then sent to the committer. When the committer starts, the function is deserialized and executed, but it can only use information that was serialized with it or that is available in the committer.
   
   Great point. In my previous tests, I stored the min/max event time in a static variable and accessed it within `func`. Since the operator runs with `parallelism=1`, both the operator and the committer were always scheduled in the same JVM in the testing environment, which made the static state accessible. I now realize this is not guaranteed in general.
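   
   For concreteness, here is roughly what that test setup looked like (heavily simplified; the class name, supplier shape, and property keys are illustrative, not part of this PR). The supplier only sees the static fields of whichever JVM deserializes it, which is why it happened to work when everything ran in one JVM:
   
```java
// Simplified sketch of the static-variable approach used in my tests.
// Names here (EventTimeTracker, property keys) are illustrative only.
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class EventTimeTracker {
  // Static state updated by the writer-side operator as records flow through.
  static final AtomicLong MIN_EVENT_TIME = new AtomicLong(Long.MAX_VALUE);
  static final AtomicLong MAX_EVENT_TIME = new AtomicLong(Long.MIN_VALUE);

  static void observe(long eventTimeMs) {
    MIN_EVENT_TIME.accumulateAndGet(eventTimeMs, Math::min);
    MAX_EVENT_TIME.accumulateAndGet(eventTimeMs, Math::max);
  }

  // The `func` handed to the sink: serialized at job creation and deserialized
  // in the committer. It reads the static fields of whatever JVM the committer
  // runs in, so it only works when the writer and committer share a JVM.
  static Supplier<Map<String, String>> snapshotPropertiesFunc() {
    return (Supplier<Map<String, String>> & Serializable)
        () ->
            Map.of(
                "custom.min-event-time", String.valueOf(MIN_EVENT_TIME.get()),
                "custom.max-event-time", String.valueOf(MAX_EVENT_TIME.get()));
  }
}
```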
   
   Given these constraints, I’d like to take a step back and research the best way to propagate the necessary metadata to the committer, so we can reliably set dynamic snapshot properties at checkpoint time. I will update the PR once I have a more robust solution. Thank you again for your thoughtful feedback; any additional insights are very welcome. Our use case (#14160) of collecting custom metrics for each snapshot and recording them as snapshot properties seems fairly common, so I’m hoping there’s a Flink-Iceberg-native approach we can adopt.
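   
   For context, the end state we are after is equivalent to what the core API already allows when committing directly, via `SnapshotUpdate#set`; the open question is only how to get the per-checkpoint values into the Flink committer. A sketch, with made-up property keys:
   
```java
// What we ultimately want the committer to do at checkpoint time: attach
// dynamically collected values to the snapshot being committed. The core API
// supports this via SnapshotUpdate#set on the pending commit.
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.Table;

public class SnapshotPropertyExample {
  static void commitWithProperties(Table table, long minEventTime, long maxEventTime) {
    AppendFiles append = table.newAppend();
    // ... append.appendFile(dataFile) for the files produced in this checkpoint ...
    append
        .set("custom.min-event-time", String.valueOf(minEventTime))
        .set("custom.max-event-time", String.valueOf(maxEventTime))
        .commit();
  }
}
```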


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

