bryanck commented on PR #8555:
URL: https://github.com/apache/iceberg/pull/8555#issuecomment-1718003552
> I am very concerned about this simplified solution. We should consider:
>
> * Concurrency issues, when the table metadata has been changed between reloads. Maybe we should only refresh the table metadata during checkpoints, so we can localize and handle changes in one place?
> * Load issues, as every TaskManager with a writer task connects to the catalog, and does so periodically. This will become costly very soon. I would prioritize a centralized solution.
> * The current default implementation does not solve the schema evolution problem, since that would mean on-demand table reload instead of periodic reload.
>
> If the main goal here is to refresh the credentials, then maybe a proxy solution above the accessor could be a better solution, like https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java, where we get a new Kerberos token when the previous one has expired.
>
> What do you think?

To address your points:

* This PR was updated so the committer will now only reload the table at checkpoints. The committer was already updating the table metadata as part of the Iceberg commit and upon restart. Writers will always use the original table metadata and only use the reloaded table to get an updated `FileIO`, though the hope is that the writers will use the reloaded metadata in the future (a rough sketch of this pattern is at the end of this comment).
* This PR was designed so that a different, more optimal table supplier solution could be plugged in later. There were limitations with the options we explored for centralized table reload, such as using the new token delegation framework or using a broadcast (I'd be happy to discuss details if you're interested). We went with this initial solution as a starting point. The feature is disabled by default and marked as experimental, so it should only be used in cases where it is known that a job will not overburden the catalog with load requests (see the configuration example below).
* This PR was not meant to solve schema evolution problems, but rather to make a change that takes a step towards that long-term goal.
* The short-term goal for our use case was to refresh credentials, so we initially explored adding a `FileIO`-specific refresh. However, the long-term goal is to support table reload in a running Flink job in order to support schema evolution. We felt that adding the table refresh capability would be a better step towards reaching that goal, and it could also be used to solve the particular issue we are having.

cc @danielcweeks @rdblue in case you have additional perspective or feedback.
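To make the first point concrete, here is a minimal sketch of the reload-at-checkpoint pattern described above. The class and method names (`RefreshingTableSupplier`, `refresh`) are hypothetical illustrations, not the PR's actual API; it assumes the committer alone triggers the reload at a checkpoint boundary, while writers keep their original metadata and only pull a fresh `FileIO`:

```java
import java.io.Serializable;
import java.util.function.Supplier;

import org.apache.iceberg.SerializableTable;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.io.FileIO;

// Hypothetical sketch, not the PR's actual implementation.
class RefreshingTableSupplier implements Supplier<Table>, Serializable {

  private final Table initialTable;        // metadata frozen when the job starts
  private transient Table reloadedTable;   // replaced at checkpoint boundaries

  RefreshingTableSupplier(Table table) {
    // SerializableTable.copyOf produces a serializable snapshot of the table state
    this.initialTable = SerializableTable.copyOf(table);
  }

  // Invoked only at a checkpoint boundary, so catalog load stays
  // localized to one place instead of every writer polling periodically.
  void refresh(Catalog catalog, TableIdentifier ident) {
    this.reloadedTable = catalog.loadTable(ident);
  }

  @Override
  public Table get() {
    // Writers keep planning and writing against the original metadata.
    return initialTable;
  }

  FileIO io() {
    // The FileIO (and its embedded credentials) comes from the reloaded
    // table when available, which is what refreshes expiring credentials.
    return reloadedTable != null ? reloadedTable.io() : initialTable.io();
  }
}
```

A real implementation would also need a way to propagate the reloaded table from the committer to the writer tasks (or let each task reload on its own schedule); this sketch only illustrates the split between frozen metadata and refreshed `FileIO`.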
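Since the feature is off by default, opting in would look roughly like the snippet below. This is a hedged example: the `table-refresh-interval` option name is an assumption based on the direction of this PR and should be verified against the released write options.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class RefreshEnabledSinkExample {
  public static void sink(DataStream<RowData> rows, TableLoader tableLoader) {
    FlinkSink.forRowData(rows)
        .tableLoader(tableLoader)
        // Assumed option name; disabled unless explicitly set.
        .set("table-refresh-interval", "30m")
        .append();
  }
}
```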
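For comparison with the quoted RetryingMetaStoreClient-style suggestion, a minimal sketch of that proxy approach using a JDK dynamic proxy might look like the following. Everything here (`RefreshingProxy`, the `SecurityException` check) is illustrative; a real version would detect the specific credential-expiry exception thrown by the client being wrapped.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Supplier;

// Illustrative sketch of a proxy that rebuilds its delegate (re-acquiring
// credentials) when a call fails with an expiry error, then retries once.
final class RefreshingProxy<T> implements InvocationHandler {

  private final Supplier<T> factory; // re-creates the client with fresh credentials
  private T delegate;

  private RefreshingProxy(Supplier<T> factory) {
    this.factory = factory;
    this.delegate = factory.get();
  }

  @SuppressWarnings("unchecked")
  static <T> T wrap(Class<T> iface, Supplier<T> factory) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, new RefreshingProxy<>(factory));
  }

  @Override
  public synchronized Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    try {
      return method.invoke(delegate, args);
    } catch (InvocationTargetException e) {
      if (isCredentialExpiry(e.getCause())) {
        delegate = factory.get(); // refresh credentials, then retry once
        return method.invoke(delegate, args);
      }
      throw e.getCause();
    }
  }

  private static boolean isCredentialExpiry(Throwable t) {
    // Placeholder check; a real implementation would inspect the specific
    // auth exception of the underlying catalog or metastore client.
    return t instanceof SecurityException;
  }
}
```

The trade-off, per the points above, is that a proxy only refreshes credentials, while reloading the table is also a step towards schema evolution support.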
