waterlx commented on issue #856: URL: https://github.com/apache/incubator-iceberg/pull/856#issuecomment-617074928

@rdblue Thanks for sharing your thoughts on why Table/BaseTable is not serializable. Totally agree. But I am currently in a dilemma where there may be a need to call high-level operations inside Flink tasks, such as table.newTransaction() when trying to commit DataFiles accumulated from the streaming inputs. The current code limits the parallelism to 1 so that the commit is not performed in parallel.

For now, it is not a blocker because I can pass the namespace and table name via config and call loadTable() on the Catalog to rebuild the table when needed. But that implementation is not great, since the table information (namespace, table name, whether it is a HiveCatalog or a HadoopCatalog) gets passed everywhere, even where some of it is not needed. I am also considering passing the path as a string (db.table for HiveCatalog, or a fully qualified path for HadoopTables) instead of passing a table instance, but the goal is still the same: rebuild the table instance so that high-level operations can be called.
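To make the workaround concrete, here is a minimal sketch of the pattern described above: ship only serializable identifier strings to the task, rebuild the Table there via Catalog.loadTable(), and commit the accumulated DataFiles in one transaction. The class and method names (SerializableTableLoader, commitFiles) are hypothetical, and it assumes the HiveCatalog constructor that takes a Hadoop Configuration; this is not the final design, just an illustration.

```java
import java.io.Serializable;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.Transaction;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class SerializableTableLoader implements Serializable {
  private final String database;
  private final String tableName;

  // Only plain strings cross the wire; the non-serializable Table is rebuilt per task.
  public SerializableTableLoader(String database, String tableName) {
    this.database = database;
    this.tableName = tableName;
  }

  // Called inside the Flink task (e.g. from the committer running at parallelism 1):
  // rebuild the table from the catalog and commit the accumulated DataFiles.
  public void commitFiles(List<DataFile> files) {
    HiveCatalog catalog = new HiveCatalog(new Configuration());
    Table table = catalog.loadTable(TableIdentifier.of(database, tableName));

    Transaction txn = table.newTransaction();
    AppendFiles append = txn.newAppend();
    files.forEach(append::appendFile);
    append.commit();          // stage the append within the transaction
    txn.commitTransaction();  // commit the table update
  }
}
```

The same shape would work for HadoopTables by holding the fully qualified table location instead of db/table names and calling new HadoopTables(conf).load(location) in place of the catalog lookup.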
@rdblue Thanks for sharing your thoughts on the reasons why Table/BaseTable is not serializable. Totally agree. But I am currently in the dilemma where there might be a need to call high-level operations in Flink tasks, like table.newTransaction() when trying to cimmit DataFiles accumulated from the streaming inputs. Currently code limits the parallelism to 1 so that the commit won't be performed in parallel. For now, it is not a blocker because I could pass the namespace and table name by config and call loadTable() of Catalog to build the table when there is a need. But the implementation is not that good as the table informations(like namespace, table name, it is HiveCatalog or HadoopCatalog) passes eveywhere, while some of them are not needed. I am also thinking about passing the path as a string (db.table for Hive Catalog while full qualified path for HadoopTables) instead of passing table instance, but the purpose is also to re-build the table instance so as to call some high-level operations. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org