waterlx commented on issue #856: URL: https://github.com/apache/incubator-iceberg/pull/856#issuecomment-617074928

@rdblue Thanks for sharing your thoughts on why Table/BaseTable is not serializable. Totally agree. But I am currently in a dilemma where there may be a need to call high-level operations inside Flink tasks, such as table.newTransaction() when trying to commit DataFiles accumulated from the streaming inputs. The current code limits the parallelism to 1 so that the commit is not performed in parallel.

For now, it is not a blocker because I can pass the namespace and table name via config and call loadTable() on the Catalog to rebuild the table when needed. But that implementation is not great, since the table information (namespace, table name, whether it is a HiveCatalog or a HadoopCatalog) gets passed everywhere, even where some of it is not needed. I am also considering passing the path as a string (db.table for HiveCatalog, or a fully qualified path for HadoopTables) instead of passing a table instance, but the goal is still the same: rebuild the table instance so that high-level operations can be called.
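To make the workaround concrete, here is a minimal sketch of the pattern described above: ship only serializable identifier strings to the task, rebuild the Table there via Catalog.loadTable(), and commit the accumulated DataFiles in one transaction. The class and method names (SerializableTableLoader, commitFiles) are hypothetical, and it assumes the HiveCatalog constructor that takes a Hadoop Configuration; this is not the final design, just an illustration.

```java
import java.io.Serializable;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.Table;
import org.apache.iceberg.Transaction;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class SerializableTableLoader implements Serializable {
  private final String database;
  private final String tableName;

  // Only plain strings cross the wire; the non-serializable Table is rebuilt per task.
  public SerializableTableLoader(String database, String tableName) {
    this.database = database;
    this.tableName = tableName;
  }

  // Called inside the Flink task (e.g. from the committer running at parallelism 1):
  // rebuild the table from the catalog and commit the accumulated DataFiles.
  public void commitFiles(List<DataFile> files) {
    HiveCatalog catalog = new HiveCatalog(new Configuration());
    Table table = catalog.loadTable(TableIdentifier.of(database, tableName));

    Transaction txn = table.newTransaction();
    AppendFiles append = txn.newAppend();
    files.forEach(append::appendFile);
    append.commit();          // stage the append within the transaction
    txn.commitTransaction();  // commit the table update
  }
}
```

The same shape would work for HadoopTables by holding the fully qualified table location instead of db/table names and calling new HadoopTables(conf).load(location) in place of the catalog lookup.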
@rdblue Thanks for sharing your thoughts on the reasons why Table/BaseTable is not serializable. Totally agree. But I am currently in the dilemma where there might be a need to call high-level operations in Flink tasks, like table.newTransaction() when trying to cimmit DataFiles accumulated from the streaming inputs. Currently code limits the parallelism to 1 so that the commit won't be performed in parallel. For now, it is not a blocker because I could pass the namespace and table name by config and call loadTable() of Catalog to build the table when there is a need. But the implementation is not that good as the table informations(like namespace, table name, it is HiveCatalog or HadoopCatalog) passes eveywhere, while some of them are not needed. I am also thinking about passing the path as a string (db.table for Hive Catalog while full qualified path for HadoopTables) instead of passing table instance, but the purpose is also to re-build the table instance so as to call some high-level operations. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org