[GitHub] [iceberg] JingsongLi opened a new pull request #1293: Flink: Implement Flink InputFormat and integrate it to FlinkCatalog

GitBox Tue, 04 Aug 2020 00:43:46 -0700


JingsongLi opened a new pull request #1293:
URL: https://github.com/apache/iceberg/pull/1293



   Fixes #1275
   
   This is a proof of concept (POC) for the Flink reader.
   
   The Flink reader is essentially the same as the Spark reader.
   - Flink `InputFormat` is similar to the Hive (Hadoop) input format. Its splits 
are generated in the job manager, so an Iceberg catalog loader is 
needed there to obtain the Iceberg `Table` object.
   - Flink `TableFactory` and `TableSource` are similar to Spark 
`TableProvider` and `SparkScanBuilder`. They also provide projection push down 
via `ProjectableTableSource` and filter push down via `FilterableTableSource`.
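
To illustrate why a catalog loader is needed, here is a minimal sketch in plain Java with no Flink or Iceberg dependencies. It assumes the input format is built on the client, serialized, and shipped to the job manager, which then has to load the table itself before planning splits. `Table`, `TableLoader`, `InMemoryTableLoader` and `SketchInputFormat` are hypothetical stand-ins invented for this example; they are not the classes introduced by this PR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SplitPlanningSketch {

    /** Hypothetical stand-in for an Iceberg table. Deliberately NOT serializable. */
    static final class Table {
        final String name;
        final List<String> dataFiles;

        Table(String name, List<String> dataFiles) {
            this.name = name;
            this.dataFiles = dataFiles;
        }
    }

    /** Hypothetical loader: a small serializable recipe for reaching the table. */
    interface TableLoader extends Serializable {
        Table load();
    }

    /** Toy loader that fabricates file names instead of talking to a real catalog. */
    static final class InMemoryTableLoader implements TableLoader {
        private static final long serialVersionUID = 1L;
        private final String tableName;
        private final int fileCount;

        InMemoryTableLoader(String tableName, int fileCount) {
            this.tableName = tableName;
            this.fileCount = fileCount;
        }

        @Override
        public Table load() {
            List<String> files = new ArrayList<>();
            for (int i = 0; i < fileCount; i++) {
                files.add(tableName + "/data-" + i + ".avro");
            }
            return new Table(tableName, files);
        }
    }

    /** Hypothetical input format: it holds only the loader, so it can be serialized. */
    static final class SketchInputFormat implements Serializable {
        private static final long serialVersionUID = 1L;
        private final TableLoader loader;

        SketchInputFormat(TableLoader loader) {
            this.loader = loader;
        }

        /** Runs where splits are planned: load the table, then emit one split per file. */
        List<String> createInputSplits() {
            Table table = loader.load();
            List<String> splits = new ArrayList<>();
            for (String file : table.dataFiles) {
                splits.add("split:" + file);
            }
            return splits;
        }
    }

    /** Simulates shipping an object from the client to the job manager. */
    static <T extends Serializable> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        SketchInputFormat onClient = new SketchInputFormat(new InMemoryTableLoader("db.events", 3));
        SketchInputFormat onJobManager = roundTrip(onClient);
        System.out.println(onJobManager.createInputSplits());
    }
}
```

The design point is that the format carries a recipe for reaching the table rather than the table itself, so the table is resolved only at the place where splits are planned.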
   
   Work can be divided into:
   - (Ongoing) Flink: Using RowData to avro reader and writer #1232
   - Introduce `IcebergCatalogLoader`.
   - Introduce `FlinkInputFormat`: implement SplitGenerator and RowDataReader.
   - Extract `TestAppendHelper` for testing.
   - Introduce `FlinkTableFactory` and `FlinkTableSource`.
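
As a rough illustration of the projection push down that the table source is meant to support, the sketch below models the idea in plain Java. It follows the contract of Flink's `ProjectableTableSource`, whose `projectFields(int[])` returns a copy of the source restricted to the given field indexes. `SketchTableSource` and its `selectedColumns()` helper are hypothetical names for this example and do not reflect the actual `FlinkTableSource` in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProjectionPushDownSketch {

    /** Hypothetical table source that models only the projection part. */
    static final class SketchTableSource {
        private final String[] fieldNames;
        private final int[] projectedFields; // null means "read every column"

        SketchTableSource(String[] fieldNames, int[] projectedFields) {
            this.fieldNames = fieldNames;
            this.projectedFields = projectedFields;
        }

        /** Returns a new source limited to the given column indexes; this one is unchanged. */
        SketchTableSource projectFields(int[] fields) {
            return new SketchTableSource(fieldNames, fields.clone());
        }

        /** Columns a reader would actually need to fetch from the data files. */
        List<String> selectedColumns() {
            if (projectedFields == null) {
                return new ArrayList<>(Arrays.asList(fieldNames));
            }
            List<String> selected = new ArrayList<>();
            for (int index : projectedFields) {
                selected.add(fieldNames[index]);
            }
            return selected;
        }
    }

    public static void main(String[] args) {
        SketchTableSource full = new SketchTableSource(new String[] {"id", "data", "ts"}, null);
        SketchTableSource projected = full.projectFields(new int[] {2, 0});
        System.out.println(full.selectedColumns());
        System.out.println(projected.selectedColumns());
    }
}
```

Returning a copy rather than mutating the source lets the planner try a projection without affecting the original source, and the selected columns can then be handed to the reader so that unused columns are never read.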





