AHeise commented on pull request #15156:
URL: https://github.com/apache/flink/pull/15156#issuecomment-847655919


   > @AHeise ParquetInputFormat base class was removed since I submitted my PR 
hence the compilation issues, commit 
[ce3631a](https://github.com/apache/flink/commit/ce3631af7313855f675e29b8faa386f6e5a2d43c)
 removed it. This commit mentions "Use the filesystem connector with a Parquet 
format as a replacement". I guess it refers to 
https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/filesystem/
 which is SQL based. But what if our pipeline pipeline does not use SQL but 
DataSet API ?
   
   It's a good and tough question. I spoke to @twalthr offline and for now the 
plan is as follows:
   * Table API drops old planner and thus, most of the removed code in that 
commit is dead code.
   * Table API will support everything running on DataStream that used to work 
when it ran on DataSet.
   * 1.13 will be the last release with full DataSet support 
(BatchTableEnvironment will be dropped)
   
   Since a few features of DataSet are still not supported in DataStream, we 
expect users to stick to 1.13 for a longer time and probably skip 1.14. So, 
we'd suggest to merge your 2 PRs to release-1.13 instead of master. Then, all 
DataSet users would benefit from your contributions while we unblock future 
developments that would break DataSet as of 1.13.
   
   If it turns out that the community wants to have these features in 1.14 for 
some reasons (Table API not as far as it should), we can still re-add 
`ParquetInputFormat` and forward port your PR before feature freeze in 3 months.
   
   Note 1: It might still be possible to have a Flink 1.14 DataSet application 
using 1.13 formats. In general, it's always possible to copy the old code into 
your own project.
   Note 2: If you are missing combineable aggregations in DataStream, maybe it 
would be better to move to Table API to begin with. At this point, no-one 
really knows how much of DataSet will be ported to DataStream. It doesn't 
really make sense to have 2 APIs with high-level primitives like joins. The 
main idea is to use Table API by default with a rich user experience and go 
down to DataStream only when needed (timer, user state, ...).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to