AHeise commented on pull request #15156: URL: https://github.com/apache/flink/pull/15156#issuecomment-847655919
> @AHeise ParquetInputFormat base class was removed since I submitted my PR hence the compilation issues, commit [ce3631a](https://github.com/apache/flink/commit/ce3631af7313855f675e29b8faa386f6e5a2d43c) removed it. This commit mentions "Use the filesystem connector with a Parquet format as a replacement". I guess it refers to https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/filesystem/ which is SQL based. But what if our pipeline does not use SQL but DataSet API?

It's a good and tough question. I spoke to @twalthr offline and for now the plan is as follows:
* The Table API drops the old planner, so most of the code removed in that commit is dead code.
* The Table API will support, on DataStream, everything that used to work when it ran on DataSet.
* 1.13 will be the last release with full DataSet support (`BatchTableEnvironment` will be dropped).

Since a few features of DataSet are still not supported in DataStream, we expect users to stick to 1.13 for a longer time and probably skip 1.14. So we'd suggest merging your 2 PRs to release-1.13 instead of master. Then all DataSet users would benefit from your contributions while we unblock future developments that would break DataSet as of 1.13.

If it turns out that the community wants these features in 1.14 for some reason (for example, the Table API is not as far along as it should be), we can still re-add `ParquetInputFormat` and forward-port your PR before feature freeze in 3 months.

Note 1: It might still be possible to have a Flink 1.14 DataSet application using 1.13 formats. In general, it's always possible to copy the old code into your own project.

Note 2: If you are missing combinable aggregations in DataStream, maybe it would be better to move to the Table API to begin with. At this point, no one really knows how much of DataSet will be ported to DataStream. It doesn't really make sense to have 2 APIs with high-level primitives like joins.
The main idea is to use the Table API by default, with its rich user experience, and to go down to DataStream only when needed (timers, user state, ...).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
