echauchot edited a comment on pull request #15156: URL: https://github.com/apache/flink/pull/15156#issuecomment-833481523
@JingsongLi, echoing what I sent to the Flink mailing list: I agree that adding a new feature (ParquetAvroInputFormat) to an old source API is a maintenance burden. But IMHO, as long as the new DataStream batch/streaming convergent API is not 100% functional, we still need to maintain the older sources and add missing features to them. Indeed, I realized that the DataStream API in batch mode [1] does not support aggregations yet [2], so in such a case a user would stick to the DataSet API. And the new FileSource API with ParquetColumnarRowInputFormat is not available in the DataSet API [3]. So, long story short, in some cases a user will have no other choice than to use ParquetInputFormat and the legacy source. WDYT?

[1] https://issues.apache.org/jira/browse/FLINK-19316
[2] https://issues.apache.org/jira/browse/FLINK-22587
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-Compatibility,Deprecation,andMigrationPlan
