[GitHub] [flink] echauchot edited a comment on pull request #15156: [FLINK-21393] [formats] Implement ParquetAvroInputFormat

GitBox Wed, 19 May 2021 06:04:55 -0700


echauchot edited a comment on pull request #15156:
URL: https://github.com/apache/flink/pull/15156#issuecomment-833481523



   @JingsongLi, echoing what I sent to the Flink ML:
   
   I agree that adding a new feature (ParquetAvroInputFormat) to an old source 
API is a maintenance burden. But IMHO, as long as the new DataStream 
batch/streaming convergent API is not 100% functional, we still need to maintain 
older sources and add missing features to them.
   
   Indeed, I realized that the DataStream API in batch mode [1] does not support 
aggregations yet [2], so in such a case a user would stick to the DataSet API. 
And the new FileSource API with ParquetColumnarRowInputFormat is not available 
in the DataSet API [3].
   
   So, long story short, in some cases a user will have no choice but to 
use ParquetInputFormat and the legacy source.
   
   WDYT?
   
   [1] https://issues.apache.org/jira/browse/FLINK-19316
   
   [2] https://issues.apache.org/jira/browse/FLINK-22587
   
   [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface#FLIP27:RefactorSourceInterface-Compatibility,Deprecation,andMigrationPlan
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
