[
https://issues.apache.org/jira/browse/FLINK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304901#comment-17304901
]
Etienne Chauchot commented on FLINK-21389:
------------------------------------------
[~ZhenqiuHuang] to avoid confusing users, I'll deprecate the
ParquetInputFormat constructor that takes a MessageType as a parameter. Otherwise
they will not know which constructor to use, and if they use the one
with MessageType, they will be surprised to see that the schema is replaced later
on in the pipeline (as described in the ticket description).
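
To illustrate the idea, here is a minimal sketch of what deprecating that constructor could look like. It is not the actual Flink code: SketchParquetInputFormat, the String stand-in for MessageType, and the footerReader function are all hypothetical names used only to keep the example self-contained.

```java
import java.util.function.Function;

/** Hypothetical stand-in for ParquetInputFormat; not Flink's actual class. */
class SketchParquetInputFormat {
    private final String path;
    // Hypothetical hook: maps a file path to the schema stored in that file.
    private final Function<String, String> footerReader;
    // A String stands in for Parquet's MessageType in this sketch.
    private String schema;

    /** Preferred constructor: no schema, it is read from the file in open(). */
    SketchParquetInputFormat(String path, Function<String, String> footerReader) {
        this.path = path;
        this.footerReader = footerReader;
    }

    /**
     * @deprecated the supplied schema is replaced by the one read from the
     *     file in open(); use the constructor without a schema instead.
     */
    @Deprecated
    SketchParquetInputFormat(
            String path, String userSchema, Function<String, String> footerReader) {
        this(path, footerReader);
        this.schema = userSchema;
    }

    /** Reads the schema from the file, overwriting any user-supplied one. */
    void open() {
        this.schema = footerReader.apply(path);
    }

    String getSchema() {
        return schema;
    }
}
```

With this shape, a caller who passes a schema gets a deprecation warning at compile time instead of finding out at runtime that the schema was silently replaced.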
> ParquetInputFormat should not need parquet schema as user input
> ---------------------------------------------------------------
>
> Key: FLINK-21389
> URL: https://issues.apache.org/jira/browse/FLINK-21389
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
>
> _ParquetInputFormat_ takes the Parquet schema as user input, but after the
> split it reads the Parquet schema again here:
> [https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170]
> It should read the provided user schema instead.
> Better still would be to read the schema automatically and not require the
> user to provide one, as Spark does
> ([https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]).
> We could then add a _ParquetInputFormat_ constructor with no schema parameter
> and allow _ParquetTableSource_ to be created without one.