[ https://issues.apache.org/jira/browse/FLINK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304901#comment-17304901 ]
Etienne Chauchot edited comment on FLINK-21389 at 3/19/21, 3:18 PM:
--------------------------------------------------------------------
[~ZhenqiuHuang] to avoid confusing users, I'll deprecate the
ParquetInputFormat constructor that takes a MessageType as a parameter.
Otherwise, users will not know which constructor to use, and those who pick the
one taking a MessageType will be surprised to see that the schema is replaced
later on in the pipeline (as described in the ticket description).
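A minimal Java sketch of the situation the comment describes, under simplifying assumptions: `CurrentBehaviorFormat` and `DeprecationDemo` are hypothetical names, a schema is modeled as a list of column names rather than a Parquet `MessageType`, and `open(List)` stands in for opening a split. It is not Flink's code; it only illustrates why the schema-taking constructor is confusing and how a deprecation notice could explain it.

```java
import java.util.List;

/**
 * Hypothetical, simplified model of the behavior described in the ticket: the
 * schema passed to the constructor is replaced by the file's schema on open.
 * Schemas are lists of column names, not Parquet MessageType objects.
 */
class CurrentBehaviorFormat {
    private List<String> readSchema;

    /** Preferred constructor: no schema argument. */
    CurrentBehaviorFormat() {
        this.readSchema = List.of();
    }

    /**
     * @deprecated the supplied schema is overwritten in {@link #open(List)} by the
     *     schema read from the file, so passing one here has no lasting effect;
     *     use {@link #CurrentBehaviorFormat()} instead.
     */
    @Deprecated
    CurrentBehaviorFormat(List<String> userSchema) {
        this.readSchema = List.copyOf(userSchema);
    }

    /** Opening a split re-reads the schema from the file, discarding the user's one. */
    void open(List<String> fileSchema) {
        this.readSchema = List.copyOf(fileSchema);
    }

    List<String> readSchema() {
        return readSchema;
    }
}

class DeprecationDemo {
    public static void main(String[] args) {
        // The caller asks for only the "id" column...
        CurrentBehaviorFormat format = new CurrentBehaviorFormat(List.of("id"));
        // ...but opening the split replaces it with the file's full schema.
        format.open(List.of("id", "name", "price"));
        System.out.println(format.readSchema()); // prints [id, name, price]
    }
}
```

The `@deprecated` Javadoc tag is where the surprise can be spelled out, so users who still reach for the old constructor learn why they should not.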
> ParquetInputFormat should not need parquet schema as user input
> ---------------------------------------------------------------
>
> Key: FLINK-21389
> URL: https://issues.apache.org/jira/browse/FLINK-21389
> Project: Flink
> Issue Type: Bug
> Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
> Priority: Major
>
> _ParquetInputFormat_ takes a Parquet schema as user input, but after the split
> it reads the Parquet schema again from the file here:
> [https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170]
> Instead, it should use the schema provided by the user.
> Better still would be to read the schema automatically and not require the
> user to provide one, as Spark does
> ([https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]).
> To that end, we could add a _ParquetInputFormat_ constructor, and allow a
> _ParquetTableSource_, without a schema parameter.
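The resolution proposed in the description (use the user's schema when one is given, otherwise read it from the file) could look roughly like the following sketch. `SketchParquetFormat`, `resolveReadSchema` and `SchemaResolutionDemo` are hypothetical names, and a schema is again modeled as a list of column names instead of a Parquet `MessageType`; a real format would get the file schema from the Parquet file's footer metadata.

```java
import java.util.List;

/**
 * Hypothetical, simplified Parquet-like input format whose schema parameter is
 * optional. Schemas are lists of column names, not Parquet MessageType objects.
 */
class SketchParquetFormat {
    // Schema supplied by the user; null when the no-schema constructor is used.
    private final List<String> userSchema;

    /** Proposed constructor: no schema parameter, the schema comes from each file. */
    SketchParquetFormat() {
        this.userSchema = null;
    }

    /** Constructor for callers that still want to supply a schema up front. */
    SketchParquetFormat(List<String> userSchema) {
        this.userSchema = List.copyOf(userSchema);
    }

    /**
     * Called per split with the schema found in that file: the user-provided
     * schema wins if present, otherwise the file's own schema is used.
     */
    List<String> resolveReadSchema(List<String> fileSchema) {
        return userSchema != null ? userSchema : fileSchema;
    }
}

class SchemaResolutionDemo {
    public static void main(String[] args) {
        List<String> fileSchema = List.of("id", "name", "price");
        // No schema given: read everything the file declares.
        System.out.println(new SketchParquetFormat().resolveReadSchema(fileSchema));
        // Schema given: honor it instead of silently replacing it.
        System.out.println(
                new SketchParquetFormat(List.of("id", "price")).resolveReadSchema(fileSchema));
    }
}
```

Running `SchemaResolutionDemo` prints `[id, name, price]` for the no-schema constructor and `[id, price]` when a schema is supplied. A nullable field is used instead of `Optional` because input formats are typically serialized, and `Optional` is not `Serializable`.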