[jira] [Commented] (FLINK-21389) ParquetInputFormat should not need parquet schema as user input

Etienne Chauchot (Jira) Fri, 19 Mar 2021 06:38:37 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17304901#comment-17304901
 ]


Etienne Chauchot commented on FLINK-21389:
------------------------------------------

[~ZhenqiuHuang] To avoid confusing users, I'll deprecate the 
ParquetInputFormat constructor that takes the MessageType as a parameter. 
Otherwise users will not know which constructor to use, and if they use the one 
with the MessageType, they will be surprised to see that the schema is replaced later 
on in the pipeline (as described in the ticket description).
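A minimal Java sketch of the deprecation described above. Everything in it is hypothetical: `SketchParquetInputFormat`, `FileSchema` and `FooterReader` are stand-ins, not Flink or Parquet classes (in particular `FileSchema` is not `org.apache.parquet.schema.MessageType`), and it assumes the schema is read from the file when the split is opened.

```java
import java.util.Objects;

/** Hypothetical stand-in for a Parquet schema. */
final class FileSchema {
    final String description;

    FileSchema(String description) {
        this.description = Objects.requireNonNull(description);
    }
}

/** Hypothetical stand-in for reading the schema stored in a file footer. */
interface FooterReader {
    FileSchema readSchema(String path);
}

/** Hypothetical input format illustrating the constructor deprecation. */
class SketchParquetInputFormat {
    private final String path;
    private final FooterReader footerReader;
    private FileSchema schema; // null until open() unless the deprecated constructor set it

    /** Preferred constructor: no schema, it is taken from the file. */
    SketchParquetInputFormat(String path, FooterReader footerReader) {
        this.path = path;
        this.footerReader = footerReader;
    }

    /**
     * @deprecated the schema is re-read from the file in open(), so the
     *     user-supplied schema is discarded; use the two-argument constructor.
     */
    @Deprecated
    SketchParquetInputFormat(String path, FooterReader footerReader, FileSchema userSchema) {
        this(path, footerReader);
        this.schema = userSchema; // overwritten by open(): the surprise the comment describes
    }

    /** Reads the schema from the file itself, ignoring any user-supplied one. */
    void open() {
        this.schema = footerReader.readSchema(path);
    }

    FileSchema schema() {
        return schema;
    }
}
```

The deprecated constructor still compiles, but javac reports callers as using a deprecated API, and the user-supplied schema only lasts until `open()` replaces it with the schema read from the file.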

> ParquetInputFormat should not need parquet schema as user input
> ---------------------------------------------------------------
>
>                 Key: FLINK-21389
>                 URL: https://issues.apache.org/jira/browse/FLINK-21389
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>
> _ParquetInputFormat_ takes the Parquet schema as user input, but after the split it 
> reads the Parquet schema again from the file here 
> [https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170]
>  instead of using the schema the user provided. 
>  Better still would be to read the schema automatically and not require the 
> user to provide a schema, as Spark does 
> ([https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]). 
>  We could therefore add a _ParquetInputFormat_ constructor without a schema parameter and allow 
> a _ParquetTableSource_ with no schema parameter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
