[jira] [Commented] (FLINK-21389) ParquetInputFormat should not need parquet schema as user input

Zhenqiu Huang (Jira) Wed, 17 Feb 2021 11:25:38 -0800


    [ https://issues.apache.org/jira/browse/FLINK-21389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286103#comment-17286103 ]


Zhenqiu Huang commented on FLINK-21389:
---------------------------------------

[~echauchot]
Thanks for reporting this. The reason for taking a MessageType in the
constructor is that we need it to determine the TableSchema in
ParquetTableSource. To prevent the legacy planner from doing a file system read
for schema validation, we decided to let the user provide a schema in advance.
But you are right: ParquetInputFormat should have a constructor without the
schema parameter.
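A minimal sketch of the two-constructor design discussed here, assuming a hypothetical class `SchemaOptionalInputFormat` (not Flink's actual `ParquetInputFormat`). The schema is represented as a plain `String` standing in for Parquet's `MessageType`, and the footer lookup is a hypothetical `Function<String, String>`; a real implementation would presumably wrap a parquet-mr reader such as `ParquetFileReader`. The fallback order follows the issue description: use the schema the user provided, and only read it from the file when none was given.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, NOT Flink's ParquetInputFormat API: the schema is an
// optional constructor argument and, when absent, is read per split from the file.
class SchemaOptionalInputFormat {

    // Stand-in for Parquet's MessageType; a String keeps the sketch dependency-free.
    private final Optional<String> userSchema;

    // Hypothetical footer reader: maps a file path to the schema stored in its footer.
    private final Function<String, String> footerReader;

    // Current style: the caller supplies the schema up front.
    SchemaOptionalInputFormat(String userSchema, Function<String, String> footerReader) {
        this.userSchema = Optional.of(Objects.requireNonNull(userSchema, "userSchema"));
        this.footerReader = Objects.requireNonNull(footerReader, "footerReader");
    }

    // Proposed style: no schema parameter; the schema is discovered from the file.
    SchemaOptionalInputFormat(Function<String, String> footerReader) {
        this.userSchema = Optional.empty();
        this.footerReader = Objects.requireNonNull(footerReader, "footerReader");
    }

    // Called when a split is opened: use the provided schema if there is one,
    // otherwise fall back to reading the file's footer.
    String resolveSchema(String splitPath) {
        return userSchema.orElseGet(() -> footerReader.apply(splitPath));
    }
}
```

Note that a real Flink input format must be serializable (Flink's `InputFormat` extends `Serializable`), so the footer reader would need to be a serializable type rather than an arbitrary `Function`.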

> ParquetInputFormat should not need parquet schema as user input
> ---------------------------------------------------------------
>
>                 Key: FLINK-21389
>                 URL: https://issues.apache.org/jira/browse/FLINK-21389
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>            Priority: Major
>
> _ParquetInputFormat_ takes the parquet schema as user input, but after the
> split it reads the parquet schema again here:
> [https://github.com/apache/flink/blob/52dcf439bb0b8d613fff1efecf015052d5b3a10b/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetInputFormat.java#L170]
>  It should use the schema provided by the user instead.
>  Better still would be to read the schema automatically and not require the
> user to provide one, as Spark does
> ([https://spark.apache.org/docs/latest/sql-data-sources-parquet.html]).
>  Thus we could add a _ParquetInputFormat_ constructor without a schema
> parameter, and allow the same for _ParquetTableSource_.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
