[
https://issues.apache.org/jira/browse/FLINK-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503276#comment-17503276
]
Yun Gao commented on FLINK-26311:
---------------------------------
The issue is tested with the jobs under [1]. Overall, the csv reader
functionality works as expected in the following scenarios:
# Define a POJO class, then parse and read the csv file
([CSVDataStreamPojoTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVDataStreamPojoTest.java]).
# Define the columns manually with a customized field separator, then parse
and read the csv file
([CSVDataStreamJacksonTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVDataStreamJacksonTest.java]).
# Read and write csv files using Flink SQL and the csv format
([CSVSQLNormalFileTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVSQLNormalFileTest.java]).
# Read and write spoiled csv files using Flink SQL and the csv format with
parse errors ignored
([CSVSQLSpoiledFileTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVSQLSpoiledFileTest.java]).
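For reference, the POJO-based reading in scenario 1 can be sketched roughly as below; this is a minimal illustration, not the exact test job, and the class name, fields, and input path are hypothetical:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.csv.CsvReaderFormat;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.fasterxml.jackson.annotation.JsonProperty;

public class CsvPojoSketch {

    /** POJO fields must carry @JsonProperty for the csv reader (see below). */
    public static class CityPojo {
        @JsonProperty("city")
        public String city;

        @JsonProperty("population")
        public long population;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Derive the csv schema directly from the POJO class.
        CsvReaderFormat<CityPojo> format = CsvReaderFormat.forPojo(CityPojo.class);

        // Hypothetical input path.
        FileSource<CityPojo> source =
                FileSource.forRecordStreamFormat(format, new Path("/tmp/input.csv"))
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "csv-source")
                .print();
        env.execute("csv-pojo-sketch");
    }
}
```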
The result files show that the reader marks missing or unparsable fields as
null, and skips rows that contain more fields than expected.
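This null/skip behavior corresponds to the csv format's {{csv.ignore-parse-errors}} option on the SQL side; a minimal DDL sketch (table name and path are hypothetical):

```sql
CREATE TABLE csv_in (
  city STRING,
  population BIGINT
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/spoiled.csv',
  'format' = 'csv',
  -- Fields that fail to parse become NULL instead of failing the job.
  'csv.ignore-parse-errors' = 'true'
);
```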
One possible improvement: the documentation on using the csv format with
DataStream [2] could be enhanced as follows:
# Make the note that POJO classes need to be annotated with @JsonProperty
more eye-catching, e.g. by using {{< hint info >}}, since the annotation is
needed in almost all cases.
# It would be even better to have an end-to-end example showing how to use a
POJO / customized schema definition to define the CsvReaderFormat. It is
currently not easy for users to figure out how to use the CsvMapper and
how to derive the suitable TypeInformation.
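As an illustration of the kind of end-to-end snippet suggested in point 2, the customized-schema path could be sketched as below; the POJO class and separator are hypothetical, and this is only one plausible way to wire CsvMapper, CsvSchema, and TypeInformation together:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.csv.CsvReaderFormat;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvCustomSchemaSketch {

    public static class CityPojo {
        @JsonProperty("city")
        public String city;

        @JsonProperty("population")
        public long population;
    }

    public static CsvReaderFormat<CityPojo> buildFormat() {
        CsvMapper mapper = new CsvMapper();

        // Start from the schema Jackson derives for the POJO,
        // then customize the field separator (non-default delimiter).
        CsvSchema schema =
                mapper.schemaFor(CityPojo.class).withColumnSeparator('|');

        // The TypeInformation tells Flink how to serialize the records.
        return CsvReaderFormat.forSchema(
                mapper, schema, TypeInformation.of(CityPojo.class));
    }
}
```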
[1]
[https://github.com/gaoyunhaii/crossteam-1.15/tree/master/src/main/java/org/apache/flink/test/csv]
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/csv/
> Test CsvFormat
> --------------
>
> Key: FLINK-26311
> URL: https://issues.apache.org/jira/browse/FLINK-26311
> Project: Flink
> Issue Type: Improvement
> Reporter: Alexander Fedulov
> Assignee: Yun Gao
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.15.0
>
>
> https://issues.apache.org/jira/browse/FLINK-24703 adds a new implementation
> of the CSV format based on StreamFormat.
> The following needs to be tested:
> # Reading CSV with the DataStream API using FileSource based on POJO schema
> [1]
> # Reading CSV with the DataStream API using FileSource based on a customized
> Jackson schema (i.e., non-default delimiter) [1]
> # Reading and writing CSV with SQL and 'filesystem' connector, including an
> option of skipping malformed rows [2]
>
> [1]
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/csv/]
> [2]
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/csv/]
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)