[
https://issues.apache.org/jira/browse/FLINK-26311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503276#comment-17503276
]
Yun Gao commented on FLINK-26311:
---------------------------------
The issue is tested with the jobs under [1]. Overall, the csv reader
functionality works as expected in the following scenarios:
# Define a POJO class, then parse and read the csv file
([CSVDataStreamPojoTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVDataStreamPojoTest.java]).
# Define the columns manually with a customized field separator, then parse
and read the csv file
([CSVDataStreamJacksonTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVDataStreamJacksonTest.java]).
# Read and write csv files using Flink SQL and the csv format
([CSVSQLNormalFileTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVSQLNormalFileTest.java]).
# Read and write spoiled csv files using Flink SQL and the csv format with
parse errors ignored
([CSVSQLSpoiledFileTest.java|https://github.com/gaoyunhaii/crossteam-1.15/blob/master/src/main/java/org/apache/flink/test/csv/CSVSQLSpoiledFileTest.java]).
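For reference, the POJO-based reading in scenario 1 can be sketched roughly as below; this is a minimal illustration, not the exact test job, and the class name, fields, and input path are hypothetical:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.csv.CsvReaderFormat;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.fasterxml.jackson.annotation.JsonProperty;

public class CsvPojoSketch {

    /** POJO fields must carry @JsonProperty for the csv reader (see below). */
    public static class CityPojo {
        @JsonProperty("city")
        public String city;

        @JsonProperty("population")
        public long population;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Derive the csv schema directly from the POJO class.
        CsvReaderFormat<CityPojo> format = CsvReaderFormat.forPojo(CityPojo.class);

        // Hypothetical input path.
        FileSource<CityPojo> source =
                FileSource.forRecordStreamFormat(format, new Path("/tmp/input.csv"))
                        .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "csv-source")
                .print();
        env.execute("csv-pojo-sketch");
    }
}
```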
The result files show that the reader marks missing or unparsable fields as
null, and skips rows that contain more fields than expected.
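This null/skip behavior corresponds to the csv format's {{csv.ignore-parse-errors}} option on the SQL side; a minimal DDL sketch (table name and path are hypothetical):

```sql
CREATE TABLE csv_in (
  city STRING,
  population BIGINT
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/spoiled.csv',
  'format' = 'csv',
  -- Fields that fail to parse become NULL instead of failing the job.
  'csv.ignore-parse-errors' = 'true'
);
```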
One possible improvement: the documentation on using the csv format with
DataStream [2] could be enhanced as follows:
# Make the note that POJO classes need to be annotated with @JsonProperty
more eye-catching, e.g. by using {{< hint info >}}, since the annotation is
needed in almost all cases.
# It would be even better to have an end-to-end example showing how to use a
POJO / customized schema definition to define the CsvReaderFormat. It is
currently not easy for users to figure out how to use the CsvMapper and
how to derive the suitable TypeInformation.
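As an illustration of the kind of end-to-end snippet suggested in point 2, the customized-schema path could be sketched as below; the POJO class and separator are hypothetical, and this is only one plausible way to wire CsvMapper, CsvSchema, and TypeInformation together:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.formats.csv.CsvReaderFormat;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvCustomSchemaSketch {

    public static class CityPojo {
        @JsonProperty("city")
        public String city;

        @JsonProperty("population")
        public long population;
    }

    public static CsvReaderFormat<CityPojo> buildFormat() {
        CsvMapper mapper = new CsvMapper();

        // Start from the schema Jackson derives for the POJO,
        // then customize the field separator (non-default delimiter).
        CsvSchema schema =
                mapper.schemaFor(CityPojo.class).withColumnSeparator('|');

        // The TypeInformation tells Flink how to serialize the records.
        return CsvReaderFormat.forSchema(
                mapper, schema, TypeInformation.of(CityPojo.class));
    }
}
```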
[1]
[https://github.com/gaoyunhaii/crossteam-1.15/tree/master/src/main/java/org/apache/flink/test/csv]
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/csv/
> Test CsvFormat
> --------------
>
> Key: FLINK-26311
> URL: https://issues.apache.org/jira/browse/FLINK-26311
> Project: Flink
> Issue Type: Improvement
> Reporter: Alexander Fedulov
> Assignee: Yun Gao
> Priority: Blocker
> Labels: release-testing
> Fix For: 1.15.0
>
>
> https://issues.apache.org/jira/browse/FLINK-24703 adds a new implementation
> of the CSV format based on StreamFormat.
> The following needs to be tested:
> # Reading CSV with the DataStream API using FileSource based on POJO schema
> [1]
> # Reading CSV with the DataStream API using FileSource based on a customized
> Jackson schema (i.e., non-default delimiter) [1]
> # Reading and writing CSV with SQL and 'filesystem' connector, including an
> option of skipping malformed rows [2]
>
> [1]
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/formats/csv/]
> [2]
> [https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/csv/]
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)