[ 
https://issues.apache.org/jira/browse/FLINK-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17293825#comment-17293825
 ] 

Nico Kruber commented on FLINK-21562:
-------------------------------------

A similar situation can arise with a sparse CSV file containing "null" values 
that you have not accounted for yet. In that scenario, the user is likewise 
left to figure out where the empty string is, and whether it is an error in 
the file or whether the table DDL needs to be extended.
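
To illustrate why the message alone gives so little to go on, here is a 
minimal reproduction outside of Flink. It is only a sketch: the class name, 
the sample row, and the naive comma split are assumptions made for 
illustration, not Flink code. It shows that an empty field passed to 
Double.parseDouble produces the bare message "empty String", with no hint 
about the line or column.

```java
// Minimal reproduction, independent of Flink: an empty field in a sparse CSV
// line is handed to Double.parseDouble, as in the stack trace of this ticket.
class EmptyFieldRepro {

    /** Parses field {@code index} of a comma-separated line as a double. */
    static double parseField(String line, int index) {
        // Limit -1 keeps trailing empty fields, e.g. "a,b," yields 3 fields.
        // Naive split: quoting and escaping are deliberately ignored here.
        String[] fields = line.split(",", -1);
        return Double.parseDouble(fields[index]);
    }

    /** Returns the exception message for an unparsable field, or null on success. */
    static String failureMessage(String line, int index) {
        try {
            parseField(line, index);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample row: the sixth field (index 5) is empty.
        String sparse = "XYZ,Some Airport,Some City,ST,USA,,-100.5";
        System.out.println(failureMessage(sparse, 5));
    }
}
```

The message carries neither the offending line nor the column, which is 
exactly what leaves the user searching the file by hand.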

> Add more informative message on CSV parsing errors
> --------------------------------------------------
>
>                 Key: FLINK-21562
>                 URL: https://issues.apache.org/jira/browse/FLINK-21562
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table 
> SQL / API
>    Affects Versions: 1.11.3
>            Reporter: Nico Kruber
>            Priority: Major
>
> I was parsing a CSV file with comments in it and set {{'csv.allow-comments' 
> = 'true'}} in the table DDL, deliberately without 
> {{'csv.ignore-parse-errors' = 'true'}} so that actual format errors would 
> not be hidden.
> Since my table does not contain only string columns, parsing stumbled on 
> the commented-out line with the following error:
> {code}
> 2021-02-16 17:45:53,055 WARN  org.apache.flink.runtime.taskmanager.Task       
>              [] - Source: TableSourceScan(table=[[default_catalog, 
> default_database, airports]], fields=[IATA_CODE, AIRPORT, CITY, STATE, 
> COUNTRY, LATITUDE, LONGITUDE]) -> SinkConversionToTuple2 -> Sink: SQL Client 
> Stream Collect Sink (1/1)#0 (9f3a3965f18ed99ee42580bdb559ba66) switched from 
> RUNNING to FAILED.
> java.io.IOException: Failed to deserialize CSV row.
>       at 
> org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:257)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:162)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90)
>  ~[flink-dist_2.12-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
>  ~[flink-dist_2.12-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
>  ~[flink-dist_2.12-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241)
>  ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> Caused by: java.lang.NumberFormatException: empty String
>       at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) 
> ~[?:1.8.0_275]
>       at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) 
> ~[?:1.8.0_275]
>       at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_275]
>       at 
> org.apache.flink.formats.csv.CsvToRowDataConverters.convertToDouble(CsvToRowDataConverters.java:203)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createNullableConverter$ac6e531e$1(CsvToRowDataConverters.java:113)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createRowConverter$18bb1dd$1(CsvToRowDataConverters.java:98)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       at 
> org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:251)
>  ~[flink-csv-1.12.1.jar:1.12.1]
>       ... 5 more
> {code}
> Two things should be improved here:
> # commented-out lines should be ignored by default (potentially, FLINK-17133 
> addresses this or at least gives the user the power to do so)
> # the error message itself is not very informative: "empty String".
> This ticket is about the latter. I would suggest adding at least a few more 
> pointers that direct the user to the source of the problem in the CSV 
> file/item/...: here, the data type could simply be wrong, or the CSV file 
> itself may be wrong/corrupted, and the user would need to investigate.
> What exactly helps here probably depends on the input connector this format 
> is currently working with. For example, the line number in a CSV file would 
> be best; where that is not available, we could show the whole line or at 
> least a few surrounding fields...
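
One possible shape for such a message, as a rough sketch only: the class 
CsvErrorContext, the method parseDoubleField, the message wording, and the 
naive comma split are all hypothetical and do not reflect Flink's actual 
converter code. The field names are taken from the log above. The idea is to 
catch the NumberFormatException where the field is converted and rethrow it 
with the line number, column name, raw value, and the whole row attached.

```java
import java.io.IOException;

// Hypothetical sketch: wraps a number-parsing failure with enough context
// (line number, column name, raw value, full row) to locate it in the file.
class CsvErrorContext {

    // Column names as they appear in the log of this ticket.
    static final String[] FIELDS = {
        "IATA_CODE", "AIRPORT", "CITY", "STATE", "COUNTRY", "LATITUDE", "LONGITUDE"
    };

    static double parseDoubleField(
            String rawLine, long lineNumber, String[] fieldNames, int index)
            throws IOException {
        // Naive split for illustration; a real CSV parser handles quoting.
        String[] fields = rawLine.split(",", -1);
        // Assumption of this sketch: a missing field is treated as empty.
        String value = index < fields.length ? fields[index] : "";
        try {
            return Double.parseDouble(value);
        } catch (NumberFormatException e) {
            throw new IOException(
                    String.format(
                            "Failed to deserialize CSV row at line %d: field '%s' "
                                    + "(column index %d) has value '%s', which is not "
                                    + "a valid DOUBLE. Row: %s",
                            lineNumber, fieldNames[index], index, value, rawLine),
                    e);
        }
    }

    public static void main(String[] args) {
        try {
            // Hypothetical sample row with an empty LATITUDE field.
            parseDoubleField("XYZ,Some Airport,Some City,ST,USA,,-100.5", 3L, FIELDS, 5);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether a line number is available at all depends on the connector, as noted 
above; the raw row and the column name would still narrow the search when it 
is not.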



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
