[ https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394853#comment-14394853 ]

Fabian Hueske commented on FLINK-1820:
--------------------------------------

I would not call this a bug.
The behavior that you (or your program) expect might differ from how other
users would like the parsers to behave. I would find it surprising that an
empty string results in the value 0 (why not -1 or 42?) and would rather
expect either an exception or a NaN value.
Also, changing the default behavior would break the API (other users might
rely on the current behavior).
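To make the ambiguity concrete, here is a minimal plain-Java sketch (no Flink dependency) of three candidate policies for an empty floating-point field. The enum `EmptyFieldPolicy` and the class name are hypothetical and not part of Flink's API. Note that the JDK's own `Double.parseDouble("")` throws a `NumberFormatException`, so returning 0 would be a policy layered on top of it rather than a natural default.

```java
// Hypothetical sketch: EmptyFieldPolicy is NOT part of Flink's API.
public class EmptyFieldPolicies {

    // Three candidate behaviors for an empty floating-point field.
    public enum EmptyFieldPolicy { ZERO, NAN, FAIL }

    // Parse one field as a double, applying the chosen policy to empty input.
    public static double parse(String field, EmptyFieldPolicy policy) {
        if (field.isEmpty()) {
            switch (policy) {
                case ZERO:
                    return 0.0;
                case NAN:
                    return Double.NaN;
                default:
                    throw new NumberFormatException("empty field");
            }
        }
        // Non-empty input goes straight to the JDK parser.
        return Double.parseDouble(field);
    }

    public static void main(String[] args) {
        // Show what each policy does with the same empty field.
        for (EmptyFieldPolicy p : EmptyFieldPolicy.values()) {
            try {
                System.out.println(p + " -> " + parse("", p));
            } catch (NumberFormatException e) {
                System.out.println(p + " -> exception: " + e.getMessage());
            }
        }
    }
}
```

Each policy is defensible for some users, which is the argument against silently picking one as the default.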

I am also not sure whether we should add another parameter to the
CsvInputFormats to configure the floating-point parsers. The formats already
have quite a few parameters, and I do not think it is a good idea to add more
parameters for every possible parser behavior. Instead, we could allow users
to configure user-defined parsers for specific fields.
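One way per-field user-defined parsers could look, as a hypothetical sketch only: a registry that maps a 0-based field position to a `java.util.function.Function<String, Object>`, so each field decides for itself how to treat empty input. `PerFieldParsers`, `withParser`, and `parseLine` are invented names for illustration; this is not an existing CsvInputFormat feature.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;

// Hypothetical sketch of per-field parser configuration; not a Flink API.
public class PerFieldParsers {

    // Custom parsers keyed by 0-based field position.
    private final Map<Integer, Function<String, Object>> parsers = new HashMap<>();

    // Register a parser for one field; returns this for chaining.
    public PerFieldParsers withParser(int field, Function<String, Object> parser) {
        parsers.put(field, parser);
        return this;
    }

    // Split a line on the delimiter (keeping empty fields) and parse each field.
    // Fields without a registered parser are returned as raw Strings.
    public Object[] parseLine(String line, String delimiter) {
        String[] raw = line.split(Pattern.quote(delimiter), -1);
        Object[] out = new Object[raw.length];
        for (int i = 0; i < raw.length; i++) {
            Function<String, Object> parser = parsers.get(i);
            out[i] = (parser == null) ? raw[i] : parser.apply(raw[i]);
        }
        return out;
    }
}
```

For example, registering `s -> s.isEmpty() ? 0.0 : Double.parseDouble(s)` for field 0 and `s -> s.isEmpty() ? Double.NaN : Double.parseDouble(s)` for field 1 makes the line `|` parse to `0.0` and `NaN`, while the built-in parsers keep their current behavior for everyone else.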

A workaround for your use case could be to read the possibly empty field as a
String field and convert the String to a Double or Float in a subsequent Mapper.
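A minimal sketch of the conversion step in this workaround, assuming the fields have already been read as Strings. In a Flink job this logic would go into the `map()` method of a `MapFunction<Tuple2<String, String>, Tuple2<Double, Double>>`; it is shown as plain Java here so it runs on its own, and the class and method names are hypothetical. The value used for empty input is a parameter, since which value is right (0, NaN, or something else) is exactly the open question.

```java
// Hypothetical helper for the "read as String, convert in a Mapper" workaround.
public class EmptyFieldConversion {

    // Convert a possibly empty field to a double. Whitespace-only input is
    // treated as empty here; that is a design choice, not a Flink rule.
    public static double toDouble(String field, double defaultIfEmpty) {
        String trimmed = field.trim();
        return trimmed.isEmpty() ? defaultIfEmpty : Double.parseDouble(trimmed);
    }

    // Same conversion for float fields.
    public static float toFloat(String field, float defaultIfEmpty) {
        String trimmed = field.trim();
        return trimmed.isEmpty() ? defaultIfEmpty : Float.parseFloat(trimmed);
    }

    // What a map over a (String, String) pair would produce, using 0.0
    // for empty fields because that is what the issue asks for.
    public static double[] convertPair(String f0, String f1) {
        return new double[] { toDouble(f0, 0.0), toDouble(f1, 0.0) };
    }

    public static void main(String[] args) {
        double[] converted = convertPair("", "");
        System.out.println(converted[0] + ", " + converted[1]);
    }
}
```

Keeping the default explicit at the call site means each job documents its own choice instead of relying on a global parser setting.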

> Bug in DoubleParser and FloatParser - empty String is not casted to 0
> ---------------------------------------------------------------------
>
>                 Key: FLINK-1820
>                 URL: https://issues.apache.org/jira/browse/FLINK-1820
>             Project: Flink
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8.0, 0.9, 0.8.1
>            Reporter: Felix Neutatz
>            Assignee: Felix Neutatz
>            Priority: Critical
>             Fix For: 0.9
>
>
> Hi,
> I found the bug when I wanted to read a CSV file that had a line like:
> "||\n"
> If I treat it as a Tuple2<Long,Long>, I get, as expected, a tuple (0L,0L).
> But if I want to read it into a Double-Tuple or a Float-Tuple, I get the 
> following error:
> java.lang.AssertionError: Test failed due to a org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||'
> ParserError NUMERIC_VALUE_FORMAT_ERROR
> This error can be solved by adding an additional condition for empty strings 
> in the FloatParser / DoubleParser.
> We definitely need the CSVReader to be able to read "empty values".
> I can fix it as described if there are no better ideas :)
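The reporter's proposed fix, an extra empty-field check before parsing, could look roughly like the following. This is a hypothetical standalone method, not the signature of Flink's actual `FieldParser` classes: it scans a byte range up to the field delimiter, returns 0.0 when the field is empty, and otherwise delegates to `Double.parseDouble`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed empty-string check; NOT Flink's FieldParser API.
public class EmptyAwareDoubleParser {

    // Parse bytes[start, limit) up to the first occurrence of delim.
    // Returns 0.0 for an empty field (the behavior the issue proposes).
    public static double parseField(byte[] bytes, int start, int limit, byte delim) {
        int end = start;
        while (end < limit && bytes[end] != delim) {
            end++;
        }
        if (end == start) {
            return 0.0; // the additional condition for empty strings
        }
        // Numeric text is ASCII, so decoding as US-ASCII is sufficient here.
        String text = new String(bytes, start, end - start, StandardCharsets.US_ASCII);
        return Double.parseDouble(text);
    }
}
```

Calling it on the bytes of `||` at offset 0 or 1 yields 0.0, matching what the report says the Long parser already returns for that line; whether that should be the default is the point debated in the comment above.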



