[
https://issues.apache.org/jira/browse/FLINK-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394853#comment-14394853
]
Fabian Hueske commented on FLINK-1820:
--------------------------------------
I would not call this a bug.
What you (or your program) expect may differ from how other users want the
parsers to behave. I would find it surprising for an empty string to yield the
value 0 (why not -1 or 42?) and would rather expect either an exception or a
NaN "value".
Also, changing the default behavior breaks the API, since other users might
rely on the current behavior.
I am also not sure whether we should add another parameter to the
CsvInputFormats to configure the floating point parsers. The formats already
have quite a few parameters, and I do not think it is a good idea to add one
for every possible parser behavior. Instead, we could allow users to configure
user-defined parsers for specific fields.
A workaround for your use case could be to read the possibly empty field as a
String field and convert the String to a Double or Float in a subsequent Mapper.
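
As a rough sketch of the conversion step in that workaround (not Flink code):
the class EmptyFieldWorkaround and the method parseOrDefault are hypothetical
names made up for this example. In a Flink job the same logic would presumably
live inside the map() method of a MapFunction applied after the CSV source;
here it is plain Java so it can be run on its own. The fallback value is passed
in explicitly, so each program chooses its own meaning for an empty field.

```java
// Hypothetical helper for illustration; not part of the Flink API.
class EmptyFieldWorkaround {

    // Converts a raw CSV field that was read as a String into a double.
    // An empty (or whitespace-only) field yields the caller-chosen fallback,
    // e.g. 0.0 or Double.NaN. Any other malformed input still throws
    // NumberFormatException, so real parse errors are not hidden.
    static double parseOrDefault(String field, double fallback) {
        String trimmed = field.trim();
        if (trimmed.isEmpty()) {
            return fallback;
        }
        return Double.parseDouble(trimmed);
    }

    public static void main(String[] args) {
        // The caller decides what "empty" means for this particular program.
        System.out.println(parseOrDefault("", 0.0));
        System.out.println(parseOrDefault("", Double.NaN));
        System.out.println(parseOrDefault("3.25", 0.0));
    }
}
```

Keeping the fallback as an argument avoids baking one interpretation of an
empty field into the parser itself.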
> Bug in DoubleParser and FloatParser - empty String is not casted to 0
> ---------------------------------------------------------------------
>
> Key: FLINK-1820
> URL: https://issues.apache.org/jira/browse/FLINK-1820
> Project: Flink
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.0, 0.9, 0.8.1
> Reporter: Felix Neutatz
> Assignee: Felix Neutatz
> Priority: Critical
> Fix For: 0.9
>
>
> Hi,
> I found the bug when I tried to read a CSV file that contained a line like:
> "||\n"
> If I treat it as a Tuple2<Long,Long>, I get as expected a tuple (0L,0L).
> But if I want to read it into a Double-Tuple or a Float-Tuple, I get the
> following error:
> java.lang.AssertionError: Test failed due to a
> org.apache.flink.api.common.io.ParseException: Line could not be parsed: '||'
> ParserError NUMERIC_VALUE_FORMAT_ERROR
> This error can be solved by adding an additional condition for empty strings
> in the FloatParser / DoubleParser.
> We definitely need the CSVReader to be able to read "empty values".
> I can fix it as described if there are no better ideas :)