[
https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074109#comment-16074109
]
Hossein Falaki commented on SPARK-21263:
----------------------------------------
[~sowen] note that the user specified the mode as PERMISSIVE. In this mode the
CSV data source tries to ignore errors and still return a result. If the mode
were FAILFAST, it should throw an exception. I see the permissiveness of the
modes ordered as follows:
{code}
PERMISSIVE > DROPMALFORMED > FAILFAST
{code}
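As a rough illustration of that ordering, here is a minimal sketch in plain Java. It is not Spark's implementation: the class {{ParseModeSketch}}, the {{Mode}} enum and {{parseColumn}} are hypothetical names, and the assumption is that PERMISSIVE keeps the row with a null placeholder, DROPMALFORMED discards it, and FAILFAST rethrows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three parse modes; NOT Spark code.
class ParseModeSketch {
    enum Mode { PERMISSIVE, DROPMALFORMED, FAILFAST }

    // Parses one integer column and applies the chosen mode to bad values.
    static List<Integer> parseColumn(List<String> raw, Mode mode) {
        List<Integer> out = new ArrayList<>();
        for (String s : raw) {
            try {
                out.add(Integer.parseInt(s));
            } catch (NumberFormatException e) {
                switch (mode) {
                    case PERMISSIVE:
                        out.add(null);   // assumption: keep the row, null the value
                        break;
                    case DROPMALFORMED:
                        break;           // assumption: silently skip the row
                    case FAILFAST:
                        throw e;         // surface the error to the caller
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> totalBill = List.of("10u000", "30000");
        System.out.println(parseColumn(totalBill, Mode.PERMISSIVE));
        System.out.println(parseColumn(totalBill, Mode.DROPMALFORMED));
    }
}
```

Under these assumptions, only FAILFAST would be expected to surface a {{NumberFormatException}} for "10u000"; the other two modes absorb it in different ways.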
In this issue, {{IntegerType}} and {{DoubleType}} behave differently for the
same malformed input. That needs to be fixed so that the behavior is consistent.
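One way such a difference can arise in plain Java is shown below. Whether the CSV data source does this is an assumption not confirmed in this thread: the sketch only shows that a strict parser rejects "10u000" while a lenient {{java.text.NumberFormat}} parser accepts its numeric prefix, which would match the reported value of 10. The class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo of strict vs. lenient number parsing; NOT Spark code.
class LenientParseDemo {
    // Strict: the whole string must be a valid integer.
    static boolean strictIntFails(String s) {
        try {
            Integer.parseInt(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Lenient: NumberFormat.parse(String) stops at the first character it
    // cannot consume and returns the number parsed so far.
    static double lenientDouble(String s) {
        try {
            return NumberFormat.getInstance(Locale.US).parse(s).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("no numeric prefix: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(strictIntFails("10u000")); // prints true
        System.out.println(lenientDouble("10u000"));  // prints 10.0
    }
}
```

If a fallback like this were involved, consistent behavior would mean either rejecting a partially parsed value in both cases or applying the same mode handling to both types.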
> NumberFormatException is not thrown while converting an invalid string to
> float/double
> --------------------------------------------------------------------------------------
>
> Key: SPARK-21263
> URL: https://issues.apache.org/jira/browse/SPARK-21263
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 2.1.1
> Reporter: Navya Krishnappa
>
> When reading the data below with a user-defined schema, no exception is
> thrown. Details follow:
> *Data:*
> 'PatientID','PatientName','TotalBill'
> '1000','Patient1','10u000'
> '1001','Patient2','30000'
> '1002','Patient3','40000'
> '1003','Patient4','50000'
> '1004','Patient5','60000'
> *Source code*:
> Dataset dataset = sparkSession.read().schema(schema)
> .option(INFER_SCHEMA, "true")
> .option(DELIMITER, ",")
> .option(QUOTE, "\"")
> .option(MODE, Mode.PERMISSIVE)
> .csv(sourceFile);
> When we collect the dataset data:
> dataset.collectAsList();
> *Schema1*:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,IntegerType,true)]
> *Result*: Throws NumberFormatException
> Caused by: java.lang.NumberFormatException: For input string: "10u000"
> *Schema2*:
> [StructField(PatientID,IntegerType,true),
> StructField(PatientName,StringType,true),
> StructField(TotalBill,DoubleType,true)]
> *Actual Result*:
> "PatientID": 1000,
> "NumberOfVisits": "400",
> "TotalBill": 10,
> *Expected Result*: Should throw NumberFormatException for input string
> "10u000"