Peter Rose created SPARK-18558:
----------------------------------

             Summary: spark-csv: infer data type for mixed integer/null columns causes exception
                 Key: SPARK-18558
                 URL: https://issues.apache.org/jira/browse/SPARK-18558
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.0.2
            Reporter: Peter Rose


A java.lang.NumberFormatException (message: null) is thrown when reading the following CSV file with schema inference enabled:

example.csv:
column1
"1"
"2"
""

 Dataset<Row> df = spark
         .read()
         .option("header", "true")
         .option("inferSchema", "true")
         .format("csv")
         .load("example.csv");

 df.printSchema();

The type is correctly inferred:

root
 |-- column1: integer (nullable = true)

df.show(5);

The show method leads to this exception:

java.lang.NumberFormatException: null
        at java.lang.Integer.parseInt(Integer.java:542) ~[?:1.8.0_25]
        at java.lang.Integer.parseInt(Integer.java:615) ~[?:1.8.0_25]
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272) ~[scala-library-2.11.8.jar:?]
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:29) ~[scala-library-2.11.8.jar:?]
        at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:241) ~[spark-sql_2.11-2.0.2.jar:2.0.2]
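
The trace ends in Integer.parseInt, and the "null" message suggests that the empty quoted field reaches the integer cast as a null string. The sketch below reproduces that failure in plain Java, without Spark, and shows one possible null-safe conversion. The class NullSafeIntCast and its methods are hypothetical names used for illustration only; this is not Spark's CSVTypeCast code, and treating both null and "" as a missing value is an assumption.

```java
// Illustrative sketch only; names here are hypothetical and not part of Spark.
class NullSafeIntCast {

    // Returns true when Integer.parseInt rejects a null input with a
    // NumberFormatException, which is the failure shown in the trace above.
    static boolean parseIntRejectsNull() {
        try {
            Integer.parseInt((String) null);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // Assumed behaviour for a nullable integer column: a null or empty
    // field maps to null instead of being passed to Integer.parseInt.
    static Integer castToInt(String datum) {
        if (datum == null || datum.isEmpty()) {
            return null;
        }
        return Integer.parseInt(datum);
    }

    public static void main(String[] args) {
        System.out.println("parseInt rejects null: " + parseIntRejectsNull());
        System.out.println("castToInt(\"1\")  = " + castToInt("1"));
        System.out.println("castToInt(\"\")   = " + castToInt(""));
        System.out.println("castToInt(null) = " + castToInt(null));
    }
}
```

With a guard of this kind, the third row of example.csv would become a null cell rather than aborting df.show().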





