[ https://issues.apache.org/jira/browse/SPARK-30687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17029720#comment-17029720 ]
pavithra ramachandran commented on SPARK-30687:
-----------------------------------------------

Yes, the issue is present in 2.4.x as well.

> When reading from a file with a pre-defined schema and encountering a single
> value that is not the same type as that of its column, Spark nullifies the
> entire row
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-30687
>                 URL: https://issues.apache.org/jira/browse/SPARK-30687
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bao Nguyen
>            Priority: Major
>
> When reading from a file with a pre-defined schema and encountering a single
> value that is not the same type as that of its column, Spark nullifies the
> entire row instead of setting only the value in that cell to null.
>
> {code:java}
> import org.apache.spark.sql.catalyst.ScalaReflection
> import org.apache.spark.sql.types.StructType
>
> case class TestModel(
>   num: Double, test: String, mac: String, value: Double
> )
>
> val schema =
>   ScalaReflection.schemaFor[TestModel].dataType.asInstanceOf[StructType]
>
> // Contents of the file test.data:
> // 1~test~mac1~2
> // 1.0~testdatarow2~mac2~non-numeric
> // 2~test1~mac1~3
>
> val ds = spark
>   .read
>   .schema(schema)
>   .option("delimiter", "~")
>   .csv("/test-data/test.data")
>
> ds.show()
>
> // Actual output: the second row is entirely null.
> // +----+-----+----+-----+
> // | num| test| mac|value|
> // +----+-----+----+-----+
> // | 1.0| test|mac1|  2.0|
> // |null| null|null| null|
> // | 2.0|test1|mac1|  3.0|
> // +----+-----+----+-----+
>
> // Expected output: only the malformed cell should be null.
> // +----+------------+----+-----+
> // | num|        test| mac|value|
> // +----+------------+----+-----+
> // | 1.0|        test|mac1|  2.0|
> // | 1.0|testdatarow2|mac2| null|
> // | 2.0|       test1|mac1|  3.0|
> // +----+------------+----+-----+
> {code}
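One possible workaround, not part of the original report (the column names and file path below are taken from the reproduction above; the `SparkSession` setup is assumed): read every field as a plain string with no schema, then cast the typed columns afterwards. A failing cast (such as the string "non-numeric" to double) produces a null in that cell only, leaving the rest of the row intact:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .appName("SPARK-30687-workaround")
  .getOrCreate()

// Read without a schema so every column arrives as a string;
// nothing is treated as malformed at parse time.
val raw = spark.read
  .option("delimiter", "~")
  .csv("/test-data/test.data")
  .toDF("num", "test", "mac", "value")

// Cast the numeric columns afterwards. A cast that cannot be
// performed yields null in that cell, not a nulled-out row.
val fixed = raw
  .withColumn("num", col("num").cast("double"))
  .withColumn("value", col("value").cast("double"))

fixed.show()
{code}

The row-level nulling in the original reproduction is the CSV reader's default PERMISSIVE handling of a record it considers malformed; adding a string column to the schema and pointing `.option("columnNameOfCorruptRecord", ...)` at it at least captures the raw line of any record that fails conversion, so the data is not silently lost.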