[ https://issues.apache.org/jira/browse/SPARK-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675110#comment-16675110 ]
Maxim Gekk commented on SPARK-25890:
------------------------------------

I have double checked on branch-2.4. It doesn't have the problem either.

> Null rows are ignored with Ctrl-A as a delimiter when reading a CSV file.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-25890
>                 URL: https://issues.apache.org/jira/browse/SPARK-25890
>             Project: Spark
>          Issue Type: Bug
>      Components: Spark Shell, SQL
> Affects Versions: 2.3.2
>        Reporter: Lakshminarayan Kamath
>        Priority: Major
>
> Reading a Ctrl-A-delimited CSV file ignores rows containing only null values;
> a comma-delimited CSV file does not.
>
> *Reproduction in spark-shell:*
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
>
> val l = List(List(1, 2), List(null, null), List(2, 3))
> val datasetSchema = StructType(List(
>   StructField("colA", IntegerType, true),
>   StructField("colB", IntegerType, true)))
> val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.show()
>
> |colA|colB|
> |1   |2   |
> |null|null|
> |2   |3   |
>
> df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
> df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")
>
> val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
> commaDf.show
>
> |colA|colB|
> |1   |2   |
> |2   |3   |
> |null|null|
>
> val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
> ctrlaDf.show
>
> |colA|colB|
> |1   |2   |
> |2   |3   |
>
> As seen above, for the Ctrl-A-delimited CSV, rows containing only null values are ignored.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
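For context on why such a row is not inherently lossy: an all-null row serialized with any single-character delimiter becomes a line containing only that delimiter, which a conforming CSV parser decodes back into two empty fields. The sketch below illustrates this with Python's standard csv module (plain Python, not Spark or its univocity parser, so it only shows what the on-disk line encodes, not Spark's behavior):

```python
import csv
import io

# An all-null two-column row written out as CSV is a line containing
# only the delimiter character, regardless of which delimiter is used.
comma_line = ","        # two empty fields, comma-delimited
ctrl_a_line = "\u0001"  # two empty fields, Ctrl-A-delimited

# A standard CSV parser recovers two empty fields from either line,
# so the all-null row is fully representable with a Ctrl-A delimiter.
print(next(csv.reader(io.StringIO(comma_line), delimiter=",")))
print(next(csv.reader(io.StringIO(ctrl_a_line), delimiter="\u0001")))
# Both print: ['', '']
```

This suggests the row is dropped somewhere in the reading path rather than being unrepresentable in the file itself.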