[ https://issues.apache.org/jira/browse/SPARK-25890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675088#comment-16675088 ]
Maxim Gekk commented on SPARK-25890:
------------------------------------

I got the following on the commit *4afb35*:
{code:scala}
scala> val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("ctrl-a-separated.csv")
ctrlaDf: org.apache.spark.sql.DataFrame = [colA: string, colB: string]

scala> ctrlaDf.show
+----+----+
|colA|colB|
+----+----+
|null|null|
|   2|   3|
|   1|   2|
+----+----+
{code}
The ctrl-a-separated.csv file contains \u0001 as the delimiter:
{code}
hexdump -C ctrl-a-separated.csv/part-00002-b13ced94-e5d1-406b-afd8-565acd649261-c000.csv
00000000  63 6f 6c 41 01 63 6f 6c 42 0a 31 01 32 0a  |colA.colB.1.2.|
0000000e
{code}

> Null rows are ignored with Ctrl-A as a delimiter when reading a CSV file.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-25890
>                 URL: https://issues.apache.org/jira/browse/SPARK-25890
>             Project: Spark
>          Issue Type: Bug
>      Components: Spark Shell, SQL
> Affects Versions: 2.3.2
>        Reporter: Lakshminarayan Kamath
>        Priority: Major
>
> Reading a Ctrl-A delimited CSV file ignores rows with all null values.
> However, a comma-delimited CSV file doesn't.
> *Reproduction in spark-shell:*
>
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
>
> val l = List(List(1, 2), List(null, null), List(2, 3))
> val datasetSchema = StructType(List(
>   StructField("colA", IntegerType, true),
>   StructField("colB", IntegerType, true)))
> val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.show()
>
> |colA|colB|
> |1   |2   |
> |null|null|
> |2   |3   |
>
> df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
> df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")
>
> val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
> commaDf.show
>
> |colA|colB|
> |1   |2   |
> |2   |3   |
> |null|null|
>
> val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
> ctrlaDf.show
>
> |colA|colB|
> |1   |2   |
> |2   |3   |
>
> As seen above, for the Ctrl-A delimited CSV, rows containing only null values are ignored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
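As background (not stated in the issue itself, so treat this as an illustrative analogy): an all-null row written with a \u0001 delimiter serializes as a line containing nothing but the delimiter byte, and a common way for such a row to vanish is a parser that drops trailing empty fields. Spark's CSV source uses the univocity parser, not `String.split`, but the pitfall is easy to demonstrate in plain Scala:

```scala
// Hypothetical sketch of the trailing-empty-field pitfall.
// This is NOT Spark's actual parsing code; it only shows how a line
// that encodes an all-null row (just the delimiter) can appear to
// contain zero fields and therefore be treated as an empty line.
object SplitPitfall {
  def main(args: Array[String]): Unit = {
    val delim = "\u0001"
    val normalRow  = s"1${delim}2" // "1<Ctrl-A>2" -> two fields
    val allNullRow = delim         // "<Ctrl-A>"   -> two empty fields

    // Default split drops trailing empty strings:
    println(normalRow.split(delim).mkString("[", ",", "]"))  // [1,2]
    println(allNullRow.split(delim).length)                  // 0 -> looks like an empty line

    // With limit -1, empty fields are preserved:
    println(allNullRow.split(delim, -1).length)              // 2 -> two null/empty fields
  }
}
```

If the reader treats a zero-field line as blank and skips it, the `|null|null|` row disappears exactly as in the reproduction above, while a comma-delimited file read by the same path may take a different branch. Maxim's comment shows the row is retained on commit *4afb35*.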