Lakshminarayan Kamath created SPARK-25890:
---------------------------------------------
Summary: Null rows are ignored with Ctrl-A as a delimiter when
reading a CSV file.
Key: SPARK-25890
URL: https://issues.apache.org/jira/browse/SPARK-25890
Project: Spark
Issue Type: Bug
Components: Spark Shell, SQL
Affects Versions: 2.3.2
Reporter: Lakshminarayan Kamath
Reading a Ctrl-A delimited CSV file silently drops rows whose values are all null; reading a comma-delimited CSV file does not.
*Reproduction in spark-shell:*
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val l = List(List(1, 2), List(null, null), List(2, 3))
val datasetSchema = StructType(List(
  StructField("colA", IntegerType, true),
  StructField("colB", IntegerType, true)))
val rdd = sc.parallelize(l).map(item => Row.fromSeq(item.toSeq))
val df = spark.createDataFrame(rdd, datasetSchema)
df.show()
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|null|null|
|   2|   3|
+----+----+
df.write.option("delimiter", "\u0001").option("header", "true").csv("/ctrl-a-separated.csv")
df.write.option("delimiter", ",").option("header", "true").csv("/comma-separated.csv")

val commaDf = spark.read.option("header", "true").option("delimiter", ",").csv("/comma-separated.csv")
commaDf.show
+----+----+
|colA|colB|
+----+----+
|   1|   2|
|   2|   3|
|null|null|
+----+----+
val ctrlaDf = spark.read.option("header", "true").option("delimiter", "\u0001").csv("/ctrl-a-separated.csv")
ctrlaDf.show
+----+----+
|colA|colB|
+----+----+
| 1 | 2 |
| 2 | 3 |
+----+----+
As seen above, for the Ctrl-A delimited CSV, rows containing only null values are dropped on read.
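A possible workaround (a sketch, not verified against this Spark version; paths and the "\N" marker are illustrative choices) is to serialize nulls with an explicit token via the standard CSV `nullValue` option, so an all-null row is no longer written as a line consisting only of the delimiter:

```scala
// Workaround sketch: write nulls as an explicit marker so an all-null row
// does not become a bare "\u0001" line, then map the marker back on read.
df.write
  .option("delimiter", "\u0001")
  .option("header", "true")
  .option("nullValue", "\\N")   // hypothetical marker choice
  .csv("/ctrl-a-separated-workaround.csv")

val fixedDf = spark.read
  .option("header", "true")
  .option("delimiter", "\u0001")
  .option("nullValue", "\\N")   // same marker on read
  .csv("/ctrl-a-separated-workaround.csv")
fixedDf.show
```

With an explicit marker the all-null row survives the round trip, which suggests the drop happens when the written row degenerates to a delimiter-only line.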
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)