Yifeng Li created SPARK-19914:
---------------------------------
Summary: Spark Scala - Calling persist after reading a parquet
file makes certain spark.sql queries return empty results
Key: SPARK-19914
URL: https://issues.apache.org/jira/browse/SPARK-19914
Project: Spark
Issue Type: Bug
Components: Input/Output, SQL
Affects Versions: 2.1.0, 2.0.0
Reporter: Yifeng Li
Hi There,
It seems that calling .persist() after spark.read.parquet makes spark.sql
statements return empty results when the query is written in a certain way.
I have the following code here:
val df = spark.read.parquet("C:\\...")
df.createOrReplaceTempView("t1")
spark.sql("select * from t1 a where a.id = '123456789'").show(10)
Everything works fine here.
Now, if I do:
import org.apache.spark.storage.StorageLevel
val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
df.createOrReplaceTempView("t1")
spark.sql("select * from t1 a where a.id = '123456789'").show(10)
I will get empty results.
Selecting individual columns works with persist(), e.g.:
import org.apache.spark.storage.StorageLevel
val df = spark.read.parquet("C:\\...").persist(StorageLevel.DISK_ONLY)
df.createOrReplaceTempView("t1")
spark.sql("select a.id from t1 a where a.id = '123456789'").show(10)
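For completeness, a self-contained version of the repro above might look like the following sketch. It writes a small parquet file first so it can be run anywhere; the path, the object name PersistRepro, and the sample data are placeholders I chose for illustration, and it assumes a local Spark 2.x build:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-19914 repro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a small parquet file so the repro is self-contained.
    val path = "/tmp/spark-19914-repro.parquet"
    Seq(("123456789", "a"), ("987654321", "b"))
      .toDF("id", "value")
      .write.mode("overwrite").parquet(path)

    // Read it back with persist(DISK_ONLY) -- the reported trigger.
    val df = spark.read.parquet(path).persist(StorageLevel.DISK_ONLY)
    df.createOrReplaceTempView("t1")

    // Expected: one row. Reported behavior on 2.0.0/2.1.0: empty result.
    spark.sql("select * from t1 a where a.id = '123456789'").show(10)

    // Selecting the column explicitly returns the row as expected.
    spark.sql("select a.id from t1 a where a.id = '123456789'").show(10)

    spark.stop()
  }
}
{code}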
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]