yahsuan, chang created SPARK-12231:
--------------------------------------
Summary: Failed to generate predicate Error when using dropna
Key: SPARK-12231
URL: https://issues.apache.org/jira/browse/SPARK-12231
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 1.5.2
Environment: Python 2.7.9, Ubuntu 14.04
Reporter: yahsuan, chang
Code to reproduce the error:
# write.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')
# read.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()
$ spark-submit write.py
$ spark-submit read.py
# error message
15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to
interpreted org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
Binding attribute, tree: a#0L
...
If the data is written without partitionBy, the error does not occur.
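
For comparison, the control case can be sketched as two helpers that mirror write.py and read.py but skip partitionBy. This is only a sketch: the function names and the './data_unpartitioned' path are hypothetical and not part of the original report, and sqlc is assumed to be a pyspark.SQLContext created as in the scripts above.

```python
# Control case: same data as write.py, written WITHOUT partitionBy.
# The helper names and the output path are hypothetical, chosen for illustration.

def write_unpartitioned(sqlc, path):
    """Build the same DataFrame as write.py and write it as plain parquet."""
    df = sqlc.range(10)
    df1 = df.withColumn('a', df['id'] * 2)
    # No partitionBy('id') here; the write fails if `path` already exists.
    df1.write.parquet(path)


def count_after_dropna(sqlc, path):
    """Read the parquet data back and run the same dropna().count() as read.py."""
    df2 = sqlc.read.parquet(path)
    return df2.dropna().count()
```

Calling write_unpartitioned(sqlc, './data_unpartitioned') and then count_after_dropna(sqlc, './data_unpartitioned') from a script run with spark-submit exercises the same dropna path on unpartitioned data, which isolates partitionBy as the only difference from the failing case.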