Spark version: spark-1.5.2-bin-hadoop2.6
Python version: 2.7.9
OS: Ubuntu 14.04

Code to reproduce the error:

# write.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df = sqlc.range(10)
df1 = df.withColumn('a', df['id'] * 2)
df1.write.partitionBy('id').parquet('./data')

# read.py
import pyspark
sc = pyspark.SparkContext()
sqlc = pyspark.SQLContext(sc)
df2 = sqlc.read.parquet('./data')
df2.dropna().count()

$ spark-submit write.py
$ spark-submit read.py

Error message:

15/12/08 17:20:34 ERROR Filter: Failed to generate predicate, fallback to interpreted
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: a#0L
...

If I write the data without partitionBy, the error does not occur.

Any suggestions? Thanks!

--
張雅軒