Hi, I'm having problems writing DataFrames with PySpark 1.6.0.  If I create
a small DataFrame like:

    sqlContext.createDataFrame(pandas.DataFrame.from_dict([{'x': 1}])).write.orc('test-orc')
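
For completeness, here's the whole thing as a self-contained script (a
minimal sketch of what I'm running; I'm assuming a plain local[*] context,
and a HiveContext since the ORC data source needs Hive support in 1.6):

    import pandas
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext('local[*]', 'orc-repro')
    sqlContext = HiveContext(sc)  # ORC data source requires Hive support in 1.6

    df = sqlContext.createDataFrame(pandas.DataFrame.from_dict([{'x': 1}]))
    df.show()  # prints the one row, so the frame isn't empty going in
    df.write.orc('test-orc')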

Only the _SUCCESS file is written to the output directory; no part files
appear. The executor log shows the task's output being saved under
test-orc/_temporary/, so the data is produced but apparently never committed
to the final location.
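
To double-check what actually lands on disk, I list the directory after the
write (trivial sketch; local filesystem assumed):

    import os

    # After the write, only _SUCCESS shows up here -- the part files the
    # executor log reported under test-orc/_temporary/ are nowhere to be found.
    for root, dirs, files in os.walk('test-orc'):
        for name in files:
            print(os.path.join(root, name))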

Writing with parquet rather than orc gives the same output (a _SUCCESS file
and no part files), but there's also an exception:

java.lang.NullPointerException
    at org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)

matching "Writing empty Dataframes doesn't save any _metadata files"
https://issues.apache.org/jira/browse/SPARK-15393
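
Since SPARK-15393 is about empty DataFrames, I also checked that the frame
isn't somehow empty on the JVM side before the write (test-parquet is just a
scratch path):

    # SPARK-15393 concerns *empty* DataFrames, so verify ours really has a row
    print(df.count())  # -> 1, so the data does reach the JVM
    df.write.parquet('test-parquet')  # still only _SUCCESS, plus the NPE above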

If I do the equivalent in Scala, everything works as expected. Any suggestions
as to what could be happening? Much appreciated --Ethan
