Hi James,

You can try writing in another format, e.g. Parquet, to see whether this is an ORC-specific issue or a more general one.
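A minimal sketch of that comparison, assuming the DataFrames from the job below and the Spark 1.5 DataFrame writer API; the helper name `write_out` is mine, not from the thread:

```python
# Sketch: parameterize the output format so ORC vs. Parquet can be
# compared with a one-line change. `write_out` is a hypothetical helper;
# `df` is assumed to be a Spark DataFrame.
def write_out(df, path, fmt="parquet"):
    """Write `df` to `path` in the given format ('parquet', 'orc', 'json', ...)."""
    df.write.format(fmt).save(path)

# If the Parquet writes succeed where the ORC ones hang, the problem is
# likely in the ORC path; if they also hang, suspect the upstream job.
# write_out(result, '/data/staged/raw_result')             # Parquet instead of ORC
# write_out(result, '/data/staged/raw_result_orc', 'orc')  # ORC for comparison
```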
Thanks,
Zhan Zhang

On Feb 23, 2016, at 6:05 AM, James Barney <jamesbarne...@gmail.com> wrote:

I'm trying to write an ORC file after running the FPGrowth algorithm on a dataset of around 2 GB. The algorithm performs well, and I can display results if I take(n) the freqItemsets() of the result after converting it to a DataFrame. I'm using Spark 1.5.2 on HDP 2.3.4 with Python 3.4.2 on YARN. I get the input data by querying a Hive table (also in ORC format) and running a number of maps, joins, and filters on it.

When the program attempts to write the files:

result.write.orc('/data/staged/raw_result')
size_1_buckets.write.orc('/data/staged/size_1_results')
filter_size_2_buckets.write.orc('/data/staged/size_2_results')

the first path, /data/staged/raw_result, is created with a _temporary folder, but the data is never written. The job hangs at this point, apparently indefinitely. Additionally, no logs are recorded or available for the job on the history server. What could be the problem?
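Since take(n) reportedly works but the full write hangs, another way to narrow this down (not from the thread, just a sketch) is to force the complete upstream computation with count() before writing, so you can tell whether the hang is in the joins/filters or in the ORC writer itself. The helper name below is hypothetical:

```python
# Sketch (assumed names): materialize the DataFrame before writing.
# If count() already hangs, the problem is in the upstream computation
# (maps/joins/filters), not in the ORC writer.
def materialize_then_write(df, path):
    n = df.count()          # forces the full job to run, unlike take(n)
    print("rows to write:", n)
    df.write.orc(path)      # only the file write remains if count() finished
    return n
```

take(n) can succeed on a tiny prefix of the data while the full job still hangs on a skewed join or shuffle, so count() is a cheap way to separate the two cases.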