Did you try increasing the perm gen for the driver?
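The "task-result-getter" threads in your stack traces run inside the driver JVM, so if -XX:MaxPermSize was only applied to the executors it would not have helped here. A minimal sketch of raising it on both sides via spark-submit (the size, class name, and jar are placeholders, and MaxPermSize assumes you are on Java 7):

    spark-submit \
      --driver-java-options "-XX:MaxPermSize=512m" \
      --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
      --class com.example.YourJob \
      your-job.jar

--driver-java-options is needed because in client mode the driver JVM is already running by the time application-level conf settings are read.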
Regards
Sab

On 24-Jun-2015 4:40 pm, "Anders Arpteg" <arp...@spotify.com> wrote:

> When reading large (and many) datasets with the Spark 1.4.0 DataFrames
> parquet reader (the org.apache.spark.sql.parquet format), the following
> exceptions are thrown:
>
> Exception in thread "task-result-getter-0"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "task-result-getter-0"
> Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError:
> PermGen space
> Exception in thread "task-result-getter-1" java.lang.OutOfMemoryError:
> PermGen space
> Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError:
> PermGen space
>
> and many more like these from different threads. I've tried increasing the
> PermGen space using the -XX:MaxPermSize VM setting, but even after
> tripling the space, the same errors occur. I've also tried storing
> intermediate results, and am able to complete the full job by running it
> multiple times and resuming from the last successful intermediate result.
> There seems to be some memory leak in the parquet format. Any hints on how
> to fix this problem?
>
> Thanks,
> Anders
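Also, regarding the intermediate-result workaround: since each rerun starts a fresh JVM (and thus fresh PermGen), persisting a stage to Parquet and resuming from it is a reasonable stopgap. Roughly, with the 1.4 DataFrame API (paths, app name, and the transform step below are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("resume-sketch"))
    val sqlContext = new SQLContext(sc)

    // first run: read the input, apply the expensive step, save the stage
    val input = sqlContext.read.parquet("hdfs:///data/input")
    val intermediate = transform(input)  // hypothetical transform step
    intermediate.write.parquet("hdfs:///tmp/stage1")

    // later run (fresh JVM, fresh PermGen): resume from the saved stage
    val stage1 = sqlContext.read.parquet("hdfs:///tmp/stage1")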