Did you try increasing the perm gen for the driver?
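The "task-result-getter" threads in your stack traces run inside the driver JVM, so if -XX:MaxPermSize was only applied to the executors it would not have helped here. A minimal sketch of raising it on both sides via spark-submit (the size, class name, and jar are placeholders, and MaxPermSize assumes you are on Java 7):

    spark-submit \
      --driver-java-options "-XX:MaxPermSize=512m" \
      --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m" \
      --class com.example.YourJob \
      your-job.jar

--driver-java-options is needed because in client mode the driver JVM is already running by the time application-level conf settings are read.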
Regards
Sab

On 24-Jun-2015 4:40 pm, "Anders Arpteg" <arp...@spotify.com> wrote:

> When reading large (and many) datasets with the Spark 1.4.0 DataFrames
> parquet reader (the org.apache.spark.sql.parquet format), the following
> exceptions are thrown:
>
> Exception in thread "task-result-getter-0"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "task-result-getter-0"
> Exception in thread "task-result-getter-3" java.lang.OutOfMemoryError:
> PermGen space
> Exception in thread "task-result-getter-1" java.lang.OutOfMemoryError:
> PermGen space
> Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError:
> PermGen space
>
> and many more like these from different threads. I've tried increasing the
> PermGen space using the -XX:MaxPermSize VM setting, but even after
> tripling the space, the same errors occur. I've also tried storing
> intermediate results, and am able to complete the full job by running it
> multiple times and resuming from the last successful intermediate result.
> There seems to be some memory leak in the parquet format. Any hints on how
> to fix this problem?
>
> Thanks,
> Anders
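Also, regarding the intermediate-result workaround: since each rerun starts a fresh JVM (and thus fresh PermGen), persisting a stage to Parquet and resuming from it is a reasonable stopgap. Roughly, with the 1.4 DataFrame API (paths, app name, and the transform step below are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("resume-sketch"))
    val sqlContext = new SQLContext(sc)

    // first run: read the input, apply the expensive step, save the stage
    val input = sqlContext.read.parquet("hdfs:///data/input")
    val intermediate = transform(input)  // hypothetical transform step
    intermediate.write.parquet("hdfs:///tmp/stage1")

    // later run (fresh JVM, fresh PermGen): resume from the saved stage
    val stage1 = sqlContext.read.parquet("hdfs:///tmp/stage1")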