Re: Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-16 Thread Hyukjin Kwon
It seems a bit weird. Could we open an issue and discuss it at the repository link I sent? Let me try to reproduce your case with your data if possible. On 17 Nov 2016 2:26 a.m., "Arun Patel" wrote: > I tried the options below. > > 1) Increased executor memory, up to

Re: Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-16 Thread Arun Patel
I tried the options below. 1) Increased executor memory, up to the maximum possible, 14GB. Same error. 2) Tried the new version, spark-xml_2.10:0.4.1. Same error. 3) Tried a lower-level rowTag. That worked and returned 16000 rows. Are there any workarounds for this
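For reference, a sketch of options 1 and 3; the tag name and file path are hypothetical, and the snippet assumes a Spark 1.x sqlContext (e.g. in spark-shell) with the spark-xml package on the classpath:

    // Option 1 corresponds to e.g. spark-submit --executor-memory 14g,
    // but a bigger heap does not lift the JVM's per-array length cap.
    // Option 3: pointing rowTag at a repeated inner element yields many
    // small rows instead of one document-sized row.
    val records = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")  // hypothetical repeated inner tag
      .load("file:///path/to/large.xml")

    records.count()  // returned 16000 in the reported run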

Re: Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-15 Thread Arun Patel
Thanks for the quick response. It's a single XML file and I am using a top-level rowTag. So it creates only one row in a DataFrame, with 5 columns. One of these columns will contain most of the data as a StructType. Is there a limit on how much data a cell of a DataFrame can store? I will check with
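There is no documented per-cell size limit in the DataFrame API, but a cell is ultimately backed by JVM objects and arrays, so one cell holding most of a 1GB document collides with JVM array-size limits. A sketch of the shape being described and one way to spread the payload over rows; all tag and column names are hypothetical:

    import org.apache.spark.sql.functions.{col, explode}

    // Top-level rowTag: the whole document becomes a single Row.
    val whole = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "document")  // hypothetical top-level tag
      .load("file:///path/to/large.xml")

    whole.count()        // 1 -- the entire file is one row
    whole.printSchema()  // most data nested under one StructType column

    // Exploding the repeated nested element spreads it over many rows,
    // though this still materializes the one huge row first; setting
    // rowTag to the inner element avoids that entirely.
    val entries = whole.select(explode(col("body.entry")).as("entry"))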

Re: Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-15 Thread Hyukjin Kwon
Hi Arun, I have a few questions. Does your XML file contain a few huge documents? If a single row is very large (say 500MB), it would consume a lot of memory because, if I remember correctly, it has to hold at least one whole row in memory to iterate. I remember this happened to me before while
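A note on the error itself (background, not from the thread): HotSpot caps the length of a single array at roughly Int.MaxValue - 2 elements, so a row that materializes as one oversized array fails no matter how much heap is configured. A minimal demonstration:

    // Requesting more than HotSpot's per-array cap throws
    // "Requested array size exceeds VM limit" regardless of -Xmx;
    // an allocation just under the cap with too little heap throws
    // "Java heap space" instead -- the two errors seen in this thread.
    try {
      val huge = new Array[Byte](Int.MaxValue)
      println(huge.length)
    } catch {
      case e: OutOfMemoryError => println("OutOfMemoryError: " + e.getMessage)
    }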

Spark-xml - OutOfMemoryError: Requested array size exceeds VM limit

2016-11-15 Thread Arun Patel
I am trying to read an XML file which is 1GB in size. I get the error 'java.lang.OutOfMemoryError: Requested array size exceeds VM limit' after reading 7 partitions in local mode. In YARN mode, it throws 'java.lang.OutOfMemoryError: Java heap space' after reading 3 partitions. Any
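For context, a read along these lines is presumably what triggers the error; the path and rowTag below are placeholders, not details from the original post, and the snippet assumes a Spark 1.x sqlContext with the spark-xml package loaded:

    // Hypothetical reproduction of the failing read.
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "document")  // placeholder tag
      .load("file:///path/to/1gb-file.xml")

    df.count()  // reported to OOM partway through the scan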