Thanks for the suggestions and links. The problem arises when I use the
DataFrame API to write, but it works fine when doing an insert overwrite
into the Hive table.
# Works fine
hive_context.sql("insert overwrite table {0} partition (e_dt, c_dt) select * from temp_table".format(table_name))
# Doesn't work: the DataFrame write (reconstructed below)
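Since the failing snippet was cut off, here is a hedged reconstruction of what the DataFrame write likely looked like; df, table_name, and the partition columns come from the thread, but the save mode and the use of saveAsTable are assumptions:

# Assumed shape of the failing write: overwrite the Hive table,
# partitioned on the same two columns as the working SQL path.
df.write.partitionBy("e_dt", "c_dt").mode("overwrite").saveAsTable(table_name)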
If you are running on a 64-bit JVM with less than 32 GB of heap, you might
want to enable -XX:+UseCompressedOops [1]. And if your dataframe is somehow
allocating a single array with more than 2^31-1 elements, you might have to
rethink your options.
[1] https://spark.apache.org/docs/latest/tuning.html
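For a Spark job, the flag can be passed to the driver and executor JVMs at submit time; a minimal sketch, where my_job.py is a placeholder (in client mode the driver option has to be given as --driver-java-options instead):

spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:+UseCompressedOops" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseCompressedOops" \
  my_job.py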
Have you seen this thread?
http://search-hadoop.com/m/q3RTtyXr2N13hf9O=java+lang+OutOfMemoryError+Requested+array+size+exceeds+VM+limit
On Wed, May 4, 2016 at 2:44 PM, Bijay Kumar Pathak wrote:
Hi,

I am reading a Parquet file of around 50+ GB which has 4013 partitions and
240 columns. Below is my configuration (the equivalent spark-submit flags
are sketched after the list):

driver: 20 GB memory with 4 cores
executors: 45 executors, each with 15 GB memory and 4 cores
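As a rough sketch, that configuration corresponds to the following spark-submit flags; my_job.py is a placeholder, and note that --driver-cores only takes effect in cluster mode:

spark-submit \
  --driver-memory 20G \
  --driver-cores 4 \
  --num-executors 45 \
  --executor-memory 15G \
  --executor-cores 4 \
  my_job.py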
I tried to read the data using both the DataFrame reader and the hive
context; both paths are sketched below.
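A minimal sketch of the two read paths, assuming Spark 1.x with a HiveContext; the file path and table name are placeholders:

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
hive_context = HiveContext(sc)

# Path 1: DataFrame reader pointed directly at the Parquet files
df = hive_context.read.parquet("/path/to/parquet")  # placeholder path

# Path 2: SQL through the hive context against a table over the same data
df2 = hive_context.sql("select * from source_table")  # placeholder table name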