Sameer Agarwal created SPARK-16764:
--------------------------------------
Summary: Recommend disabling vectorized parquet reader on
OutOfMemoryError
Key: SPARK-16764
URL: https://issues.apache.org/jira/browse/SPARK-16764
Project: Spark
Issue Type: Improvement
Reporter: Sameer Agarwal
We currently don't bound or manage the size of the data arrays used by column
vectors in the vectorized reader (they are bounded only by Integer.MAX_VALUE),
which may lead to OOMs while reading data. In the short term, we can probably
intercept the resulting OutOfMemoryError and suggest that the user disable the
vectorized parquet reader.
Longer term, we should probably do explicit memory management for this.
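
The short-term idea above could look roughly like the following. This is a
minimal, self-contained sketch and not the actual Spark change: the class name
VectorizedReaderOomHint, the reserve helper, and the injectable allocator are
hypothetical and exist only for illustration. The one real name is the Spark SQL
setting spark.sql.parquet.enableVectorizedReader.

```java
import java.util.function.IntFunction;

// Hypothetical illustration only; not Spark's ColumnVector code.
public class VectorizedReaderOomHint {
    // Real Spark SQL setting that toggles the vectorized Parquet reader.
    static final String CONF_KEY = "spark.sql.parquet.enableVectorizedReader";

    // Allocates the backing array through the given allocator. If the
    // allocation fails with OutOfMemoryError, rethrows with a message that
    // points the user at the workaround. The allocator is a parameter so the
    // failure path can be exercised without exhausting the heap.
    static int[] reserve(int capacity, IntFunction<int[]> allocator) {
        try {
            return allocator.apply(capacity);
        } catch (OutOfMemoryError e) {
            throw new RuntimeException(
                "Cannot reserve " + capacity + " ints for a column vector. "
                    + "As a workaround, disable the vectorized Parquet reader "
                    + "by setting " + CONF_KEY + " to false.",
                e);
        }
    }

    public static void main(String[] args) {
        // Normal path: a small allocation succeeds.
        int[] data = reserve(1024, int[]::new);
        System.out.println("reserved " + data.length + " ints");

        // Failure path: simulate an allocation that runs out of memory.
        try {
            reserve(Integer.MAX_VALUE, n -> {
                throw new OutOfMemoryError("simulated");
            });
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

On the user side, the workaround is then a single setting, for example
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false") on a
SparkSession. Wrapping the error keeps the original OutOfMemoryError as the
cause, so the stack trace still shows where the allocation failed.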
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)