Sameer Agarwal created SPARK-16764:
--------------------------------------

             Summary: Recommend disabling vectorized parquet reader on OutOfMemoryError
                 Key: SPARK-16764
                 URL: https://issues.apache.org/jira/browse/SPARK-16764
             Project: Spark
          Issue Type: Improvement
            Reporter: Sameer Agarwal


We currently don't bound or manage the size of the data arrays used by column 
vectors in the vectorized Parquet reader (they are limited only by 
Integer.MAX_VALUE), which may lead to OutOfMemoryErrors while reading data. In 
the short term, we can probably intercept this error and suggest that the user 
disable the vectorized Parquet reader. 
Longer term, we should probably do explicit memory management for this.
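
The short-term idea could look roughly like the sketch below. It is only an 
illustration under stated assumptions, not Spark's actual code: the class 
VectorAllocation, its reserve method, and the injectable allocator are 
hypothetical names introduced here. The only real identifier is the Spark SQL 
option spark.sql.parquet.enableVectorizedReader, which turns the vectorized 
Parquet reader on or off.

```java
import java.util.function.IntFunction;

// Hypothetical helper, not part of Spark: wraps a column-vector style
// allocation so that an OutOfMemoryError carries a workaround hint.
final class VectorAllocation {
    // Real Spark SQL option that toggles the vectorized Parquet reader.
    static final String CONF_KEY = "spark.sql.parquet.enableVectorizedReader";

    private VectorAllocation() {}

    // Allocates `capacity` elements through `allocator` (assumed to stand in
    // for the reader's internal array allocation). On OutOfMemoryError,
    // rethrows with a message that recommends disabling the vectorized reader.
    static int[] reserve(int capacity, IntFunction<int[]> allocator) {
        try {
            return allocator.apply(capacity);
        } catch (OutOfMemoryError oom) {
            // Keep the original error as the cause so its stack trace survives.
            throw new RuntimeException(
                "Cannot allocate a column vector of " + capacity
                    + " elements. As a workaround, set " + CONF_KEY
                    + " to false to disable the vectorized Parquet reader.",
                oom);
        }
    }
}
```

A caller would use it as VectorAllocation.reserve(1024, int[]::new). A user who 
sees the resulting message could then pass 
--conf spark.sql.parquet.enableVectorizedReader=false when submitting the job. 
Catching OutOfMemoryError is generally discouraged, so this only makes sense at 
a narrow allocation site like the one sketched here, and it does not replace 
the explicit memory management proposed as the longer-term fix.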



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

