Sanket Reddy created SPARK-24986:
------------------------------------

             Summary: OOM in BufferHolder during writes to a stream
                 Key: SPARK-24986
                 URL: https://issues.apache.org/jira/browse/SPARK-24986
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.0, 2.2.0, 2.1.0
            Reporter: Sanket Reddy


We have seen an out-of-memory exception while running one of our production 
jobs. We expect memory allocation to be managed by the unified memory manager 
at run time.

The buffer grows during writes roughly as follows: if the row length is 
constant, the buffer does not grow; it keeps resetting and writing values into 
the existing allocation. If the rows are variable-length and skewed, with very 
large values to be written, the buffer keeps doubling until the JVM runs out 
of heap, and the estimator that requests the initial execution memory does not 
appear to account for this. Checking the available heap before growing the 
global buffer might be a viable option.
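As a rough illustration of that last idea, here is a minimal sketch (hypothetical, not the actual Spark BufferHolder code; the class name, limits, and heap check are illustrative only) of a grow() that refuses to double the buffer when the JVM heap plainly cannot satisfy the allocation, instead of letting Arrays.copyOf throw OutOfMemoryError mid-write:

```java
import java.util.Arrays;

// Hypothetical sketch of a guarded grow(), loosely modeled on
// BufferHolder.grow(); not Spark's implementation.
public class GuardedBufferHolder {
    private byte[] buffer = new byte[64];
    private int cursor = 0;

    public void grow(int neededSize) {
        long requiredLength = (long) cursor + neededSize;
        if (requiredLength > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                "Cannot grow buffer beyond Integer.MAX_VALUE bytes");
        }
        if (requiredLength > buffer.length) {
            // Double the buffer, but first estimate the remaining heap
            // so we can fail with a descriptive error rather than an
            // OutOfMemoryError from Arrays.copyOf.
            long newLength = Math.min(requiredLength * 2, Integer.MAX_VALUE);
            Runtime rt = Runtime.getRuntime();
            long freeHeap = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
            if (newLength > freeHeap) {
                throw new IllegalStateException(
                    "Refusing to grow buffer to " + newLength
                    + " bytes; only ~" + freeHeap + " heap bytes available");
            }
            buffer = Arrays.copyOf(buffer, (int) newLength);
        }
    }

    public int capacity() { return buffer.length; }
}
```

A check like this would only make the failure explicit and earlier; coordinating the allocation with the unified memory manager would be the fuller fix.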

java.lang.OutOfMemoryError: Java heap space
	at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter.initialize(UnsafeArrayWriter.java:61)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply_1$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:232)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$1.apply(AggregationIterator.scala:221)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:159)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:29)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:1075)
	at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:1091)
	at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:1129)
	at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:1132)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:513)
	at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:329)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1966)
	at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:270)
18/06/11 21:18:41 ERROR SparkUncaughtExceptionHandler: [Container in shutdown] Uncaught exception in thread Thread[stdout writer for Python/bin/python3.6,5,main]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
