Dennis Lawler created SPARK-4808:
------------------------------------
Summary: Spark fails to spill with small number of large objects
Key: SPARK-4808
URL: https://issues.apache.org/jira/browse/SPARK-4808
Project: Spark
Issue Type: Bug
Affects Versions: 1.1.0, 1.0.2, 1.2.0, 1.2.1
Reporter: Dennis Lawler
Spillable's maybeSpill does not allow a spill to occur until at least 1000
elements have been read into the collection, and then only evaluates whether
to spill on every 32nd element thereafter. When only a small number of very
large objects is being tracked, the collection can exhaust memory before a
spill is ever considered, leading to out-of-memory conditions.
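
To illustrate, here is a condensed sketch of the gating condition (names such
as elementsRead, trackMemoryThreshold, and myMemoryThreshold follow the
Spillable trait, but the body is simplified and illustrative, not a copy of
the real method, which also negotiates memory with the shuffle memory
manager):

    object SpillGateSketch {
      // Assumed names/constants, mirroring the reported behavior.
      private val trackMemoryThreshold = 1000L          // no spill check before 1000 elements
      private var elementsRead = 0L
      private var myMemoryThreshold = 5L * 1024 * 1024  // e.g. start with a 5 MB budget

      /** Returns true if the collection should be spilled to disk. */
      def maybeSpill(currentMemory: Long): Boolean = {
        elementsRead += 1
        // A spill is only even considered once more than 1000 elements have
        // been read, and then only on every 32nd element. A handful of huge
        // objects never reaches this point and can OOM first.
        elementsRead > trackMemoryThreshold &&
          elementsRead % 32 == 0 &&
          currentMemory >= myMemoryThreshold
      }

      def main(args: Array[String]): Unit = {
        // Ten 100 MB objects blow well past the memory threshold, yet no
        // spill is triggered because elementsRead never exceeds 1000.
        val sizePerElement = 100L * 1024 * 1024
        val spills = (1 to 10).count(i => maybeSpill(i * sizePerElement))
        println(s"spills triggered: $spills")  // prints 0
      }
    }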
I suspect that this and the every-32nd-element behavior were intended to
reduce the cost of the estimateSize() call. The estimation logic has since
been extracted into SizeTracker, which implements its own exponential backoff
for size sampling, so the gate no longer avoids the cost of estimation; it
only avoids acting on the resulting estimated size.
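
For context, a minimal sketch of the exponential-backoff sampling idea: the
expensive size estimate is recomputed only at exponentially spaced update
counts, so querying the tracked size on every element is already cheap. The
class, constant, and method names below are assumptions for the sketch, not
the actual SizeTracker implementation:

    class BackoffSizeTracker(sampleGrowthRate: Double = 1.1) {
      private var numUpdates = 0L
      private var nextSampleNum = 1L
      private var lastSampledSize = 0L

      /** Record one insertion; resample only when the backoff point is reached. */
      def afterUpdate(expensiveSizeEstimate: => Long): Unit = {
        numUpdates += 1
        if (numUpdates >= nextSampleNum) {
          lastSampledSize = expensiveSizeEstimate  // the costly call
          nextSampleNum = math.ceil(numUpdates * sampleGrowthRate).toLong
        }
      }

      /** Cheap: returns the most recent sample (the real tracker also extrapolates). */
      def estimateSize(): Long = lastSampledSize
    }

Given that the sampling itself is already amortized this way, checking the
estimate on every insertion would not reintroduce the cost the 1000/32 gate
was presumably meant to avoid.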