Thomas Graves commented on SPARK-23964:

I'm not sure. I'm trying to figure out whether there are performance implications 
here, and perhaps there are, but they come at the cost of not being accurate about 
memory usage.  In deployments with fixed-size containers this is very important.  
If you wait 32 elements, it may cause you to acquire a bigger chunk of memory at 
once versus getting smaller allocations (and thus more of them).
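To make the "bigger chunk" concern concrete, here is a small runnable sketch. The `amountToRequest` formula mirrors the `2 * currentMemory - myMemoryThreshold` doubling request in Spillable.maybeSpill; the element sizes and the element counts at which the check fires are made-up illustration numbers, not measurements:

```scala
// Sketch: how much memory gets requested when the spill check finally fires.
// Assumes the 5 MiB initial threshold (spark.shuffle.spill.initialMemoryThreshold)
// and hypothetical 1 MiB records.
object SpillGateSketch {
  val initialThreshold: Long = 5L * 1024 * 1024

  // Mirrors Spillable's doubling request: 2 * currentMemory - myMemoryThreshold.
  def amountToRequest(currentMemory: Long, threshold: Long): Long =
    2 * currentMemory - threshold

  def main(args: Array[String]): Unit = {
    val elementSize = 1L * 1024 * 1024
    // With the % 32 gate, the first chance to check is at element 32:
    val memoryAtFirstCheck = 32 * elementSize
    // Without the gate, we'd check as soon as the threshold is crossed (element 6):
    val memoryAtEarliestCross = 6 * elementSize
    println(amountToRequest(memoryAtFirstCheck, initialThreshold) / (1024 * 1024))   // 59 (MiB)
    println(amountToRequest(memoryAtEarliestCross, initialThreshold) / (1024 * 1024)) // 7 (MiB)
  }
}
```

So under these assumed sizes, waiting for the 32-element boundary turns a ~7 MiB request into a ~59 MiB one.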

I would think the only check you need is: currentMemory >= myMemoryThreshold. 
The initial threshold is 5MB right now, but all it does is ask for more 
memory; only when it can't get memory does it spill.  And the initial threshold 
is configurable, so you can always make it bigger. 
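The huge-element scenario from the issue can be simulated the same way. This is a simplified stand-in for the gating condition, not the real Spillable class; the 100 MiB record size is a hypothetical:

```scala
// Sketch: count how many elements accumulate before the spill check first fires,
// under the current gate (elementsRead % 32 == 0 && currentMemory >= threshold)
// versus the proposed threshold-only check.
object ThresholdOnlySketch {
  def firstCheckAt(gateEvery32: Boolean, elementSize: Long, threshold: Long): Int = {
    var elementsRead = 0
    var currentMemory = 0L
    while (elementsRead < 1000000) {
      elementsRead += 1
      currentMemory += elementSize
      val gateOpen = !gateEvery32 || elementsRead % 32 == 0
      if (gateOpen && currentMemory >= threshold) return elementsRead
    }
    -1
  }

  def main(args: Array[String]): Unit = {
    val threshold = 5L * 1024 * 1024   // 5 MiB initial threshold
    val huge = 100L * 1024 * 1024      // hypothetical 100 MiB records
    println(firstCheckAt(gateEvery32 = true, huge, threshold))  // 32
    println(firstCheckAt(gateEvery32 = false, huge, threshold)) // 1
  }
}
```

With 100 MiB records the threshold is crossed on the very first element, but the % 32 gate means the first check happens at element 32, i.e. ~3.2 GiB accumulated before we even consider spilling.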

I'm going to run some performance tests to see what happens, but I would 
like to know if anyone has other background.  

> why does Spillable wait for 32 elements?
> ----------------------------------------
>                 Key: SPARK-23964
>                 URL: https://issues.apache.org/jira/browse/SPARK-23964
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Thomas Graves
>            Priority: Major
> The Spillable class has a check in maybeSpill for when it tries to acquire 
> more memory and determines whether it should spill:
> if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
> before it looks to see if it should spill.  
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/Spillable.scala#L83]
> I'm wondering why it has the elementsRead % 32 in it. If I have a small 
> number of elements that are huge, this can easily cause an OOM before we actually 
> spill.  
> I saw a few conversations on this and one related JIRA: 
> https://issues.apache.org/jira/browse/SPARK-4456 , but I've never seen an 
> answer to this.
> Does anyone have history on this?

This message was sent by Atlassian JIRA
