[ 
https://issues.apache.org/jira/browse/SPARK-12831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brett Stime updated SPARK-12831:
--------------------------------
    Description: 
Getting the following error in my executor logs:

ERROR akka.ErrorMonitor: Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to 
Actor[akka.tcp://sparkDriver@172.21.25.199:51562/user/CoarseGrainedScheduler#-2039547722]:
 max allowed size 134217728 bytes, actual size of encoded class 
org.apache.spark.rpc.akka.AkkaMessage was 134419636 bytes.
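
For scale, a quick arithmetic check of the two byte counts in the log. This is only a sketch; the variable names are illustrative and nothing here comes from Spark itself.

```python
# Both byte counts are copied from the OversizedPayloadException message above.
MAX_ALLOWED_BYTES = 134217728
ACTUAL_BYTES = 134419636

MIB = 1024 * 1024

# The limit expressed in MiB, and how far the encoded message exceeded it.
limit_mib = MAX_ALLOWED_BYTES / MIB
overshoot_bytes = ACTUAL_BYTES - MAX_ALLOWED_BYTES
overshoot_pct = 100.0 * overshoot_bytes / MAX_ALLOWED_BYTES

print(f"limit: {limit_mib:.0f} MiB")
print(f"overshoot: {overshoot_bytes} bytes ({overshoot_pct:.2f}% of the limit)")
```

The message exceeded a 128 MiB limit by 201908 bytes, roughly 0.15% of the frame.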

Seems like the quick fix would be to make AkkaUtils.reservedSizeBytes a little 
bigger--maybe proportional to spark.akka.frameSize and/or user configurable.
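
A minimal sketch of what a proportional reserve could look like, written in Python for brevity rather than as a patch. The function names, the `fraction` parameter and its 1% default are hypothetical. The 200 KiB floor is my understanding of the current fixed AkkaUtils.reservedSizeBytes value and should be read as an assumption.

```python
MIB = 1024 * 1024

# Assumption: the reserve currently hard-coded in AkkaUtils (believed to be 200 KiB).
FIXED_RESERVED_BYTES = 200 * 1024


def reserved_size_bytes(frame_size_mb, fraction=0.01, floor=FIXED_RESERVED_BYTES):
    """Hypothetical reserve: a fraction of the frame, never below the fixed floor."""
    frame_bytes = frame_size_mb * MIB
    return max(floor, int(frame_bytes * fraction))


def max_direct_result_bytes(frame_size_mb, fraction=0.01):
    """Largest serialized result that would still be sent directly to the driver."""
    return frame_size_mb * MIB - reserved_size_bytes(frame_size_mb, fraction)


print(reserved_size_bytes(128))      # reserve for a 128 MB frame
print(reserved_size_bytes(10))       # small frames keep the fixed floor
print(max_direct_result_bytes(128))  # direct-result threshold at 128 MB
```

With the hypothetical 1% default, a 128 MB frame would reserve about 1.28 MiB, while small frames would keep the existing floor.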

A more robust solution might be to catch OversizedPayloadException and retry 
using the BlockManager.
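
The retry idea could be sketched roughly as follows. Everything here is a stand-in: `OversizedPayloadError`, `FakeTransport`, the dict used as a block store and the `taskresult_<id>` naming are hypothetical, not Spark or Akka APIs. The sketch also assumes the oversize failure is surfaced to the sender, which may not match how Akka actually reports it (the log above suggests it is only logged).

```python
class OversizedPayloadError(Exception):
    """Stand-in for akka.remote.OversizedPayloadException."""


class FakeTransport:
    """Hypothetical RPC channel that rejects messages larger than max_bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.delivered = []

    def send(self, message):
        if len(message) > self.max_bytes:
            raise OversizedPayloadError(len(message))
        self.delivered.append(message)


def send_task_result(transport, block_store, task_id, payload):
    """Try the direct path first; on an oversize failure, store the payload
    in the block store and send only a small reference to it."""
    try:
        transport.send(b"DIRECT:" + payload)
        return "direct"
    except OversizedPayloadError:
        block_id = f"taskresult_{task_id}"
        block_store[block_id] = payload
        transport.send(b"INDIRECT:" + block_id.encode())
        return "indirect"


transport = FakeTransport(max_bytes=64)
store = {}
print(send_task_result(transport, store, 1, b"abc"))
print(send_task_result(transport, store, 2, b"x" * 100))
```

The point of the design is that the fallback message carries only a block identifier, so its size does not depend on the size of the result.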

I should also mention that this error stalls the entire job (my use case 
also requires fairly liberal timeouts). For now, I'll see whether setting 
spark.akka.frameSize a little smaller leaves proportionally more headroom.
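
To illustrate the reasoning behind that workaround: if the reserve is a fixed number of bytes, it becomes a larger share of the frame as the frame shrinks. The 200 KiB figure below is an assumption about the current reserve, and the helper name is hypothetical.

```python
MIB = 1024 * 1024

# Assumption: a fixed reserve that does not depend on the frame size.
RESERVED_BYTES = 200 * 1024


def reserve_share_pct(frame_size_mb):
    """Percentage of the frame that a fixed reserve leaves as headroom."""
    return 100.0 * RESERVED_BYTES / (frame_size_mb * MIB)


for frame_size_mb in (128, 64, 32):
    share = reserve_share_pct(frame_size_mb)
    print(f"{frame_size_mb} MB frame: {share:.2f}% headroom")
```

Whether that is enough depends on how the message overhead itself scales with the result size, which this sketch does not model.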

Thanks.

  was:
Getting the following error in my executor logs:

ERROR akka.ErrorMonitor: Transient association error (association remains live)
akka.remote.OversizedPayloadException: Discarding oversized payload sent to 
Actor[akka.tcp://sparkDriver@172.21.25.199:51562/user/CoarseGrainedScheduler#-2039547722]:
 max allowed size 134217728 bytes, actual size of encoded class 
org.apache.spark.rpc.akka.AkkaMessage was 134419636 bytes.

Seems like the quick fix would be to make AkkaUtils.reservedSizeBytes a little 
bigger--maybe proportional to spark.akka.frameSize and/or user configurable.

A more robust solution might be to catch OversizedPayloadException and retry 
using the BlockManager.

For now, I'll see if setting spark.akka.frameSize a little smaller gives me 
more overhead.

Thanks.


> akka.remote.OversizedPayloadException on DirectTaskResult
> ---------------------------------------------------------
>
>                 Key: SPARK-12831
>                 URL: https://issues.apache.org/jira/browse/SPARK-12831
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Brett Stime


