[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967911#comment-13967911 ]

Patrick Wendell edited comment on SPARK-1476 at 4/13/14 7:01 PM:
-----------------------------------------------------------------

[~mrid...@yahoo-inc.com] I think the proposed change would benefit from a 
design doc to explain exactly the cases we want to fix and what trade-offs we 
are willing to make in terms of complexity.

Agreed that there is definitely room for improvement in the out-of-the-box 
behavior here.

Right now the limits, as I understand them, are: (a) the shuffle output from one 
mapper to one reducer cannot be more than 2GB, and (b) a single partition of an 
RDD cannot exceed 2GB.
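
For reference, the limit ultimately comes from java.nio.ByteBuffer being 
Int-indexed; here is a minimal, Spark-independent Scala sketch of where the 
~2GB ceiling shows up (just the JDK API, nothing from our code base):

{code}
import java.nio.ByteBuffer
import scala.util.Try

// ByteBuffer.allocate takes an Int, so a single buffer (and hence any block
// handed around as one ByteBuffer) tops out at Integer.MAX_VALUE bytes (~2GB).
val overTwoGB: Long = Integer.MAX_VALUE.toLong + 1L

// There is no allocate(Long); forcing the size through an Int overflows to a
// negative value and allocation fails.
val attempt = Try(ByteBuffer.allocate(overTwoGB.toInt))
println(attempt) // Failure(java.lang.IllegalArgumentException: ...)
{code}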

I see (a) as the bigger of the two issues. It would be helpful to have specific 
examples of workloads where this causes a problem and the associated data 
sizes, etc. For instance, say I want to do a 1 terabyte shuffle. Right now the 
number of (mappers * reducers) needs to be greater than roughly 500 (1 TB 
divided by the 2GB per-block limit) for this to work, assuming uniform 
partitioning (e.g. 100 mappers and 10 reducers clears that comfortably). That 
doesn't seem like too crazy an assumption, but with skew this becomes a much 
bigger problem.
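
To make the arithmetic above concrete, a quick Scala sketch (plain arithmetic, 
not any Spark API; the helper name and numbers are just for illustration):

{code}
// Back-of-the-envelope check for limit (a): the per-(mapper, reducer) block
// size under uniform partitioning of a 1 TB shuffle.
val blockLimit: Long = Integer.MAX_VALUE.toLong                 // ~2GB
def blockSize(shuffleBytes: Long, mappers: Int, reducers: Int): Long =
  shuffleBytes / (mappers.toLong * reducers)

val oneTB = 1L << 40
println(blockSize(oneTB, mappers = 100, reducers = 10)) // ~1.1GB per block: fits
println(blockSize(oneTB, mappers = 20,  reducers = 10)) // ~5.5GB per block: too big
println(oneTB / blockLimit)                             // ~512 (mapper, reducer) pairs needed
{code}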

Would it be possible to improve (a) but not (b) with a much simpler design? I'm 
not sure (maybe they reduce to the same problem), but it's something a design 
doc could help flesh out.

Popping up a bit: I think our goal should be to handle reasonable workloads, 
not to be 100% compliant with the semantics of Hadoop MapReduce. After all, 
in-memory RDDs are not even a concept in MapReduce. And keep in mind that 
MapReduce became such a bloated and complex project that it is no longer 
possible to make substantial changes to it today. That's something we 
definitely want to avoid by keeping Spark internals as simple as possible.



> 2GB limit in spark for blocks
> -----------------------------
>
>                 Key: SPARK-1476
>                 URL: https://issues.apache.org/jira/browse/SPARK-1476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>         Environment: all
>            Reporter: Mridul Muralidharan
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits 
> the size of a block to 2GB.
> This has implications not just for managed blocks in use, but also for shuffle 
> blocks (memory-mapped blocks are limited to 2GB even though the API allows for 
> a long size), ser/deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation for using Spark on non-trivial datasets.
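
The memory-mapping half of that is easy to reproduce outside Spark; a minimal 
Scala sketch of the constraint the description refers to (plain JDK calls, the 
temp file is just a stand-in for a shuffle file):

{code}
import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel

// FileChannel.map takes a long size, but it returns a MappedByteBuffer, so the
// mapped region is still capped at Integer.MAX_VALUE bytes; the JDK rejects
// anything larger before it even touches the file.
val tmp = File.createTempFile("spark-1476-demo", ".bin")
val channel = new RandomAccessFile(tmp, "rw").getChannel
try {
  channel.map(FileChannel.MapMode.READ_ONLY, 0, Integer.MAX_VALUE.toLong + 1)
} catch {
  case e: IllegalArgumentException => println(e.getMessage) // "Size exceeds Integer.MAX_VALUE"
} finally {
  channel.close()
  tmp.delete()
}
{code}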


