[ https://issues.apache.org/jira/browse/SPARK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967947#comment-13967947 ]
Matei Zaharia commented on SPARK-1476:
--------------------------------------
Hey Mridul, the one thing I'd add as an alternative is whether we could have
splitting happen at a higher level than the block manager. For example, maybe a
map task is allowed to create 2 output blocks for a given reducer, or maybe a
cached RDD partition gets stored as 2 blocks. This might be slightly easier to
implement than replacing all instances of ByteBuffers. But I agree that this
should be addressed somehow, since 2 GB will become more and more limiting over
time. Anyway, I'd love to see a more detailed design. I think even the
replace-ByteBuffers approach you proposed can be made to work with Tachyon.
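
To make the higher-level splitting idea concrete, here is a minimal sketch of how one logical partition or reducer output could be planned as several sub-blocks, each small enough to fit in a single ByteBuffer. The class BlockSplitSketch and its plan method are hypothetical names used only for illustration; they are not part of Spark, and a real design would also need to track how the sub-blocks are named and reassembled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Spark API): plans sub-blocks for one large partition. */
final class BlockSplitSketch {

    /**
     * Splits a logical range of totalBytes into consecutive {offset, length}
     * pairs, none longer than maxBlockBytes. Because maxBlockBytes is an int,
     * every sub-block is guaranteed to fit in a single ByteBuffer.
     */
    static List<long[]> plan(long totalBytes, int maxBlockBytes) {
        if (totalBytes < 0 || maxBlockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        List<long[]> ranges = new ArrayList<>();
        long offset = 0;
        while (offset < totalBytes) {
            // The last sub-block may be shorter than the cap.
            long length = Math.min((long) maxBlockBytes, totalBytes - offset);
            ranges.add(new long[] {offset, length});
            offset += length;
        }
        return ranges;
    }

    public static void main(String[] args) {
        long threeGiB = 3L * 1024 * 1024 * 1024;
        // A 3 GiB partition capped at Integer.MAX_VALUE bytes per block yields 2 sub-blocks.
        for (long[] r : plan(threeGiB, Integer.MAX_VALUE)) {
            System.out.println("offset=" + r[0] + " length=" + r[1]);
        }
    }
}
```

With this kind of plan, the block manager itself could keep using ByteBuffers unchanged; only the layer that writes and reads map outputs or cached partitions would need to know that one logical unit maps to several blocks.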
> 2GB limit in spark for blocks
> -----------------------------
>
> Key: SPARK-1476
> URL: https://issues.apache.org/jira/browse/SPARK-1476
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Environment: all
> Reporter: Mridul Muralidharan
> Priority: Critical
> Fix For: 1.1.0
>
>
> The underlying abstraction for blocks in Spark is a ByteBuffer, which limits
> the size of a block to 2 GB.
> This has implications not just for managed blocks in use, but also for shuffle
> blocks (memory-mapped blocks are limited to 2 GB, even though the API accepts
> a long), for ser/deser via byte-array-backed output streams (SPARK-1391), etc.
> This is a severe limitation when Spark is used on non-trivial datasets.
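
The memory-mapping limit described above can be observed directly with the JDK. The following is a small sketch, not code from Spark: FileChannel.map accepts a long size, but its documented contract requires the size to be no greater than Integer.MAX_VALUE, so a request one byte larger is rejected. The class name MapLimitSketch and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo (not a Spark API): shows the single-mapping size cap. */
final class MapLimitSketch {

    /** Returns true if the JDK rejects mapping `size` bytes as one buffer. */
    static boolean mapRejected(long size) {
        try {
            Path tmp = Files.createTempFile("map-limit", ".bin");
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ch.map(FileChannel.MapMode.READ_ONLY, 0L, size);
                return false;
            } catch (IllegalArgumentException e) {
                // Thrown when the size precondition (<= Integer.MAX_VALUE) is violated.
                return true;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long justOverLimit = Integer.MAX_VALUE + 1L;
        System.out.println("rejected: " + mapRejected(justOverLimit));
    }
}
```

The same int-sized cap applies to ByteBuffer.allocate and to byte arrays, which is why any code path that materializes a whole block as one buffer inherits the limit.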