[ https://issues.apache.org/jira/browse/SPARK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin updated SPARK-2288:
-------------------------------
Assignee: Raymond Liu
> Hide ShuffleBlockManager behind ShuffleManager
> ----------------------------------------------
>
> Key: SPARK-2288
> URL: https://issues.apache.org/jira/browse/SPARK-2288
> Project: Spark
> Issue Type: Sub-task
> Components: Block Manager, Shuffle
> Reporter: Raymond Liu
> Assignee: Raymond Liu
>
> This is a sub-task of SPARK-2275.
> At present, in the shuffle write path, the shuffle block manager manages the
> mapping from some blockIds to a FileSegment for the benefit of consolidated
> shuffle, and in doing so it bypasses the block store's blockId-based access
> mode. Then, in the read path, when reading shuffle block data, the disk store
> queries shuffleBlockManager to hack around the normal blockId-to-file mapping
> in order to read the data from the file correctly. This leads to a lot of
> bi-directional dependencies between modules, and the code logic is somewhat
> tangled. Neither the shuffle block manager nor blockManager/DiskStore fully
> controls the read path; they are tightly coupled in low-level code. This also
> makes it hard to implement other shuffle manager logic, e.g. a sort-based
> shuffle that might merge all output from one map partition into a single
> file. That would require hacking further into diskStore/diskBlockManager etc.
> to find the right data to read.
> Possible approach:
> I think it might be better to expose a FileSegment-based read interface for
> DiskStore in addition to the current blockId-based interface. Then the logic
> that maps a blockId to a FileSegment can reside entirely in the specific
> shuffle manager. If a shuffle manager does need to merge data into one single
> object, it takes care of the mapping logic in both the read and write paths
> and takes responsibility for reading and writing the shuffle data.
> The BlockStore itself should just take care of reading and writing as
> requested; it should not be involved in the data mapping logic at all. This
> might make the interfaces between modules clearer and decouple the modules
> from each other more cleanly.
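
The proposed split of responsibilities could look roughly like the Scala
sketch below. It is only an illustration under stated assumptions, not Spark's
actual code: FileSegment, ShuffleBlockId, DiskStore and
ConsolidatingShuffleManager here are simplified, hypothetical stand-ins that
borrow names from the description, and only the segment-based read is shown
(the existing blockId-based interface is left out).

```scala
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer

// Hypothetical, simplified stand-ins for the types named in the issue.
final case class FileSegment(file: File, offset: Long, length: Long)
final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)

// The store only moves bytes for a given segment; it knows nothing about
// how shuffle blocks are mapped onto files.
class DiskStore {
  def getBytes(segment: FileSegment): ByteBuffer = {
    val raf = new RandomAccessFile(segment.file, "r")
    try {
      // Sketch only: assumes a segment fits in a single byte array.
      val bytes = new Array[Byte](segment.length.toInt)
      raf.seek(segment.offset)
      raf.readFully(bytes)
      ByteBuffer.wrap(bytes)
    } finally raf.close()
  }
}

// A shuffle manager that appends all blocks of one map task to a single file
// and owns the blockId -> FileSegment mapping on both the write and read paths.
// Not thread-safe; a real implementation would need synchronization.
class ConsolidatingShuffleManager(dir: File, store: DiskStore) {
  private val segments =
    scala.collection.mutable.Map.empty[ShuffleBlockId, FileSegment]

  def write(id: ShuffleBlockId, data: Array[Byte]): Unit = {
    val file = new File(dir, s"shuffle_${id.shuffleId}_${id.mapId}.data")
    val raf = new RandomAccessFile(file, "rw")
    try {
      val offset = raf.length() // append after any blocks already written
      raf.seek(offset)
      raf.write(data)
      segments(id) = FileSegment(file, offset, data.length.toLong)
    } finally raf.close()
  }

  // Resolve the mapping here, then ask the store for the raw bytes.
  def read(id: ShuffleBlockId): Option[ByteBuffer] =
    segments.get(id).map(store.getBytes)
}
```

With this shape, a sort-based shuffle manager would only need to supply its
own write and mapping logic; the store's read path would stay unchanged.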
--
This message was sent by Atlassian JIRA
(v6.2#6252)