Raymond Liu created SPARK-2288:
----------------------------------

             Summary: Hide ShuffleBlockManager behind ShuffleManager
                 Key: SPARK-2288
                 URL: https://issues.apache.org/jira/browse/SPARK-2288
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager, Shuffle
            Reporter: Raymond Liu


This is a sub-task of SPARK-2275.

At present, in the shuffle write path, the shuffle block manager manages the 
mapping from a blockId to a FileSegment for the benefit of consolidated 
shuffle; this way it bypasses the block store's blockId-based access mode. 
Then, in the read path, when reading shuffle block data, the disk store queries 
shuffleBlockManager to override the normal blockId-to-file mapping in order to 
read the correct data from the file. This leads to a lot of bi-directional 
dependencies between modules, and the code logic is somewhat messed up. Neither 
the shuffle block manager nor blockManager/DiskStore fully controls the read 
path. They are tightly coupled in low-level code modules. This makes it hard to 
implement other shuffle manager logic, e.g. a sort-based shuffle which might 
merge all output from one map partition into a single file. That would require 
hacking further into diskStore/diskBlockManager etc. to find the right data to 
read.

Possible approach:
I think it might be better to expose a FileSegment-based read interface for 
DiskStore in addition to the current blockId-based interface.
Then the code that maps a blockId to a FileSegment can all reside in the 
specific shuffle manager. If a shuffle manager does need to merge data into one 
single object, it takes care of the mapping logic in both the read and write 
paths and takes responsibility for reading and writing shuffle data.
The BlockStore itself should just take care of reading and writing as required; 
it should not be involved in the data mapping logic at all. This might make the 
interfaces between modules clearer and decouple the modules from each other 
more cleanly.
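
To make the proposal above more concrete, here is a rough sketch of how the 
split could look. It is not Spark's actual code: FileSegment here is a 
simplified stand-in, and SimpleDiskStore, MergingShuffleManager, append, 
getBytes, and the file naming scheme are all hypothetical names made up for 
illustration. The store only reads and writes byte ranges, while the shuffle 
manager owns the mapping from (shuffleId, mapId, reduceId) to a segment.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.file.Files
import scala.collection.mutable

object FileSegmentSketch {
  // Simplified stand-in for a file segment: a byte range within a file.
  final case class FileSegment(file: File, offset: Long, length: Long)

  // Hypothetical block store: it reads and writes bytes as asked and
  // knows nothing about how shuffle blocks map to files.
  class SimpleDiskStore {
    // Append bytes to the end of a file and report where they landed.
    def append(file: File, bytes: Array[Byte]): FileSegment = {
      val raf = new RandomAccessFile(file, "rw")
      try {
        val offset = raf.length()
        raf.seek(offset)
        raf.write(bytes)
        FileSegment(file, offset, bytes.length.toLong)
      } finally raf.close()
    }

    // The segment-based read path suggested in the proposal.
    def getBytes(segment: FileSegment): Array[Byte] = {
      val raf = new RandomAccessFile(segment.file, "r")
      try {
        val buf = new Array[Byte](segment.length.toInt)
        raf.seek(segment.offset)
        raf.readFully(buf)
        buf
      } finally raf.close()
    }
  }

  // Hypothetical shuffle manager that merges all output of one map task
  // into a single file and owns the block-to-segment mapping itself.
  // Not thread-safe; a real implementation would need to be.
  class MergingShuffleManager(store: SimpleDiskStore, dir: File) {
    private val segments = mutable.Map.empty[(Int, Int, Int), FileSegment]

    def write(shuffleId: Int, mapId: Int, reduceId: Int,
              data: Array[Byte]): Unit = {
      // One file per (shuffleId, mapId); reduce outputs are appended to it.
      val file = new File(dir, s"shuffle_${shuffleId}_${mapId}.data")
      segments((shuffleId, mapId, reduceId)) = store.append(file, data)
    }

    def read(shuffleId: Int, mapId: Int, reduceId: Int): Option[Array[Byte]] =
      segments.get((shuffleId, mapId, reduceId)).map(store.getBytes)
  }

  // Writes two reduce outputs of one map task into one file, reads them back.
  def demo(): (String, String) = {
    val dir = Files.createTempDirectory("shuffle-sketch").toFile
    val manager = new MergingShuffleManager(new SimpleDiskStore, dir)
    manager.write(0, 0, 0, "first".getBytes("UTF-8"))
    manager.write(0, 0, 1, "second".getBytes("UTF-8"))
    val a = new String(manager.read(0, 0, 0).get, "UTF-8")
    val b = new String(manager.read(0, 0, 1).get, "UTF-8")
    (a, b)
  }
}
```

In this sketch the disk store never consults the shuffle manager, so the 
dependency runs in one direction only, and a different shuffle manager could 
pick another file layout without touching the store.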



--
This message was sent by Atlassian JIRA
(v6.2#6252)
