Raymond Liu created SPARK-2275:
----------------------------------

             Summary: More general Storage Interface for Shuffle / Spill etc.
                 Key: SPARK-2275
                 URL: https://issues.apache.org/jira/browse/SPARK-2275
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager, Shuffle
            Reporter: Raymond Liu


Problem 1:

In the current design, when shuffle / spill is involved, a File-based interface 
is assumed by various classes. This might not hold when the disk store is 
implemented on something other than a filesystem (e.g. a kv/object based NVM 
device). And if we try to use a memory or off-heap store for shuffle, the File 
interface also does not work.

Possible approach:

So my general idea here is to hide the File interface, and instead use a 
general ObjectId to represent the object being written to external storage, 
passing this ObjectId around to the various classes that access the data.

e.g.  

For the write path: DiskBlockObjectWriter (which could be renamed to 
FileBlockObjectWriter) would take an ObjectId instead of a File, and do the 
mapping to a File internally.

For the read path: an InputStream should be retrievable from the ObjectId via 
the specific store. Thus the various read operations would not need to rely on 
the assumption that the lower-level storage is a filesystem, nor on a File to 
build their own FileInputStream etc.

In this way, the current file-based DiskStore can still use File for its 
internal storage, while other solutions can easily plug in a different 
low-level implementation and simply map it to an ObjectId for other modules to 
interact with.
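As a rough illustration of the idea, here is a minimal sketch in Scala. The names (ObjectId, ObjectStore, FileObjectStore, MemoryObjectStore) are hypothetical, chosen for this example rather than taken from Spark's actual code:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, File,
  FileInputStream, FileOutputStream, InputStream, OutputStream}
import scala.collection.mutable

// Hypothetical opaque handle for data written to external storage.
case class ObjectId(name: String)

// Hypothetical store interface: callers see only ObjectId and streams,
// never File.
trait ObjectStore {
  def outputStream(id: ObjectId): OutputStream
  def inputStream(id: ObjectId): InputStream
}

// A file-backed store maps each ObjectId to a File internally, much as the
// current DiskStore would.
class FileObjectStore(dir: File) extends ObjectStore {
  private def fileFor(id: ObjectId) = new File(dir, id.name)
  def outputStream(id: ObjectId): OutputStream =
    new FileOutputStream(fileFor(id))
  def inputStream(id: ObjectId): InputStream =
    new FileInputStream(fileFor(id))
}

// A purely in-memory store implements the same interface with no File at
// all, showing how a non-filesystem backend (e.g. a kv/object based NVM
// device) could plug in behind the same ObjectId.
class MemoryObjectStore extends ObjectStore {
  private val data = mutable.Map[ObjectId, Array[Byte]]()
  def outputStream(id: ObjectId): OutputStream =
    new ByteArrayOutputStream() {
      // Capture the buffer under the ObjectId when the writer closes.
      override def close(): Unit = { data(id) = toByteArray }
    }
  def inputStream(id: ObjectId): InputStream =
    new ByteArrayInputStream(data(id))
}
```

A writer such as DiskBlockObjectWriter would then be handed an ObjectId plus a store rather than a File, and readers would obtain an InputStream from the store instead of constructing a FileInputStream themselves.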


Problem 2:

At present, in the shuffle write path, the shuffle block manager maintains the 
mapping from certain blockIds to FileSegments for the benefit of consolidated 
shuffle; this bypasses the block store's blockId-based access mode. Then in the 
read path, when reading shuffle block data, the disk store queries the 
ShuffleBlockManager to override the normal blockId-to-file mapping in order to 
read the correct data from the file. This leads to many bi-directional 
dependencies between modules, and the code logic is somewhat tangled. Neither 
the shuffle block manager nor the BlockManager/DiskStore fully controls the 
read path; they are tightly coupled in low-level code. This also makes it hard 
to implement other shuffle manager logic, e.g. a sort-based shuffle that merges 
all output from one map partition into a single file, which would need to hack 
even deeper into the DiskStore/DiskBlockManager etc. to find the right data to 
read.

Possible approach:

So I think it might be better to expose an object + offset based read 
interface on BlockStore (or at least on DiskStore), e.g. 
getObjectData(objectId, offset, length), in addition to the current 
blockId-based interface.

Then the logic mapping a blockId to an object and offset can reside entirely 
in the specific shuffle manager. If it does need to merge data into a single 
object (a File in the current DiskStore implementation), it handles the 
mapping in both the read and write paths and takes responsibility for reading 
and writing shuffle data. (Since it already handles writing the data, it is 
also reasonable for reads to go through the shuffle manager rather than the 
BlockManager; it can then use the object + offset read interface for the 
actual read work.)

The BlockStore itself should just handle reads and writes as requested, and 
not be involved in the data-mapping logic at all. This would make the 
interfaces between modules clearer and decouple them more cleanly.
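The division of responsibility above can be sketched as follows. All names here (BlockStore, MemoryBlockStore, Segment, ConsolidatingShuffleManager) are hypothetical, illustrating the proposed shape rather than Spark's actual API:

```scala
import scala.collection.mutable

// Hypothetical opaque handle for a stored object (a File in the current
// DiskStore implementation).
case class ObjectId(name: String)

// Hypothetical store interface: the proposed object + offset based read,
// alongside a plain append-style write. The store knows nothing about
// blockIds or shuffle consolidation.
trait BlockStore {
  def putObjectData(id: ObjectId, data: Array[Byte]): Unit
  def getObjectData(id: ObjectId, offset: Long, length: Long): Array[Byte]
}

// In-memory stand-in for a real store; appends so that several blocks can
// be consolidated into one object.
class MemoryBlockStore extends BlockStore {
  private val objects = mutable.Map[ObjectId, Array[Byte]]()
  def putObjectData(id: ObjectId, data: Array[Byte]): Unit = {
    val old = objects.getOrElse(id, Array.emptyByteArray)
    objects(id) = old ++ data
  }
  def getObjectData(id: ObjectId, offset: Long, length: Long): Array[Byte] =
    objects(id).slice(offset.toInt, (offset + length).toInt)
}

// Where a block lives inside a consolidated object.
case class Segment(id: ObjectId, offset: Long, length: Long)

// The shuffle manager alone owns the blockId -> (object, offset, length)
// mapping, for both the write and the read path.
class ConsolidatingShuffleManager(store: BlockStore) {
  private val segments = mutable.Map[String, Segment]()

  def writeBlock(blockId: String, id: ObjectId, data: Array[Byte]): Unit = {
    // Next free offset in this object = end of its furthest segment so far.
    val offset = segments.values
      .filter(_.id == id)
      .map(s => s.offset + s.length)
      .foldLeft(0L)(math.max)
    store.putObjectData(id, data)
    segments(blockId) = Segment(id, offset, data.length)
  }

  def readBlock(blockId: String): Array[Byte] = {
    val s = segments(blockId)
    store.getObjectData(s.id, s.offset, s.length)
  }
}
```

With this split, the store never needs to be queried about or overridden by shuffle-specific mappings, and a different shuffle manager (e.g. a sort-based one) can choose its own layout without touching the store.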




--
This message was sent by Atlassian JIRA
(v6.2#6252)
