Kannan Rajah created TEZ-2442:
---------------------------------

             Summary: Support DFS based shuffle in addition to HTTP shuffle
                 Key: TEZ-2442
                 URL: https://issues.apache.org/jira/browse/TEZ-2442
             Project: Apache Tez
          Issue Type: Improvement
    Affects Versions: 0.5.3
            Reporter: Kannan Rajah


In Tez, Shuffle is a mechanism by which intermediate data can be shared between 
stages. Shuffle data is written to local disk and fetched from any remote node 
using HTTP. A DFS like MapR file system can support writing this shuffle data 
directly to its DFS using a notion of local volumes and retrieve it using HDFS 
API from remote node. The current Shuffle implementation assumes local data can 
only be managed by LocalFileSystem. So it uses RawLocalFileSystem and 
LocalDirAllocator. If we can remove this assumption and introduce an 
abstraction to manage local disks, then we can reuse most of the shuffle logic 
(store, sort) and inject a HDFS API based retrieval instead of HTTP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to