[ 
https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276517#comment-15276517
 ] 

Hitesh Shah commented on TEZ-2442:
----------------------------------

bq. The deleting happens in two cases: 1) app finishes/unregisters, this is in 
handle(event) where the event is DAG_FINISHED; 2) AM shutdown. This is the 
second case.

That is not exactly accurate. A single Tez AM can run multiple DAGs. A 
particular AM attempt can stop/shut down in the middle of running a DAG, and 
the next AM attempt will recover and continue running the DAG from the 
previous attempt. This means that DAG-specific data can be cleaned up when a 
DAG finishes, but overall app data can only be cleaned up on session shutdown, 
i.e. after the AM unregisters with the RM (check how the staging dir is 
deleted). 
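
A rough sketch of that split, for illustration only (the class name and path 
layout are assumptions, not the actual Tez code): DAG-scoped data is removed 
when a DAG finishes, while the app-level root is only removed on session 
shutdown, i.e. after the AM has unregistered with the RM, mirroring how the 
staging dir is handled.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: assumes a layout of <shuffle-root>/<appId>/<dagId>/...
public class FsShuffleCleanup {
  private final FileSystem fs;
  private final Path appRoot;   // e.g. <shuffle-root>/<appId>

  public FsShuffleCleanup(Configuration conf, Path appRoot) throws Exception {
    this.fs = appRoot.getFileSystem(conf);
    this.appRoot = appRoot;
  }

  /** On DAG_FINISHED: only this DAG's subtree is deleted; the app root stays
      because the next DAG (or a recovering AM attempt) may still need it. */
  public void onDagFinished(String dagId) throws Exception {
    fs.delete(new Path(appRoot, dagId), true);
  }

  /** On session shutdown, after unregistering with the RM (the same point at
      which the staging dir is deleted). */
  public void onSessionShutdown() throws Exception {
    fs.delete(appRoot, true);
  }
}
{code}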

bq. based on the path we can tell if this is local vs fs based. for fs based 
the inputFile is fully qualified FileSystem URL.
Agreed, but the toString() function explicitly states "LocalDiskFetchedInput".
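
For illustration only (the class and helper here are assumptions, not the 
patch's code): the label could be derived from the path's scheme instead of 
being hardcoded, so fs-based fetches do not describe themselves as local-disk 
ones.

{code:java}
import org.apache.hadoop.fs.Path;

// Sketch: pick the toString() label from the inputFile's scheme.
class FetchedInputLabel {
  static String describe(Path inputFile) {
    String scheme = inputFile.toUri().getScheme();
    boolean local = scheme == null || "file".equals(scheme);
    return (local ? "LocalDiskFetchedInput" : "FSBasedFetchedInput")
        + " [inputFile=" + inputFile + "]";
  }
}
{code}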

bq. are you worried about 100K sub folders in one HDFS folder? I'm going to do 
a scaling test and share the results later.
Traversing a dir with 100K entries will likely be slow, but we can avoid some 
basic pitfalls by optimizing the dir layout, i.e. using a bit more hierarchy 
instead of a flat structure if possible. 
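
A minimal sketch of the fan-out idea (the helper and bucket count are 
assumptions): hash the attempt id into a fixed number of buckets so that no 
single directory ends up with ~100K children.

{code:java}
import org.apache.hadoop.fs.Path;

// Sketch: <root>/<bucket>/<attemptId> instead of <root>/<attemptId>.
class ShuffleDirLayout {
  private static final int BUCKETS = 256;

  static Path outputDir(Path root, String attemptId) {
    int bucket = (attemptId.hashCode() & Integer.MAX_VALUE) % BUCKETS;
    // 100K attempts spread over 256 buckets => roughly 400 entries per listing.
    return new Path(new Path(root, String.format("%04d", bucket)), attemptId);
  }
}
{code}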

bq. Just providing another way to supply shuffle location. If we use an AUX 
service in NM to do proper clean up, we can let NM generate the shuffle 
location and properly clean it up. NM can provide the shuffle location via 
environment.
The main problem here is that the user can override the env via tez.task.env, 
so we have no safe way to infer whether the NM provided the info or not. For 
now, it might be better to remove this. Additionally, the aux service could be 
implemented so as to allow the app to tell it which location to monitor and 
clean up after app completion, instead of the aux service telling the app 
where to write data. This can be done via serviceData in the 
ContainerLaunchContext. 
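
As a sketch of that direction (the service name "tez_shuffle_cleanup" and the 
payload format are assumptions, not an existing aux service): the client 
attaches the location to monitor via serviceData, and the aux service would 
read the same bytes in initializeApplication() and delete the directory in 
stopApplication().

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

// Sketch: app -> aux service handshake via serviceData.
class ShuffleCleanupServiceData {
  static void attach(ContainerLaunchContext clc, String shuffleDirToCleanup) {
    ByteBuffer payload =
        ByteBuffer.wrap(shuffleDirToCleanup.getBytes(StandardCharsets.UTF_8));
    // A real client would merge with any serviceData already set
    // (e.g. for the shuffle handler) instead of replacing it.
    clc.setServiceData(Collections.singletonMap("tez_shuffle_cleanup", payload));
  }
}
{code}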

> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
>                 Key: TEZ-2442
>                 URL: https://issues.apache.org/jira/browse/TEZ-2442
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.5.3
>            Reporter: Kannan Rajah
>            Assignee: shanyu zhao
>         Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf, 
> hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch, 
> tez-2442-trunk.4.patch, tez-2442-trunk.patch, tez_hdfs_shuffle.patch
>
>
> In Tez, Shuffle is a mechanism by which intermediate data can be shared 
> between stages. Shuffle data is written to local disk and fetched from any 
> remote node using HTTP. A DFS like the MapR file system can support writing 
> this shuffle data directly to its DFS using a notion of local volumes and 
> retrieving it using the HDFS API from a remote node. The current Shuffle 
> implementation assumes local data can only be managed by LocalFileSystem, so 
> it uses RawLocalFileSystem and LocalDirAllocator. If we can remove this 
> assumption and introduce an abstraction to manage local disks, then we can 
> reuse most of the shuffle logic (store, sort) and inject an HDFS API based 
> retrieval instead of HTTP.
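
A hedged sketch of the kind of abstraction the description asks for (the 
interface and method names are illustrative, not the API in the attached 
patches): shuffle code would ask this layer for paths and streams instead of 
going to RawLocalFileSystem/LocalDirAllocator directly, so a DFS-backed 
implementation can be plugged in.

{code:java}
import java.io.InputStream;
import org.apache.hadoop.fs.Path;

// Sketch: abstraction over where intermediate shuffle data lives.
interface ShuffleStorage {
  /** Where a task attempt should write its shuffle output. */
  Path getOutputPath(String taskAttemptId) throws Exception;

  /** Open intermediate data, whether it is on local disk or on a DFS. */
  InputStream open(Path shuffleFile) throws Exception;
}
{code}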



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
