[
https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276517#comment-15276517
]
Hitesh Shah commented on TEZ-2442:
----------------------------------
bq. The deleting happens in two cases: 1) app finishes/unregisters, this is in
handle(event) where the event is DAG_FINISHED; 2) AM shutdown. This is the
second case.
That is not exactly accurate. A single Tez AM can run multiple DAGs. A
particular AM can stop or shut down in the middle of running a DAG, and the
next AM attempt will recover and continue running that DAG from the previous
attempt. This means that DAG-specific data can be cleaned up on DAG finish,
but overall app data can only be cleaned up on session shutdown, i.e. after
the AM unregisters with the RM (check how the staging dir is deleted).
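The two cleanup scopes above could be sketched roughly as below. This is purely illustrative, assuming hypothetical names (CleanupScopeSketch, pathsToDelete) that are not Tez APIs; the point is only that DAG-scoped and app-scoped data must be deleted on different events.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: DAG-scoped data can go on DAG_FINISHED, but
// app-scoped data (e.g. the staging dir) must survive AM restarts and
// is only safe to delete at session shutdown, after the AM unregisters.
class CleanupScopeSketch {
    enum Event { DAG_FINISHED, SESSION_SHUTDOWN }

    static List<String> pathsToDelete(Event event, String appDir, String dagDir) {
        List<String> paths = new ArrayList<>();
        switch (event) {
            case DAG_FINISHED:
                // Safe: no later AM attempt needs this DAG's data again.
                paths.add(dagDir);
                break;
            case SESSION_SHUTDOWN:
                // An AM that dies mid-DAG is recovered by the next attempt,
                // which still needs app-level data, so only delete it here.
                paths.add(appDir);
                break;
        }
        return paths;
    }
}
```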
bq. based on the path we can tell if this is local vs fs based. for fs based
the inputFile is fully qualified FileSystem URL.
Agreed, but the toString() function explicitly states "LocalDiskFetchedInput".
bq. are you worried about 100K sub folders in one HDFS folder? I'm going to do
a scaling test and share the results later.
Traversing a dir with 100K entries will likely be slow, but we can avoid some
basic pitfalls by optimizing the dir layout, i.e. use a bit more hierarchy
instead of a flat structure if possible.
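One common way to add that hierarchy is to hash each output name into a small fixed number of buckets. The sketch below is an assumption for illustration only; the fan-out of 256 and the path shape are not from any attached patch.

```java
// Illustrative only: spread outputs across a two-level directory tree so
// that no single directory accumulates ~100K entries. Bucket count (256)
// and path format are assumed, not taken from the Tez patch.
class DirLayoutSketch {
    static String shufflePath(String baseDir, String attemptId) {
        // floorMod keeps the bucket non-negative for any hashCode value.
        int bucket = Math.floorMod(attemptId.hashCode(), 256);
        // e.g. base/3f/attempt_..._0001 instead of base/attempt_..._0001
        return String.format("%s/%02x/%s", baseDir, bucket, attemptId);
    }
}
```

With this layout, a listing of any one subdirectory stays small even when the total number of attempts is large.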
bq. Just providing another way to supply shuffle location. If we use an AUX
service in NM to do proper clean up, we can let NM generate the shuffle
location and properly clean it up. NM can provide the shuffle location via
environment.
The main problem here is that the user can override the env via tez.task.env,
so we have no safe way to infer whether the NM provided the info or not. For
now, it might be better to remove this. Additionally, the aux service could be
implemented so as to allow the app to tell it of a location to monitor and
clean up after app completion, instead of the aux service telling the app
where to write data. This can be done via serviceData in the
ContainerLaunchContext.
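To make the serviceData direction concrete: the app would put a byte payload into the serviceData map that YARN carries in the ContainerLaunchContext, and the aux service would read it back in its initializeApplication() hook. The service key name ("tez_shuffle_cleanup") and the payload format below are assumptions for illustration, not a proposed wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch of the app -> aux service direction: the app hands the NM aux
// service a directory to monitor and clean up after app completion,
// rather than the aux service pushing a location via the environment
// (which tez.task.env could override). Key name and encoding are assumed.
class ServiceDataSketch {
    static Map<String, ByteBuffer> buildServiceData(String shuffleDir) {
        Map<String, ByteBuffer> serviceData = new HashMap<>();
        // In YARN this map would go into the ContainerLaunchContext; the
        // aux service sees the bytes when the application initializes.
        serviceData.put("tez_shuffle_cleanup",
            ByteBuffer.wrap(shuffleDir.getBytes(StandardCharsets.UTF_8)));
        return serviceData;
    }

    // What the aux service side would do with the payload it receives.
    static String decodeDir(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```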
> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
> Key: TEZ-2442
> URL: https://issues.apache.org/jira/browse/TEZ-2442
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.5.3
> Reporter: Kannan Rajah
> Assignee: shanyu zhao
> Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf,
> hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch,
> tez-2442-trunk.4.patch, tez-2442-trunk.patch, tez_hdfs_shuffle.patch
>
>
> In Tez, Shuffle is a mechanism by which intermediate data can be shared
> between stages. Shuffle data is written to local disk and fetched from any
> remote node using HTTP. A DFS like MapR file system can support writing this
> shuffle data directly to its DFS using a notion of local volumes and retrieve
> it using HDFS API from remote node. The current Shuffle implementation
> assumes local data can only be managed by LocalFileSystem. So it uses
> RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption
> and introduce an abstraction to manage local disks, then we can reuse most of
> the shuffle logic (store, sort) and inject a HDFS API based retrieval instead
> of HTTP.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)