[
https://issues.apache.org/jira/browse/TEZ-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277284#comment-15277284
]
shanyu zhao commented on TEZ-2442:
----------------------------------
bq. That is not exactly accurate. A single Tez AM can run multiple DAGs. A
particular AM can stop/shutdown in the middle of running a DAG, and the next AM
attempt will recover and continue running the DAG from the previous attempt.
This means that DAG-specific data can be cleaned up on DAG finish, but overall
app data can only be cleaned up on session shutdown, i.e. after the AM
unregisters with the RM (check how the staging dir is deleted).
I see. So I'll just remove the code that cleans the intermediate data folder in
FsBasedShuffleDataCleaner.serviceStop(), because there is no app-level data;
all data is at the DAG level.
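To make the split concrete, here is a minimal sketch of DAG-level cleanup under the layout discussed in this thread (directory names and the helper are my assumptions, not the patch itself): on DAG completion only {{<baseDir>/<appId>/<dagId>}} is removed, while the app-level directory is left for cleanup at session shutdown.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch, not FsBasedShuffleDataCleaner itself: remove only the
// DAG-level subtree on DAG completion; the app-level dir survives until
// session shutdown (after the AM unregisters with the RM).
public class DagLevelCleanupSketch {
  static void cleanupDagDir(Path baseDir, String appId, int dagId) throws IOException {
    Path dagDir = baseDir.resolve(appId).resolve(String.valueOf(dagId));
    if (!Files.exists(dagDir)) {
      return; // e.g. a recovered AM attempt already cleaned it up
    }
    try (Stream<Path> walk = Files.walk(dagDir)) {
      // delete children before parents
      walk.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("shuffle");
    Path dagDir = base.resolve("application_1_0001").resolve("1");
    Files.createDirectories(dagDir.resolve("attempt_1"));
    cleanupDagDir(base, "application_1_0001", 1);
    System.out.println(Files.exists(dagDir));                        // DAG dir is gone
    System.out.println(Files.exists(base.resolve("application_1_0001"))); // app dir remains
  }
}
{code}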
bq. Traversing a dir with a 100K entries will likely be slow but we can avoid
some basic pitfalls by optimizing the dir layout better i.e. use a bit more
hierarchy instead of a flat structure if possible.
OK, I'll use a hash code as the name of a sub-folder that holds the attempt ID
folder. Something like this:
{code}
String attemptId = context.getUniqueIdentifier();
// Math.floorMod keeps the bucket non-negative even when hashCode() is negative;
// a plain % could feed a negative value into %08d and produce "-0000123"
String hashCode = String.format("%08d",
    Math.floorMod(attemptId.hashCode(), NUM_DAG_SUB_FOLDERS));
String subPath = context.getApplicationId().toString() + Path.SEPARATOR
    + context.getDagIdentifier() + Path.SEPARATOR + hashCode
    + Path.SEPARATOR + attemptId;
{code}
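A standalone version of that layout, with assumed names ({{NUM_DAG_SUB_FOLDERS}}, {{buildSubPath}}) rather than the actual patch code, showing how the hash bucket caps the fan-out of any single directory well below the ~100K entries mentioned above:
{code}
// Sketch only: appId/dagId/<hashBucket>/attemptId, where the 8-digit bucket
// is derived from the attempt ID's hash code.
public class ShufflePathSketch {
  // Assumed value for illustration; the real constant may differ.
  static final int NUM_DAG_SUB_FOLDERS = 10000;

  static String buildSubPath(String appId, int dagId, String attemptId) {
    // Math.floorMod keeps the bucket non-negative even when hashCode() < 0
    String hashBucket = String.format("%08d",
        Math.floorMod(attemptId.hashCode(), NUM_DAG_SUB_FOLDERS));
    return appId + "/" + dagId + "/" + hashBucket + "/" + attemptId;
  }

  public static void main(String[] args) {
    System.out.println(buildSubPath("application_1431000000000_0001", 1,
        "attempt_1431000000000_0001_1_00_000000_0"));
  }
}
{code}
With 10,000 buckets, 100K attempts per DAG average out to roughly 10 entries per bucket directory.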
bq. The main problem here is that the user can override the env via
tez.task.env so we have no safe way to infer whether the NM gave the info or
not. For now, it might be better to remove this. Additionally, the aux service
could be implemented so as to allow the app to tell it of a location to monitor
and clean up after app completion instead of the aux service telling the app of
where to write data. This can be done via serviceData in the
ContainerLaunchContext.
I see your point. The user on the client side can override the env, but why
would they do that when they could simply use tez.fs.based.shuffle.location? In
the end, the NM code decides which env values to pass into the child process,
so the NM can ultimately override this env variable if it needs to generate the
intermediate data path.
In our use case, we use our own temp file system implementation to hold the
intermediate data, whose location has to be generated by the NM, so we need a
way to override the location in the NM. Do you have a better suggestion?
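For reference, the serviceData route suggested above would look roughly like this: the app encodes its chosen shuffle location into the ByteBuffer it registers under the aux service's name via ContainerLaunchContext.setServiceData(...), and the aux service decodes it in initializeApplication(...) to know which path to monitor and clean up after app completion. Only the ByteBuffer round-trip is sketched here; the YARN wiring and class names around it are assumptions.
{code}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the serviceData payload: a UTF-8 encoded directory path.
public class ServiceDataSketch {
  static ByteBuffer encodeLocation(String shuffleDir) {
    return ByteBuffer.wrap(shuffleDir.getBytes(StandardCharsets.UTF_8));
  }

  static String decodeLocation(ByteBuffer serviceData) {
    byte[] bytes = new byte[serviceData.remaining()];
    serviceData.duplicate().get(bytes); // duplicate so the caller's position is untouched
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    ByteBuffer buf = encodeLocation("/grid/tmp/shuffle/application_1_0001");
    System.out.println(decodeLocation(buf));
  }
}
{code}
This inverts the current direction: instead of the NM telling the app where to write (via env), the app tells the aux service what to watch, which sidesteps the tez.task.env override problem.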
> Support DFS based shuffle in addition to HTTP shuffle
> -----------------------------------------------------
>
> Key: TEZ-2442
> URL: https://issues.apache.org/jira/browse/TEZ-2442
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.5.3
> Reporter: Kannan Rajah
> Assignee: shanyu zhao
> Attachments: FS_based_shuffle_v2.pdf, Tez Shuffle using DFS.pdf,
> hdfs_broadcast_hack.txt, tez-2442-trunk.2.patch, tez-2442-trunk.3.patch,
> tez-2442-trunk.4.patch, tez-2442-trunk.patch, tez_hdfs_shuffle.patch
>
>
> In Tez, Shuffle is a mechanism by which intermediate data can be shared
> between stages. Shuffle data is written to local disk and fetched from any
> remote node using HTTP. A DFS like MapR file system can support writing this
> shuffle data directly to its DFS using a notion of local volumes and retrieve
> it using HDFS API from remote node. The current Shuffle implementation
> assumes local data can only be managed by LocalFileSystem. So it uses
> RawLocalFileSystem and LocalDirAllocator. If we can remove this assumption
> and introduce an abstraction to manage local disks, then we can reuse most of
> the shuffle logic (store, sort) and inject an HDFS API based retrieval instead
> of HTTP.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)