[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336069#comment-16336069
 ] 

Steve Loughran commented on MAPREDUCE-7029:
-------------------------------------------

bq. I don't think I could have delved into the code without some help from 
teammate

This is the second piece of corecursive code I've actually encountered in the 
real world; this one appears to have evolved that way rather than been designed. 
It works; it's just that we fear changing it, which is why the S3A work added a 
new plugin/factory point: making changes to the FileOutputCommitter ran too much 
risk of breaking everything else.

Yeah, I'm planning to add fault injection to my committer just to see how 
things handle failures halfway through commits, in cleanup, etc.

You might find [Committer 
Architecture|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/committer_architecture.md]
 useful, though there's an error in one of the code snippets, which HADOOP-15107 
patches.

> FileOutputCommitter is slow on filesystems lacking recursive delete
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7029
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7029
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.8.2
>         Environment: - Google Cloud Storage (with the GCS connector: 
> https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) for 
> HCFS compatibility.
> - FileOutputCommitter algorithm v2.
> - Running on Google Compute Engine with Java 8, Debian 8, Hadoop 2.8.2, Spark 
> 2.2.0.
>            Reporter: Karthik Palaniappan
>            Assignee: Karthik Palaniappan
>            Priority: Minor
>             Fix For: 3.1.0, 2.10.0
>
>         Attachments: MAPREDUCE-7029-branch-2.004.patch, 
> MAPREDUCE-7029-branch-2.005.patch, MAPREDUCE-7029-branch-2.005.patch, 
> MAPREDUCE-7029.001.patch, MAPREDUCE-7029.002.patch, MAPREDUCE-7029.003.patch, 
> MAPREDUCE-7029.004.patch, MAPREDUCE-7029.005.patch
>
>
> I ran a Spark job that outputs thousands of parquet files (aka there are 
> thousands of reducers), and it hung for several minutes in the driver after 
> all tasks were complete. Here is a very simple repro of the job (to be run in 
> a spark-shell):
> {code:scala}
> spark.range(1L << 20).repartition(1 << 14).write.save("gs://some/path")
> {code}
> Spark actually calls into MapReduce's FileOutputCommitter. Job committing 
> (specifically cleanupJob()) recursively deletes the job temporary directory, 
> which is something like "gs://some/path/_temporary". If I understand 
> correctly, on HDFS, this would be O(1), but on GCS (and every HCFS I know), 
> this requires a full file tree walk. Deleting tens of thousands of objects in 
> GCS takes several minutes.
> I propose that commitTask() recursively delete its task attempt temp 
> directory (something like "gs://some/path/_temporary/attempt1/task1"). On 
> HDFS, this is O(1) per task, so this is very little overhead per task. On GCS 
> (and other HCFSs), this adds parallelism for deleting the job temp directory.
> With the attached patch, the repro above went from taking ~10 minutes to 
> taking ~5 minutes, and task time did not significantly change.
> Side note: I found this issue with Spark, but I assume it applies to a 
> MapReduce job with thousands of reducers as well.
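
The proposal in the issue description can be sketched roughly as follows. This is not the attached patch and does not use the Hadoop API: it is a minimal, self-contained illustration on the local filesystem via java.nio, where the class name, the commitTask()/cleanupJob() stand-ins, the directory layout, and the thread pool size are all assumptions made for the example. The delete helper walks the tree file by file, as a store without a native recursive delete has to; each simulated task removes its own attempt directory in parallel, so the final job-level cleanup has very little left to walk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class TempDirCleanupSketch {

    // Deletes a tree entry by entry, children before parents, which is what
    // a store without a native recursive delete effectively has to do.
    static void deleteTree(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Hypothetical stand-in for commitTask(): promoting the task output to
    // the destination is not shown; only the proposed per-task delete is.
    static void commitTask(Path taskAttemptDir) throws IOException {
        deleteTree(taskAttemptDir);
    }

    // Hypothetical stand-in for cleanupJob(): removes whatever is left
    // under the job temporary directory.
    static void cleanupJob(Path jobTempDir) throws IOException {
        deleteTree(jobTempDir);
    }

    // Creates a fake job layout, "commits" every task in parallel, then
    // runs the job-level cleanup. Returns the job output directory.
    static Path runSketch(int tasks) throws Exception {
        Path out = Files.createTempDirectory("job-output");
        Path jobTemp = out.resolve("_temporary");
        List<Path> attemptDirs = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Path dir = jobTemp.resolve("attempt1").resolve("task" + i);
            Files.createDirectories(dir);
            Files.createFile(dir.resolve("part-" + i));
            attemptDirs.add(dir);
        }

        ExecutorService pool = Executors.newFixedThreadPool(4); // assumed size
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path dir : attemptDirs) {
                futures.add(pool.submit(() -> {
                    commitTask(dir);
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surface any failure from a task-side delete
            }
        } finally {
            pool.shutdown();
        }

        cleanupJob(jobTemp); // only the near-empty skeleton remains
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path out = runSketch(16);
        System.out.println("_temporary exists after cleanup: "
                + Files.exists(out.resolve("_temporary")));
    }
}
```

On a local disk the timing difference is negligible; the point of the sketch is only the shape of the change, where the expensive per-object deletes move from one serial call in the driver to many small calls spread across tasks.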


