[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801584#comment-17801584
 ] 

Steve Loughran commented on MAPREDUCE-7465:
-------------------------------------------

I understand your pain here, but I am reluctant to do this for a few reasons:

* fear of going near a complicated and critical piece of code. I am scared of 
it. It has two co-recursive algorithms and sits in a critical workflow.
* experience of an internal release with this against abfs. It's not enough: 
it hits throttling and is not resilient to failures.
* non-atomic task commit with Google GCS
* directory tree cleanup scale issues with abfs and gcs

The good news: the MAPREDUCE-7341 manifest committer addresses all of this:
* it is correct and performant with hdfs, gcs and abfs
* it rate limits renames against abfs, and does so resiliently
* because it does the task attempt dir scan on task commit, there's no need for 
parallel treewalks in job commit
* it does parallel task attempt cleanup for filesystems with O(files) dir delete
* the _SUCCESS file is a JSON stats file with timings; it can optionally be saved 
elsewhere too
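
To illustrate the point about scanning at task commit, here is a toy sketch in 
Python against the local filesystem. It is not the Hadoop implementation: the 
function names, the manifest layout and the thread count are all made up for 
illustration, and rate limiting, retries and atomic manifest saves are not 
modelled. It only shows why job commit needs no treewalk once each task has 
saved its own list of files.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def task_commit(task_attempt_dir, dest_dir, manifest_path):
    """Scan the task attempt dir once and save a manifest of planned renames.

    Hypothetical helper: the manifest is a JSON list of {"src", "dst"} pairs.
    """
    entries = []
    for root, _dirs, files in os.walk(task_attempt_dir):
        rel = os.path.relpath(root, task_attempt_dir)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(dest_dir, rel, name))
            entries.append({"src": src, "dst": dst})
    with open(manifest_path, "w") as f:
        json.dump(entries, f)
    return len(entries)


def job_commit(manifest_paths, max_workers=4):
    """Load the saved manifests and rename the files in parallel.

    No directory listing happens here: the work list comes from the manifests.
    """
    renames = []
    for path in manifest_paths:
        with open(path) as f:
            renames.extend(json.load(f))
    # Create destination directories up front so the renames never race on mkdir.
    for directory in {os.path.dirname(r["dst"]) for r in renames}:
        os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces evaluation so any rename failure is raised here.
        list(pool.map(lambda r: os.rename(r["src"], r["dst"]), renames))
    return len(renames)
```

The directory scan is paid for once per task, in parallel across the cluster, 
and job commit is reduced to reading the manifests and issuing renames.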

Your PR isn't going to get into any release which doesn't already have the 
intermediate manifest committer in it.

Now, I can see you are making changes to parquet to work better here; this is 
lovely. I'll review/comment on those. I think a key one is for parquet and spark 
to both lower their expectations of committer types.

Spark's hadoop-cloud module does have the delegation class you already need, 
it's documented in https://spark.apache.org/docs/latest/cloud-integration.html

This means you should be able to switch to the manifest committer *today* if 
your hadoop mapreduce library supports it. Yes, it does work with HDFS too, 
it's just not tested as rigorously there. We have gcs and abfs using this as 
the default.
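
For reference, a sketch of the Spark settings involved, written as a small 
Python helper that renders them as spark-submit arguments. The property names 
and class names are the ones I recall from the Hadoop manifest committer and 
Spark cloud-integration docs, so check them against the versions you actually 
run; the helper function itself is hypothetical.

```python
# Settings as recalled from the Hadoop manifest committer documentation.
# Assumption: your Spark build includes the hadoop-cloud module and your
# hadoop-mapreduce-client-core jar ships the manifest committer.
MANIFEST_COMMITTER_CONF = {
    # Bind the committer factory per filesystem scheme.
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfs":
        "org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory",
    "spark.hadoop.mapreduce.outputcommitter.factory.scheme.gs":
        "org.apache.hadoop.mapreduce.lib.output.committer.manifest."
        "ManifestCommitterFactory",
    # Delegation classes from Spark's hadoop-cloud module.
    "spark.sql.parquet.output.committer.class":
        "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
}


def spark_submit_args(conf):
    """Render settings as a flat list of spark-submit arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

The same key/value pairs can go into spark-defaults.conf instead of the 
command line.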

What I will do is put up a PR with our own internal change. This isn't 
something we enable, as the manifest committer is so much better, but you can 
compare it with yours, including the tests, and know that we have at least done 
some NFQE testing with it.


> performance problem in FileOutputCommiter for big list processed  by single 
> thread
> ----------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7465
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7465
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 3.2.3, 3.3.2, 3.2.4, 3.3.5, 3.3.3, 3.3.4, 3.3.6
>            Reporter: Arnaud Nauwynck
>            Priority: Minor
>              Labels: pull-request-available
>
> When committing a big hadoop job (for example via Spark) having many 
> partitions,
> the class FileOutputCommitter processes thousands of dirs/files to rename with 
> a single thread. This is a performance issue, caused by a lot of waits on 
> FileSystem storage operations.
> I propose that above a configurable threshold (default=3, configurable via 
> property 'mapreduce.fileoutputcommitter.parallel.threshold'), the class 
> FileOutputCommitter process the list of files to rename using parallel 
> threads, using the default jvm ExecutorService (ForkJoinPool.commonPool())
> See Pull-Request: 
> [https://github.com/apache/hadoop/pull/6378|https://github.com/apache/hadoop/pull/6378]
> Notice that sub-class instances of FileOutputCommitter are supposed to be 
> created at runtime depending on a configurable property 
> ([https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitterFactory.java|PathOutputCommitterFactory.java]).
> But for example in Parquet + Spark, this is buggy and can not be changed at 
> runtime. 
> There is an ongoing Jira and PR to fix it in Parquet + Spark: 
> [https://issues.apache.org/jira/browse/PARQUET-2416|https://issues.apache.org/jira/browse/PARQUET-2416]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
