[ 
https://issues.apache.org/jira/browse/HADOOP-15087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276837#comment-16276837
 ] 

Steve Loughran commented on HADOOP-15087:
-----------------------------------------

The key flaw with the existing committers is not that they are slow; it is 
that, in the absence of consistent file listings, rename() can miss things to 
copy, and so can deliver bad answers. You can't safely use the normal output 
committers against AWS S3 without S3Guard, though other implementations (yours 
too?) can behave differently.
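
To make the listing problem concrete, here is a toy sketch. It is not S3A 
code: the ToyStore class and every name in it are made up for illustration, 
and it assumes only that a directory rename on an object store boils down to 
"list the source prefix, copy what the listing returned, delete the sources".

```python
# Toy model of rename() on an object store: list + copy + delete.
# Everything here is hypothetical; it is not S3A code.

class ToyStore:
    def __init__(self):
        self.objects = {}      # key -> data: what is really stored
        self.unlisted = set()  # keys written but not yet visible in listings

    def put(self, key, data, visible=True):
        self.objects[key] = data
        if not visible:
            self.unlisted.add(key)

    def list(self, prefix):
        # An eventually consistent listing may omit recently written keys.
        return sorted(k for k in self.objects
                      if k.startswith(prefix) and k not in self.unlisted)

    def rename(self, src, dst):
        # Move only what the listing returned; anything unlisted is left behind.
        for key in self.list(src):
            self.objects[dst + key[len(src):]] = self.objects.pop(key)


store = ToyStore()
store.put("job/_temporary/part-0000", "a")
store.put("job/_temporary/part-0001", "b", visible=False)  # listing lags
store.rename("job/_temporary/", "job/")

# Keys that made it out of the temporary directory.
committed = sorted(k for k in store.objects
                   if not k.startswith("job/_temporary/"))
print(committed)
```

The second part file is in the store but never reaches the destination, and 
the rename reports no error: the job "succeeds" with incomplete output.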

Have you played with the S3A committers? I think we can outdo stocator with 
better failure semantics, though I've got to benchmark it properly. Why don't 
you have a go there?


It'd be good to see your patch though, as it'd be something to line up all the 
commit strategies against: "classic/broken", "s3a staging", "s3a magic", 
"stocator". 


FWIW, Teragen is a meaningless benchmark except as a stress test of a cluster 
and a bootstrap for terasort tests; it doesn't resemble any real workloads. 
TPC-DS is the one to play with.

See also: 
http://steveloughran.blogspot.co.uk/2017/09/stocator-high-performance-object-store.html

> Write directly without creating temp directory to avoid rename 
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15087
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15087
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Yonger
>
> Rename in workloads like Teragen/Terasort, which use Hadoop's default 
> output committers, really hurts performance. 
> Stocator states that it doesn't create the temporary directories at all, and 
> still preserves Hadoop's fault tolerance. I added a switch for file creation 
> by integrating its code into s3a, and got a 5x performance gain in Teragen 
> and a 15% performance improvement in Terasort.
>  


