[ https://issues.apache.org/jira/browse/HADOOP-16058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16746724#comment-16746724 ]
Steve Loughran commented on HADOOP-16058:
-----------------------------------------
Magic committer:
{code}
bin/hadoop fs -cat s3a://hwdev-steve-ireland-new/terasort-ITestMagicTerasort/results.csv
"Operation" "Duration"
"Generate" "0:22.854s"
"Terasort" "0:30.228s"
"Validate" "0:27.682s"
"Completed" "1:27.840s"
{code}
Directory staging committer:
{code}
bin/hadoop fs -cat s3a://hwdev-steve-ireland-new/terasort-ITestDirectoryTerasort/results.csv
"Operation" "Duration"
"Generate" "0:22.111s"
"Terasort" "0:24.613s"
"Validate" "0:24.504s"
"Completed" "1:19.135s"
{code}
The client is a laptop and the store is S3 Ireland, a few hundred milliseconds
away. Because of that per-call latency, the magic committer, which uses S3
rather than a miniHDFS cluster, suffers a lot. You'd need to be creating larger
files for the incremental write to become relevant.
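
To read the two listings side by side, here is a minimal sketch in Python that
converts each duration to seconds and prints the per-stage difference. The
numbers are copied from the results above; the helper names (parse_duration,
load) are made up for illustration, and the space delimiter matches the
listings as shown here, which may not be what the real results.csv uses.

```python
import csv
import io

# Timings copied verbatim from the two results.csv listings in this comment.
MAGIC = '''"Operation" "Duration"
"Generate" "0:22.854s"
"Terasort" "0:30.228s"
"Validate" "0:27.682s"
"Completed" "1:27.840s"
'''

DIRECTORY = '''"Operation" "Duration"
"Generate" "0:22.111s"
"Terasort" "0:24.613s"
"Validate" "0:24.504s"
"Completed" "1:19.135s"
'''


def parse_duration(text):
    """Convert a duration such as '1:27.840s' into seconds."""
    minutes, seconds = text.rstrip("s").split(":")
    return int(minutes) * 60 + float(seconds)


def load(listing):
    """Return {operation: seconds} for one listing.

    Assumption: fields are double-quoted and separated by a single space,
    as they appear in this comment.
    """
    reader = csv.reader(io.StringIO(listing), delimiter=" ")
    next(reader)  # skip the "Operation" "Duration" header row
    return {row[0]: parse_duration(row[1]) for row in reader if len(row) == 2}


magic = load(MAGIC)
directory = load(DIRECTORY)

# Positive delta: the magic committer run took longer for that stage.
for op in magic:
    delta = magic[op] - directory[op]
    print(f"{op:<10} magic={magic[op]:7.3f}s "
          f"directory={directory[op]:7.3f}s delta={delta:+.3f}s")
```

A positive delta means the magic committer run was slower for that stage in
this particular setup; the figures depend on latency to the store, bandwidth
and job size, so they should not be read as a general ranking.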
> S3A tests to include Terasort
> -----------------------------
>
> Key: HADOOP-16058
> URL: https://issues.apache.org/jira/browse/HADOOP-16058
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> Add S3A tests to run terasort for the magic and directory committers.
> MAPREDUCE-7091 is a requirement for this.
> Bonus feature: print the results to see which committers are faster in the
> specific test setup. As that's a function of latency to the store, bandwidth
> and size of jobs, it's not at all meaningful, just interesting.