[
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529588#comment-15529588
]
Steve Loughran commented on HADOOP-13560:
-----------------------------------------
bq. so we'll eventually need a different patch for trunk.
no problem
bq. All access to S3ABlockOutputStream#closed happens through synchronized
methods.
Regarding close(): I'd actually planned to make close() unsynchronized; clearly I
didn't do that final step. By going unsynchronized we avoid deadlocks if there is
more than one close() call and the first one is waiting for the upload to
complete. On that topic, should we add something to the FS spec about close() of
a filesystem/stream not being blocking?
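A minimal sketch of the idea (not the actual S3ABlockOutputStream code): guarding close() with an AtomicBoolean instead of a synchronized block means a second close() call returns immediately rather than blocking on a monitor held by the first caller while it waits for the upload. The class and field names here are illustrative only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch of an unsynchronized, idempotent close(). Only the first
 * caller flips the flag and waits for the upload; later close()
 * calls are cheap no-ops, so there is no lock for them to block on.
 */
class SketchOutputStream {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  int closeAttempts = 0;
  int uploadsCompleted = 0;

  public void close() {
    closeAttempts++;
    // compareAndSet wins for exactly one caller; everyone else bails out.
    if (!closed.compareAndSet(false, true)) {
      return;
    }
    waitForUploadCompletion();
  }

  // Stand-in for blocking until the pending block uploads finish.
  private void waitForUploadCompletion() {
    uploadsCompleted++;
  }

  public static void main(String[] args) {
    SketchOutputStream out = new SketchOutputStream();
    out.close();
    out.close();   // second close: returns immediately, no deadlock risk
    System.out.println(out.closeAttempts + " " + out.uploadsCompleted);
  }
}
```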
bq. S3ABlockOutputStream#now returns time in milliseconds, but the JavaDocs
state nanoseconds.
Well spotted. It's only being used for some metrics about the time for blocks to
get through the queue and be uploaded. I've changed the javadocs.
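For illustration, a hedged sketch of the fix (the method name `now` matches the comment; the javadoc wording here is my own): the implementation returns wall-clock milliseconds, so the javadoc should say milliseconds, not nanoseconds.

```java
/**
 * Sketch of the corrected helper: javadoc now matches the
 * implementation, which returns time in milliseconds.
 */
class TimeSketch {
  /**
   * Current time.
   * @return the current time in milliseconds
   */
  static long now() {
    return System.currentTimeMillis();
  }

  public static void main(String[] args) throws InterruptedException {
    long start = now();
    Thread.sleep(50);
    long elapsed = now() - start;
    // ~50 if now() is milliseconds; it would be ~50,000,000 in nanoseconds
    System.out.println(elapsed >= 40 && elapsed < 5000);
  }
}
```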
bq. Can ITestS3AHuge* be made to run in parallel instead of sequential?
The problem here is that these tests saturate the entire network. If you run them
in parallel with smaller tests, everything slows down. If you run them in
parallel with each other, things come to a complete halt as the bandwidth is
split across the tests. It doesn't get any faster no matter how many cores you
have: it's network-link bound. That's why I added the new {{scale}} profile;
these really are "set them running and walk away" test runs, batch jobs rather
than things you would do on every iteration, at least unless you were actually
playing with the output streams.
I could see a single huge-file test being scheduled while all the small tests
run, as long as it doesn't go near the multipart purge settings. What I can't
see in failsafe or any other JUnit test runner is a way to say "run the huge
tests sequentially while running the other tests in parallel".
Regarding the multipart purge: that was causing problems in parallel test runs
because the purge time was set low enough that a large upload would fail; some
of the interim parts would already have been purged by the time the commit got
through. This isn't a problem with small files, but once you go into the
many-GB range you start to hit various test-run scale issues (generally
timeouts: the purge, failsafe, the JUnit test timeout) and to encounter
transient network failures. That's why there's some retry logic on the
multipart commit phase: I hit them.
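A sketch of that commit-phase retry, under assumptions: the retry count, exception type, and all names here are illustrative, not the patch's actual code; `commitOnce()` stands in for completing the multipart upload, which can fail transiently on the network.

```java
import java.io.IOException;

/**
 * Sketch of bounded retry around the multipart commit phase, to ride
 * over transient network failures seen in many-GB uploads.
 */
class CommitRetrySketch {
  static final int RETRIES = 3;          // assumed bound, not from the patch
  int attempts = 0;
  private final int failuresBeforeSuccess;

  CommitRetrySketch(int failuresBeforeSuccess) {
    this.failuresBeforeSuccess = failuresBeforeSuccess;
  }

  /** Stand-in for completing the multipart upload; fails transiently. */
  void commitOnce() throws IOException {
    attempts++;
    if (attempts <= failuresBeforeSuccess) {
      throw new IOException("transient network failure");
    }
  }

  /** Retry the commit a bounded number of times before giving up. */
  void commitWithRetry() throws IOException {
    IOException last = null;
    for (int i = 0; i <= RETRIES; i++) {
      try {
        commitOnce();
        return;                          // committed; done
      } catch (IOException e) {
        last = e;                        // remember the failure and retry
      }
    }
    throw last;                          // retries exhausted
  }

  public static void main(String[] args) throws Exception {
    CommitRetrySketch c = new CommitRetrySketch(2);
    c.commitWithRetry();
    System.out.println(c.attempts);      // two failures, then success
  }
}
```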
> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
> Key: HADOOP-13560
> URL: https://issues.apache.org/jira/browse/HADOOP-13560
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.9.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13560-branch-2-001.patch,
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch,
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really
> works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very
> large commit operations by committers that use rename.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]