[ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15529588#comment-15529588
 ] 

Steve Loughran commented on HADOOP-13560:
-----------------------------------------

bq. so we'll eventually need a different patch for trunk.

No problem.

bq. All access to S3ABlockOutputStream#closed happens through synchronized 
methods. 

Regarding close: I'd actually planned to make close() unsynchronized; clearly I 
didn't do that final step. Going unsynchronized avoids deadlocks when more than 
one close() call is made and the first one is waiting for the upload to 
complete. On that topic, should we add something to the FS spec about close() 
of a filesystem/stream not being blocking?
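The unsynchronized-close idea can be sketched with an atomic flag guarding idempotency, so a second close() returns immediately instead of blocking on a monitor held by a first caller that is still waiting for uploads. This is a minimal sketch of the pattern, not the actual S3ABlockOutputStream code; the class and method names below are illustrative:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Sketch: an idempotent, unsynchronized close().
 * Only the first caller performs the (possibly slow) upload wait;
 * later callers see closed == true and return at once, so they can
 * never deadlock behind a lock held across the wait.
 */
public abstract class NonBlockingCloseStream extends OutputStream {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() throws IOException {
        // getAndSet returns the previous value: false only for the first caller.
        if (!closed.getAndSet(true)) {
            waitForUploadCompletion();
        }
    }

    /** The potentially long-running part (e.g. waiting for block uploads). */
    protected abstract void waitForUploadCompletion() throws IOException;
}
```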

bq. S3ABlockOutputStream#now returns time in milliseconds, but the JavaDocs 
state nanoseconds.

Well spotted. It's only used for some metrics on how long blocks take to get 
through the queue and upload. I've changed the javadocs.
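In millisecond terms, the metric in question reduces to something like the following sketch (hypothetical names, not the patch's actual code; the point is that the clock and the javadoc now agree on milliseconds):

```java
/** Sketch: a millisecond clock used purely for duration metrics. */
public class ClockSketch {
    /** @return current time in milliseconds (matching the corrected javadoc). */
    public static long now() {
        return System.currentTimeMillis();
    }

    /** @return milliseconds elapsed since a start timestamp taken with now(). */
    public static long elapsedMillis(long start) {
        return now() - start;
    }
}
```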

bq. Can ITestS3AHuge* be made to run in parallel instead of sequential? 

The problem here is that these tests saturate the entire network. If you run 
them in parallel with smaller tests, everything slows down. If you run them in 
parallel with each other, things come to a complete halt as the bandwidth is 
split across the tests. It doesn't get any faster no matter how many cores you 
have: it's network-link bound. That's why I added the new {{scale}} profile; 
these really are "set them running and go away" runs, batch jobs rather than 
something you'd do on every iteration, at least not unless you were actually 
playing with the output streams.

I could see a single huge-file test being scheduled while all the small tests 
run, as long as it doesn't go near the multipart purge settings. What I can't 
see in failsafe or any other JUnit test runner is a way to specify "run the 
huge tests sequentially while running the other tests in parallel".
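For reference, the closest failsafe gets (as far as I can see) is two separate executions: one parallel run that excludes the huge tests, then a sequential run of just the huge tests afterwards; the executions still run one after the other rather than overlapping. An illustrative pom fragment, with the thread count and patterns as assumptions:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <!-- Parallel run of everything except the huge tests. -->
    <execution>
      <id>parallel-tests</id>
      <goals><goal>integration-test</goal></goals>
      <configuration>
        <parallel>classes</parallel>
        <threadCount>8</threadCount>
        <excludes>
          <exclude>**/ITestS3AHuge*.java</exclude>
        </excludes>
      </configuration>
    </execution>
    <!-- Sequential run of the huge tests, after the parallel run finishes. -->
    <execution>
      <id>sequential-huge-tests</id>
      <goals><goal>integration-test</goal></goals>
      <configuration>
        <includes>
          <include>**/ITestS3AHuge*.java</include>
        </includes>
      </configuration>
    </execution>
  </executions>
</plugin>
```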


Regarding the multipart purge: that was causing problems in parallel test runs 
because the purge time was set such that a large enough upload would fail; some 
of the interim parts would already have been purged by the time the commit got 
through. This isn't a problem with small files, but once you go into the 
many-GB range you start to hit various test-run scale issues (generally 
timeouts: the purge, failsafe, the JUnit test timeout) and to encounter 
transient network failures. That's why there's some retry logic on the 
multipart commit phase: I encountered them.
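The retry on the commit phase amounts to a bounded retry loop around the operation. A minimal sketch of the idea (a hypothetical helper, not the actual patch code; attempt count and backoff are assumptions):

```java
import java.util.concurrent.Callable;

/**
 * Sketch: bounded retry of an operation that may fail transiently,
 * such as a multipart commit hitting a transient network failure.
 */
public class RetrySketch {

    /**
     * Invoke the operation, retrying up to maxAttempts times with a
     * fixed sleep between attempts; rethrows the last failure.
     */
    public static <T> T withRetries(Callable<T> operation,
                                    int maxAttempts,
                                    long sleepMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;  // remember the failure; a transient error may clear
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);  // simple fixed backoff
                }
            }
        }
        throw last;  // all attempts failed
    }
}
```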

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations for committers using rename



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
