[ 
https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530520#comment-15530520
 ] 

Chris Nauroth commented on HADOOP-13560:
----------------------------------------

At this point, I've read through all of patch 006 except for the documentation 
updates.  Very nice patch!  Here is more feedback on the code, and I'll start 
reviewing the documentation changes next.

I see the checks in {{S3ADataBlocks#validateWriteArgs}} are taken from 
{{java.io.OutputStream#write}}.  Would you please add a note in the JavaDoc 
that this is implemented specifically to match that contract?
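
For reference, the contract check in {{java.io.OutputStream#write(byte[], int, int)}} looks roughly like this (a paraphrased sketch, not the JDK source verbatim):

```java
// Sketch: the argument checks specified by OutputStream#write(byte[],int,int),
// which S3ADataBlocks#validateWriteArgs mirrors; a JavaDoc note pointing at
// that contract would make the intent clear.
public class WriteArgsSketch {
    static void validateWriteArgs(byte[] b, int off, int len) {
        if (b == null) {
            throw new NullPointerException("null buffer");
        }
        if (off < 0 || len < 0 || off + len > b.length || off + len < 0) {
            // off + len < 0 guards against integer overflow of the sum
            throw new IndexOutOfBoundsException(
                "write(b[" + b.length + "], off=" + off + ", len=" + len + ")");
        }
    }
}
```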

It seems the validation check in {{DataBlock#verifyState}} would only ever 
throw an exception if there was a bug in our S3A code's state transition logic, 
not a recoverable situation for the client.  If so, then is a runtime exception 
more appropriate, such as {{IllegalStateException}}?  If so, then that would 
also avoid the ignored exception in {{DataBlock#enterClosedState}}.
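
Roughly what I have in mind (state names illustrative, not the patch's actual enum):

```java
// Sketch: an unchecked IllegalStateException for invalid transitions. A
// failure here indicates a bug in S3A's own state machine rather than a
// condition the client can recover from, so there is no checked exception
// for enterClosedState to swallow.
public class StateCheckSketch {
    enum BlockState { Writing, Upload, Closed }

    static void verifyState(BlockState current, BlockState expected) {
        if (current != expected) {
            throw new IllegalStateException(
                "Expected block in state " + expected + " but found " + current);
        }
    }
}
```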

The JavaDoc for {{DataBlock#flush}} states that it is only valid in writing 
state.  Does it make sense to move the validation check for that from 
{{DiskBlock#flush}} up into the base class?
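
Something like a template method could enforce the contract once (method and enum names here are illustrative, not the patch's actual code):

```java
// Sketch: the base class validates the "only valid while writing" contract,
// and subclasses such as DiskBlock supply only the actual flushing via a
// hook method.
public abstract class FlushBaseSketch {
    enum DestState { Writing, Upload, Closed }
    DestState state = DestState.Writing;

    // Base class enforces the state precondition once ...
    public final void flush() {
        if (state != DestState.Writing) {
            throw new IllegalStateException("flush() called in state " + state);
        }
        innerFlush();
    }

    // ... so each subclass only implements what flushing means for it.
    protected abstract void innerFlush();
}
```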

In {{ByteArrayBlock#startUpload}}, can the call to {{buffer.reset()}} be 
removed?  Resetting sets an internal counter without releasing any of the 
underlying heap space, which is what the {{buffer = null}} immediately 
afterwards achieves.
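
A quick demonstration of why the {{reset()}} adds nothing here (using a subclass to expose the protected {{buf}}):

```java
import java.io.ByteArrayOutputStream;

// Sketch: ByteArrayOutputStream#reset() only rewinds the count field; the
// backing array keeps its full size. Only dropping the reference lets the
// GC reclaim the heap space.
public class ResetSketch extends ByteArrayOutputStream {
    int capacity() { return buf.length; }

    public static void main(String[] args) {
        ResetSketch out = new ResetSketch();
        out.write(new byte[1 << 20], 0, 1 << 20);     // grow buffer to >= 1 MB
        int before = out.capacity();
        out.reset();                                  // count = 0, buf untouched
        System.out.println(before == out.capacity()); // true: no heap released
    }
}
```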

In the single-byte {{ByteBufferInputStream#read}}, I think checking 
{{hasRemaining()}} would be more readable than handling 
{{BufferUnderflowException}}, and it would look consistent with similar logic 
in the multi-byte {{read}}.
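
Roughly this shape (a sketch of what I mean, not the patch code):

```java
import java.nio.ByteBuffer;

// Sketch: hypothetical single-byte read() that checks hasRemaining() up
// front instead of catching BufferUnderflowException, matching the style
// of the multi-byte read.
public class ByteBufferReadSketch {
    static int read(ByteBuffer buffer) {
        if (!buffer.hasRemaining()) {
            return -1;               // end of stream, per InputStream#read
        }
        return buffer.get() & 0xFF;  // mask to an unsigned 0..255 value
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {(byte) 0xFF});
        System.out.println(read(b));  // 255
        System.out.println(read(b));  // -1
    }
}
```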

Please mark {{final}} on the member fields of {{ByteBufferBlock}} and 
{{DiskBlock}} where appropriate.

{code}
      try {
        out.flush();
        out.close();
      } finally {
        out.close();
        out = null;
      }
{code}

I was a bit confused by the above code in {{DiskBlock#startUpload}}.  I think 
what this is trying to do (and please correct me if I'm wrong) is to make sure 
we propagate an exception thrown from {{flush}}.  Therefore, we can't simplify 
this code to just a single {{close}} call, because even though 
{{BufferedOutputStream#close}} automatically flushes, it ignores exceptions 
from flush, and we'd lose those error details.  If this is the intent, would 
you please add a comment explaining it?
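
To illustrate the failure mode I think this guards against (a contrived stream whose {{flush}} always fails):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the comment I'm suggesting: flush explicitly before close,
// because BufferedOutputStream#close can swallow the IOException its
// implicit flush raises, losing the error details.
public class FlushCloseSketch {
    static class FailingFlushStream extends FilterOutputStream {
        FailingFlushStream() { super(new ByteArrayOutputStream()); }
        @Override public void flush() throws IOException {
            throw new IOException("disk full");   // simulated flush failure
        }
    }

    public static void main(String[] args) {
        OutputStream out = new FailingFlushStream();
        try {
            // Explicit flush lets a flush failure propagate to the caller;
            // relying on close() alone could mask it.
            out.flush();
            out.close();
        } catch (IOException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```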

Should {{DiskBlock#startUpload}} wrap the returned stream in a 
{{BufferedInputStream}}?

Is {{ForwardingInputStream}} necessary?  It looks like {{FilterInputStream}}, 
so can {{FileDeletingInputStream}} subclass that instead?
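
A sketch of the {{FilterInputStream}}-based shape (the class name is kept for illustration; this is not the patch code):

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: FilterInputStream already forwards every method to the wrapped
// stream, so a file-deleting wrapper only needs to override close().
public class DeleteOnCloseSketch extends FilterInputStream {
    private final Path file;

    DeleteOnCloseSketch(InputStream in, Path file) {
        super(in);    // delegation of read/skip/available comes for free
        this.file = file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            Files.deleteIfExists(file);  // remove the buffer file on close
        }
    }
}
```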

In {{MultiPartUpload#shouldRetry}}, is {{Thread.interrupted()}} supposed to be 
{{Thread.currentThread().interrupt()}}?  If {{InterruptedException}} has been 
thrown, then the interrupted flag has been cleared already, so I think 
{{Thread.interrupted()}} would effectively be a no-op.
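
A small demonstration of the difference (the standard idiom restores the flag in the catch block):

```java
// Sketch: catching InterruptedException clears the thread's interrupt
// status, so the handler should restore it with
// Thread.currentThread().interrupt(). Thread.interrupted() only reads
// (and clears) the already-cleared flag, i.e. a no-op at that point.
public class InterruptSketch {
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // re-set the flag for callers
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();       // arrange a pending interrupt
        boolean slept = sleepQuietly(10);         // sleep throws immediately
        System.out.println(slept);                // false
        System.out.println(Thread.interrupted()); // true: status was restored
    }
}
```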

Can we avoid creating the {{LocalDirAllocator}} when we are not using 
{{S3AOutputStream}} or {{DiskBlockFactory}}?
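
One option is lazy initialization on first disk-buffer use (field and method names are illustrative, and a plain counter stands in for {{LocalDirAllocator}} here):

```java
// Sketch: defer constructing the allocator until a disk-backed block
// factory actually needs it, so configurations that buffer in arrays or
// ByteBuffers never pay the cost.
public class LazyAllocatorSketch {
    static int constructions = 0;           // visible for the demo only

    private Object directoryAllocator;      // stand-in for LocalDirAllocator

    synchronized Object getDirectoryAllocator() {
        if (directoryAllocator == null) {
            constructions++;
            directoryAllocator = new Object();  // built on first disk use only
        }
        return directoryAllocator;
    }
}
```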

I'm unclear why {{MAX_THREADS}} was reduced to 1 in 
{{ITestS3ABlockingThreadPool}}.  If that's intentional, please also update the 
class-level comment that states 2 threads and describe why the test is still 
sufficient with only 1 thread.


> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13560-branch-2-001.patch, 
> HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, 
> HADOOP-13560-branch-2-004.patch
>
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights 
> that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really 
> works
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very 
> large commit operations for committers using rename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
