[jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

Steve Loughran (JIRA) Sat, 21 Feb 2015 05:00:27 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330213#comment-14330213
 ]


Steve Loughran commented on HADOOP-11183:
-----------------------------------------

I have done a code review, but not actually used this yet. I'm worried it's 
being looked at a bit late for the 2.7 branch, but  I can't fault anyone but 
myself and other reviewers for that.

If we get it in, I'd like the markdown docs to highlight "experimental, 
unstable, use at own risk" for this option. Then 2.7 can be viewed as 
stabilisation of it, with goal being trusted by 2.8 & that warning removed. 
Perhaps a special subsection of s3a, "experiment fast output stream" and some 
specifics, rather than just listing a config flag, "fast", which everyone is 
going to turn on based on its name alone.

h3. constructor
# check for and reject negative buffer size. Maybe have a check & Warn/reject 
too small a buffer?
# S3AFastOutputStream constructor should log @debug, not info

h3. {{write()}}
# would benefit from some comments explaining the two branches, e.g. "copy into 
buffer" and "buffer full, write it" + "recursively write remainder"
# If I created a 1-byte buffer, would that tail recursion trigger a stack 
overflow? If so, some checks in ctor may be good, such as just hard coded 
minimum & upping low values to it.

h3. {{close()}}

I'd swap the operations in the finally clause, so no matter what the superclass 
does, this close operation always finishes. Better yet, set close() earlier.

h3. {{uploadBuffer()}}
make private

h3. {{putObject()}}
InterruptedException shouldn't be swallowed here, it hides problems and can 
cause process shutdowns to hang. Either convert to InterruptedIOException, or 
set interrupted flag on the current thread again.

h3. {{waitForAllPartUploads()}}
Need a policy on interrupts here too. Imagine that communications with a remote 
S3 server are blocked, something has just sent a kill -3 to the client app, and 
is now waiting for it to finish cleanly...

h3. {{ProgressableListener}}
can me made private

h3. Other. 

# Can we have {{flush()}} do a partial write? Would that gain anything in 
durability terms? At the very least, it may make timing of writes more 
consistent with other filesystems.
# I can think of some more tests here. If {{S3AFastOutputStream}} added a 
counter of #of uploads, a test could grab that stream and verify that writes 
triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up, 
etc. Also be interesting to time & compare classic vs fast operations & print 
that in the test results. 

> Memory-based S3AOutputstream
> ----------------------------
>
>                 Key: HADOOP-11183
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11183
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.6.0
>            Reporter: Thomas Demoor
>            Assignee: Thomas Demoor
>         Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, 
> HADOOP-11183.001.patch, HADOOP-11183.002.patch, HADOOP-11183.003.patch, 
> design-comments.pdf
>
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA 
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users 
> with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on 
> an S3-compatible object store (FYI: my contributions are made in name of 
> Amplidata). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

Reply via email to