[
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330213#comment-14330213
]
Steve Loughran commented on HADOOP-11183:
-----------------------------------------
I have done a code review, but not actually used this yet. I'm worried it's
being looked at a bit late for the 2.7 branch, but I can't fault anyone but
myself and other reviewers for that.
If we get it in, I'd like the markdown docs to highlight "experimental,
unstable, use at own risk" for this option. Then 2.7 can be viewed as
stabilisation of it, with goal being trusted by 2.8 & that warning removed.
Perhaps a special subsection of s3a, "experiment fast output stream" and some
specifics, rather than just listing a config flag, "fast", which everyone is
going to turn on based on its name alone.
h3. constructor
# check for and reject negative buffer size. Maybe have a check & Warn/reject
too small a buffer?
# S3AFastOutputStream constructor should log @debug, not info
h3. {{write()}}
# would benefit from some comments explaining the two branches, e.g. "copy into
buffer" and "buffer full, write it" + "recursively write remainder"
# If I created a 1-byte buffer, would that tail recursion trigger a stack
overflow? If so, some checks in ctor may be good, such as just hard coded
minimum & upping low values to it.
h3. {{close()}}
I'd swap the operations in the finally clause, so no matter what the superclass
does, this close operation always finishes. Better yet, set close() earlier.
h3. {{uploadBuffer()}}
make private
h3. {{putObject()}}
InterruptedException shouldn't be swallowed here, it hides problems and can
cause process shutdowns to hang. Either convert to InterruptedIOException, or
set interrupted flag on the current thread again.
h3. {{waitForAllPartUploads()}}
Need a policy on interrupts here too. Imagine that communications with a remote
S3 server are blocked, something has just sent a kill -3 to the client app, and
is now waiting for it to finish cleanly...
h3. {{ProgressableListener}}
can me made private
h3. Other.
# Can we have {{flush()}} do a partial write? Would that gain anything in
durability terms? At the very least, it may make timing of writes more
consistent with other filesystems.
# I can think of some more tests here. If {{S3AFastOutputStream}} added a
counter of #of uploads, a test could grab that stream and verify that writes
triggered it, flushes triggered it, 0-byte-writes didn't, close cleaned up,
etc. Also be interesting to time & compare classic vs fast operations & print
that in the test results.
> Memory-based S3AOutputstream
> ----------------------------
>
> Key: HADOOP-11183
> URL: https://issues.apache.org/jira/browse/HADOOP-11183
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.6.0
> Reporter: Thomas Demoor
> Assignee: Thomas Demoor
> Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch,
> HADOOP-11183.001.patch, HADOOP-11183.002.patch, HADOOP-11183.003.patch,
> design-comments.pdf
>
>
> Currently s3a buffers files on disk(s) before uploading. This JIRA
> investigates adding a memory-based upload implementation.
> The motivation is evidently performance: this would be beneficial for users
> with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on
> an S3-compatible object store (FYI: my contributions are made in name of
> Amplidata).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)