[
https://issues.apache.org/jira/browse/HADOOP-14971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16213302#comment-16213302
]
ASF GitHub Bot commented on HADOOP-14971:
-----------------------------------------
GitHub user steveloughran opened a pull request:
https://github.com/apache/hadoop/pull/282
HADOOP-14971 Merge S3A committers into trunk
HADOOP-13786 & MAPREDUCE-6823 code as a PR for better review
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/steveloughran/hadoop s3guard/HADOOP-13786-committer
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/hadoop/pull/282.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #282
----
commit 70e2a84547936cdfa65c58a2482c498eabbce889
Author: Steve Loughran <[email protected]>
Date: 2017-09-06T17:18:15Z
HADOOP-13786: apply the HADOOP-13796-on-branch-2 patch to trunk, whitespace
fix
commit 738b0c045603182b38d1ce08d97f60393043f565
Author: Steve Loughran <[email protected]>
Date: 2017-09-06T18:28:37Z
HADOOP-13786 fixing docs to avoid doxia bug on level 4 entries
commit 6d99b815eb33ccc0d0514e6330eb255f77d29372
Author: Steve Loughran <[email protected]>
Date: 2017-09-12T19:57:02Z
HADOOP-13786 HADOOP-14303 error handling: l-exp wrappers around core
metadata ops
commit d9f72547212f7bc5c47b4aab949581e5e0d448ee
Author: Steve Loughran <[email protected]>
Date: 2017-09-12T19:57:31Z
HADOOP-13786 TestStagingCommitter -> java 8 closures
commit 09249b354c9c0043d2690cd4d3fbb124e09eb2d8
Author: Steve Loughran <[email protected]>
Date: 2017-09-13T15:57:33Z
HADOOP-13786 HADOOP-14531 lambda wrapper around all production s3 calls
* all invocations of s3 calls are wrapped where appropriate, either with
once() (which does the translation), retry() or retryUntranslated()
* javadocs state the retry policy; this is propagated so that callers know
which operations already retry
* commit tests -> java 8 lambdas too
* test JSON ser/deser in hadoop-common
* checkstyle
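To illustrate the wrapping described in this commit, here is a minimal sketch of what once() and retry() style lambda wrappers can look like. It is not the S3A code: the class name InvokerSketch, the Operation interface, the translate() helper and the fixed attempt count are all assumptions made for the example, and a real retry policy would distinguish retryable failures and add backoff.

```java
import java.io.IOException;

/** Hypothetical sketch of lambda wrappers; not the S3A implementation. */
final class InvokerSketch {

  /** An operation which may raise any exception; usable as a lambda target. */
  @FunctionalInterface
  interface Operation<T> {
    T execute() throws Exception;
  }

  private InvokerSketch() {
  }

  /** Convert any failure into an IOException which names the action. */
  static IOException translate(String action, Exception e) {
    if (e instanceof IOException) {
      return (IOException) e;
    }
    return new IOException(action + " failed: " + e, e);
  }

  /** Run the operation exactly once, translating any exception raised. */
  static <T> T once(String action, Operation<T> op) throws IOException {
    try {
      return op.execute();
    } catch (Exception e) {
      throw translate(action, e);
    }
  }

  /**
   * Run the operation up to {@code attempts} times.
   * Assumption for the sketch: every failure is retried, with no backoff.
   * The last translated failure is rethrown if all attempts fail.
   */
  static <T> T retry(String action, int attempts, Operation<T> op)
      throws IOException {
    if (attempts < 1) {
      throw new IllegalArgumentException("attempts must be >= 1");
    }
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return once(action, op);
      } catch (IOException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }
}
```

A caller would then write something like InvokerSketch.retry("GET", 3, () -> client.fetch(key)), where client.fetch is whatever store call is being wrapped; the caller only ever sees IOExceptions.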
commit 5814f22aeab22d5a3bacb27bb456d665530c3d94
Author: Steve Loughran <[email protected]>
Date: 2017-09-15T10:30:57Z
HADOOP-13786 HADOOP-14531
* new @Retries annotation for the s3a classes to use to make their retry
policy more visible in the source. This is a source-only annotation, unused
anywhere, but it does make the policy visible. You can't call a non-retrying
method and be retrying yourself unless you add your own retry logic
* fault-injecting AWS S3 client is better at knowing when it is good to fail
(i.e. not so aggressively on listing operations)
* callback interface for before/during retries unified
* and logging cut back so only the first failure gets logged in a retry loop.
Maybe that could be tuned to remember the previous failure & log if it's a
different class
* all integration tests excluding rename() ones are now working when tested
with a high (25-50%) throttle rate.
* DDB logs of capacity limit failures
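As a rough illustration of the source-only annotation idea above, the sketch below declares two marker annotations with SOURCE retention and uses them to document which method retries. The names RetryAnnotationSketch, Once and Retrying, and the two example methods, are hypothetical and chosen for this example only; they are not claimed to match the annotation in the patch.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical sketch of source-only retry-policy markers. */
final class RetryAnnotationSketch {

  /** Marks a method which makes a single attempt. Discarded by the compiler. */
  @Documented
  @Retention(RetentionPolicy.SOURCE)
  @Target(ElementType.METHOD)
  @interface Once {
  }

  /** Marks a method which retries internally. Discarded by the compiler. */
  @Documented
  @Retention(RetentionPolicy.SOURCE)
  @Target(ElementType.METHOD)
  @interface Retrying {
  }

  private RetryAnnotationSketch() {
  }

  /** Single attempt: callers must add their own retries if they need them. */
  @Once
  static String headObject(String key) {
    return "metadata of " + key;
  }

  /** May declare itself retrying because it wraps the call in its own loop. */
  @Retrying
  static String headObjectWithRetries(String key, int attempts) {
    RuntimeException last =
        new IllegalArgumentException("attempts must be >= 1");
    for (int i = 0; i < attempts; i++) {
      try {
        return headObject(key);
      } catch (RuntimeException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last;
  }
}
```

Because the retention is SOURCE, nothing survives into the class files: the markers are purely for readers and reviewers of the code.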
commit 2bd385361bda4ddd1590dfed7c3377bee1ffa739
Author: Steve Loughran <[email protected]>
Date: 2017-09-15T10:31:19Z
HADOOP-13786 turn off false alarm in findbugs
commit e8039d3d7734b607c0d0e093ea6d573672490753
Author: Steve Loughran <[email protected]>
Date: 2017-09-19T10:38:32Z
HADOOP-13786 MAPREDUCE-6823 FileOutputFormat uses the committer factory,
with tests
commit 1e61b94490fbf3f75330ceea3b5d3b863f5efbe6
Author: Steve Loughran <[email protected]>
Date: 2017-09-19T13:47:44Z
HADOOP-13786
* s/DefaultPutTracker/PutTracker/. Yes, it is the default one, but it's
misleading as a type.
* move to l-expressions in block output stream callables & the committers.
Exception: Tasks.runParallel() whose closure is complex enough that the IDE was
warning about its size. Maybe best to refactor as a method invoked as this::exec
* Adding new statistic, {{committer_bytes_uploaded}}, set when a stream is
closed to #of bytes PUT.
* S3A FS implements {{StreamCapabilities}}, dynamically declares if it is
magic by returning true on {{hasCapability("fs.s3a.magic.enabled")}} when it is.
* S3ABlockOutputStream implements {{StreamCapabilities}}; dynamically
declares if its output has delayed visibility. Also: that it doesn't do
hsync/hflush, obviously.
* {{CommitOperations}}: Experimented with replacing {{MaybeIOE}} with Java
8 Optional<> type. Doesn't work as {{maybeThrow}} can't be implemented as
{{Optional<IOException>.map((e) -> {throw e;})}}; java's checked exceptions
make maps fairly useless for the Hadoop IOE-throwing APIs. Outcome:
{{MaybeIOE}} unchanged.
* Minor cleanup of production & test code
* starting to write end user documentation. Needs more clarity on directory
vs partitioned output on staging committer, including examples
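The point about {{Optional}} and checked exceptions can be shown with a small sketch. The class below is a hypothetical stand-in written for this example (MaybeIOESketch and its methods are assumed names, not the class in the patch): a holder with an explicit maybeThrow() can declare "throws IOException", whereas the lambda passed to Optional.map() is a java.util.function.Function, whose apply() declares no checked exceptions.

```java
import java.io.IOException;

/** Hypothetical holder for an optional IOException. */
final class MaybeIOESketch {

  /** The held exception; null means "no failure". */
  private final IOException exception;

  MaybeIOESketch(IOException exception) {
    this.exception = exception;
  }

  /** A holder with no exception. */
  static MaybeIOESketch none() {
    return new MaybeIOESketch(null);
  }

  boolean hasException() {
    return exception != null;
  }

  /** Rethrow the held exception, if there is one. */
  void maybeThrow() throws IOException {
    if (exception != null) {
      throw exception;
    }
  }

  // By contrast, this does not compile, because Function.apply()
  // cannot throw a checked exception:
  //
  //   Optional<IOException> failure = Optional.of(new IOException("x"));
  //   failure.map(e -> { throw e; });   // error: unreported exception
}
```

Working around this with Optional means either wrapping in an unchecked exception or falling back to an explicit isPresent() check, at which point the dedicated holder is no more verbose.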
commit 798e0a3e2ed9ad0185ca003151489ff18acdacfb
Author: Steve Loughran <[email protected]>
Date: 2017-09-21T10:30:38Z
HADOOP-13786 MAPREDUCE-6823 adding public getOutputPath to
PathOutputCommitter API, as some callers currently scan the JobConf settings to
find this value
commit 91611c32e19ab3fb59ebc1c99b8d3855c50de56b
Author: Steve Loughran <[email protected]>
Date: 2017-09-21T10:38:18Z
HADOOP-13786 altering s3a committer code to track MAPREDUCE-6823
commit 91bc628638f65dab3b5f8bdad3e89bcc0c874af0
Author: Steve Loughran <[email protected]>
Date: 2017-09-22T10:46:42Z
HADOOP-13786 HADOOP-14531: 443 response goes to NoResponseException, treat
as retryable for non-idempotent calls only
commit d0d36abc95b4108f1c2e7fb3825a4353b47351ec
Author: Steve Loughran <[email protected]>
Date: 2017-09-22T18:49:00Z
HADOOP-13786 downgrade startup log about magic from info to debug. s3guard
bucket-info should show its status though. Also, move another anon class to a
l-exp
commit 77f9fb212d1d83868b85d5689f3cd7ecd7165eec
Author: Steve Loughran <[email protected]>
Date: 2017-09-26T14:55:37Z
HADOOP-13786 HADOOP-14531 DDB throttling events are logged as a
quantile/rate metric (Hz) rather than just total count.
commit c98b1421ca131406a2059f2a6659d86377eaf971
Author: Steve Loughran <[email protected]>
Date: 2017-09-26T19:10:00Z
HADOOP-13786 javadocs of Retries
commit 78f85138a521a800d99ec2a257ad5cf1c8e6e445
Author: Steve Loughran <[email protected]>
Date: 2017-09-27T16:08:25Z
HADOOP-13786 HADOOP-14531 rework retry logic, including Ewan's feedback.
New names, less logging. Also, exceptions are translated before the event
handler is called, even if the operation is untranslated. This means the event
handler doesn't need to worry about whether the incoming event is raw or
translated
commit 51d4d519efde7412ea12df930c69e54c3a5432e0
Author: Steve Loughran <[email protected]>
Date: 2017-09-27T18:27:55Z
HADOOP-13786 checkstyle and bucket-info gains a "-magic" command to verify
that magic support is turned on
commit 48566c512b6faa7cadc4b7f5b8709ca01a9a9c03
Author: Steve Loughran <[email protected]>
Date: 2017-09-28T18:25:14Z
HADOOP-13786 MAPREDUCE-6823 more test on the commit factories
commit 272e32a0e42d3c798e1def10197997f8ebb5b342
Author: Steve Loughran <[email protected]>
Date: 2017-09-28T18:26:23Z
HADOOP-13786 more on commit algorithms themselves, turning docs and
commit/abort code to match
commit 34aee058cdd8d691fd61592e353e3b717b145a94
Author: Steve Loughran <[email protected]>
Date: 2017-10-03T15:00:54Z
HADOOP-13786 paste in code from how the MR AM creates a committer, to
verify that it works without spinning up the whole cluster
Change-Id: I6841877fde593d6dffa1ba6065a2dc7564ab3329
(cherry picked from commit 3634f5a20c76c0b28c4f9b4f7e39af4db5fc8c68)
commit d1b072c4faea798106378416479f5916b7d3d325
Author: Steve Loughran <[email protected]>
Date: 2017-10-03T17:03:59Z
HADOOP-13786 MAPREDUCE-6823 improving commentary on committer factory;
clean up tests
Change-Id: Ie468a243b23e389122b1e1c7281f76671d567167
commit d775a149b45377b31e612212df8485f9aa564f2a
Author: Steve Loughran <[email protected]>
Date: 2017-10-03T18:25:35Z
HADOOP-13786 setting up for testing of partitioning merge strategies. I
understand what it is trying to do now
Change-Id: Ia1e4834e5793a9a768e4f373b7dafb39e195af4e
commit f2e0701b81c180e93464d5734e20a3e65509aedb
Author: Steve Loughran <[email protected]>
Date: 2017-10-04T19:38:45Z
HADOOP-13786 partitioned committer work (+some java 8 bits)
* move lambda map/flatmap/apply ops on located file status iterator into
S3AUtils from TestUtils, use in staging committer & commit operations;
* document what partitioned committer does, with notes (needs verification)
* testing of Paths.addUUID() and fix failures
Change-Id: I7329a45668f272162d836a2bbbf2cf3e71c56e56
commit 40204f1169a515f10f9a0d0c9283b27efb8c2653
Author: Steve Loughran <[email protected]>
Date: 2017-10-04T19:40:28Z
HADOOP-13786 revert back to java-7 logic in CommitOperations: cute but
overcomplex here.
Change-Id: I6f5a176e360cc6071a0f35cbb324f50fb335b233
commit fa2860c7505d6ae1ac8360b3998aa0034ecce448
Author: Steve Loughran <[email protected]>
Date: 2017-10-09T17:14:57Z
HADOOP-13786 MAPREDUCE-6823 remove createCommitter(JobContext) as the only
place a FileOutputCommitter is created off a job context is in the code
bridging from the v1 to v2 APIs of FileOutputFormat. The new factory model
doesn't support v1 MR, so it's not needed. This simplifies testing and allows
for code cutbacks in the s3a implementations & downstream.
Change-Id: Ifb51c1465a359f7f2cdafb16fe6e21dd143cadbf
commit 8f696e74d0d1d4eb0c3737c21705bc61f06087e8
Author: Steve Loughran <[email protected]>
Date: 2017-10-09T17:19:13Z
HADOOP-13786 S3A committers don't need to support a JobContext in the
constructors or factories: remove, clean up tests. Where tests do need to
create a Committer with nothing but a JobConf, use the same code which MR
itself does for this, now statically exported from AbstractCommitITest
Change-Id: I79ab5acd9e4c15f4c1b9b520cf18258a97b7dbdc
commit 4cfa70bb1479fb7e938597b5ff0f278ee22fd9f3
Author: Steve Loughran <[email protected]>
Date: 2017-10-09T18:46:05Z
HADOOP-13786: Success marker: Should we delete this when a job starts?
Yes: its presence marks the completion of a job
No: if it contains metadata, that data may be valid until the new data is
present
Change-Id: I359cb943745f6b7b58667f7462bfcb7c0b0313e7
commit ac091be5eb2e975fd89b250f6900fadf4e84351e
Author: Steve Loughran <[email protected]>
Date: 2017-10-10T18:23:59Z
HADOOP-13786 MAPREDUCE-6823 There's now a "BindingPathOutputCommitter"
which can be instantiated and which relays its invocations to the factory.
This is useful to work with code which takes a committer classname to know what
to instantiate; it allows you to delegate to the factory for dynamic binding
on a per-destination basis.
Change-Id: I0472c60df98a54e5272b221c650c2a09e3d46fa1
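The delegation pattern described in this commit can be sketched as follows. This is a simplified illustration under stated assumptions: the Committer and CommitterFactory interfaces, the scheme-keyed FACTORIES map and the class names are all hypothetical and are not the MapReduce APIs. The idea shown is only that a concrete, instantiable class looks up a factory from the destination and relays every call to whatever committer that factory creates.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-destination committer binding. */
final class BindingSketch {

  /** Stand-in for a committer; a real one has many more operations. */
  interface Committer {
    String commitJob();
  }

  /** Stand-in for a factory which creates a committer for a destination. */
  interface CommitterFactory {
    Committer create(URI destination);
  }

  /** Factories keyed by URI scheme; an assumption made for this sketch. */
  static final Map<String, CommitterFactory> FACTORIES = new HashMap<>();

  /** Fallback used when no factory is registered for the scheme. */
  static final CommitterFactory DEFAULT_FACTORY =
      destination -> () -> "default commit to " + destination;

  private BindingSketch() {
  }

  /** Look up the factory and create the committer in one step. */
  static Committer createCommitter(URI destination) {
    return FACTORIES
        .getOrDefault(destination.getScheme(), DEFAULT_FACTORY)
        .create(destination);
  }

  /** Instantiable by class name; relays all calls to the bound committer. */
  static final class BindingCommitter implements Committer {
    private final Committer delegate;

    BindingCommitter(URI destination) {
      this.delegate = createCommitter(destination);
    }

    @Override
    public String commitJob() {
      return delegate.commitJob();
    }
  }
}
```

Code which only knows how to instantiate a committer class by name can be given the binding class, and the choice of actual committer is still made per destination.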
commit c74a599bc89db43b9df7b76478e54d6d5666cb11
Author: Steve Loughran <[email protected]>
Date: 2017-10-10T18:25:32Z
HADOOP-13786 MAPREDUCE-6823 static method to combine creating factory &
committer in one go; turns out to be a useful operation downstream, so merits
simplification. Tests too.
Change-Id: Ie5173141132ba41bad5af9f97fd67056428e7f2b
commit 84fab155a549c53ee46760ec390c87b8a54b13f4
Author: Steve Loughran <[email protected]>
Date: 2017-10-12T20:28:37Z
HADOOP-13786
* WriteOperationHelper no longer takes a key in its constructor, caller
must supply on the relevant ops
* _SUCCESS file includes a name field which is validated on load; goal is
to identify other formats/versions and reject.
* big code review of tests, including renaming, cleanup, IDE-suggested
cleanup
* tests also verify that the hasCapability() probe returns true for the
magic option on a magic write, false for a non-magic one, even on a magic FS.
Change-Id: Ia2de777e98c73819d44c2b755fb57be4be5e4a34
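The capability checks which those tests exercise can be illustrated with a small sketch. Everything here is a stand-in written for the example: the Capabilities interface is a local substitute for a capability-probe interface, the stream capability key "sketch.delayed.visibility" is made up, and treating any path containing "/__magic/" as a magic write is an assumption of the sketch. Only the "fs.s3a.magic.enabled" key comes from the commit messages above.

```java
/** Hypothetical sketch of dynamic capability probes. */
final class CapabilitySketch {

  /** Local stand-in for a capability probe interface. */
  interface Capabilities {
    boolean hasCapability(String capability);
  }

  /** Filesystem-level key, as named in the commit message. */
  static final String MAGIC_FS = "fs.s3a.magic.enabled";

  /** Stream-level key; a made-up name for this sketch. */
  static final String DELAYED_VISIBILITY = "sketch.delayed.visibility";

  /** Assumption: a path containing this element is a magic write. */
  static final String MAGIC_MARKER = "/__magic/";

  private CapabilitySketch() {
  }

  /** A filesystem which declares whether magic support is switched on. */
  static final class SketchFileSystem implements Capabilities {
    private final boolean magicEnabled;

    SketchFileSystem(boolean magicEnabled) {
      this.magicEnabled = magicEnabled;
    }

    @Override
    public boolean hasCapability(String capability) {
      return magicEnabled && MAGIC_FS.equals(capability);
    }

    /** Only writes under a magic path on a magic FS are delayed. */
    SketchStream create(String path) {
      return new SketchStream(magicEnabled && path.contains(MAGIC_MARKER));
    }
  }

  /** An output stream which declares whether its output is delayed. */
  static final class SketchStream implements Capabilities {
    private final boolean delayedVisibility;

    SketchStream(boolean delayedVisibility) {
      this.delayedVisibility = delayedVisibility;
    }

    @Override
    public boolean hasCapability(String capability) {
      return delayedVisibility && DELAYED_VISIBILITY.equals(capability);
    }
  }
}
```

The probe is answered per instance, which is why a non-magic write reports false even when the filesystem itself reports magic support.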
----
> Merge S3A committers into trunk
> -------------------------------
>
> Key: HADOOP-14971
> URL: https://issues.apache.org/jira/browse/HADOOP-14971
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> Merge the HADOOP-13786 committer into trunk. This branch is being set up as a
> github PR for review there & to keep it out of the mailboxes of the watchers on
> the main JIRA