On 16 Aug 2017, at 18:39, Andrew Wang <andrew.w...@cloudera.com> wrote:

Hi Steve,

What's the target release vehicle, and the timeline for merging this? The 
target date for beta1 is mid-September, so any large code movements make me 
nervous.

The code targets trunk, and in its current state it is ready to go in.

I've also got it building and running against branch-2: all the code is Java 7 
compatible, and the classpath problems were dealt with by Mingliang.


Could you comment on testing and API stability of this branch? I'm trusting the 
judgement of the contributors involved, since there isn't much time to fix 
things before beta1.


This is all working in the S3 code, and it's something you have to explicitly 
enable; I'm confident that when disabled it doesn't cause problems.

There are two modes of use in production (as well as a local DynamoDB for testing):

* DynamoDB as cache, "non-authoritative"
* DynamoDB as store of record, "authoritative"
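
A minimal sketch of how the two modes might differ when listing a directory. 
This is a toy model, not the S3Guard code: the class and method names are 
hypothetical, directory entries are reduced to plain names, and details such 
as deleted-entry markers are left out. It assumes that in cache mode the S3 
listing is still fetched and merged with the store's entries, while in 
store-of-record mode the store's listing is returned on its own.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a directory listing under the two modes; not the S3Guard code. */
final class S3GuardModesSketch {

    /**
     * @param s3Listing     names an S3 LIST returns right now (may lag behind writes)
     * @param storeListing  names recorded in the metadata store by S3Guard clients
     * @param authoritative true to treat the store as the record of truth
     */
    static Set<String> listDirectory(Set<String> s3Listing,
                                     Set<String> storeListing,
                                     boolean authoritative) {
        if (authoritative) {
            // Store of record: answer from the store alone, with no S3 LIST call.
            return new TreeSet<>(storeListing);
        }
        // Cache: still ask S3, then add anything the store knows about
        // that S3 has not listed yet.
        Set<String> merged = new TreeSet<>(s3Listing);
        merged.addAll(storeListing);
        return merged;
    }

    /** Helper to build a sorted set of names. */
    static Set<String> names(String... n) {
        return new TreeSet<>(Arrays.asList(n));
    }

    public static void main(String[] args) {
        // part-0001 was written through S3Guard, but S3 does not list it yet;
        // part-0002 was written by a client that bypassed S3Guard.
        Set<String> s3 = names("part-0000", "part-0002");
        Set<String> store = names("part-0000", "part-0001");
        System.out.println("non-auth: " + listDirectory(s3, store, false));
        System.out.println("auth:     " + listDirectory(s3, store, true));
    }
}
```

In this model the non-authoritative listing contains all three files, while the 
authoritative one misses part-0002, the file written without S3Guard.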

I'm fairly happy with non-auth, but as auth assumes that all clients are using 
S3Guard, it's the one with the most risk, and the one I'd be cautious about. But 
it does deliver the best speedup. And it lets you use the v1/v2 algorithms to 
commit output, as now you get the consistent directory listings you need. 
There's still the O(data) COPY call, but at least the risk of an incomplete 
listing leading to an incomplete copy operation is eliminated.
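
To illustrate why the listing matters for a copy-based commit, here is a 
small sketch under simplifying assumptions: the commit is modelled as "list 
the source, then copy each listed file", and file contents are plain strings. 
The class and method names are hypothetical and this is not the Hadoop 
committer code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of committing output by list-then-copy; not the Hadoop committer code. */
final class CopyCommitSketch {

    /** Copies each listed name from the source into a new destination map. */
    static Map<String, String> commit(Map<String, String> source,
                                      Collection<String> listing) {
        Map<String, String> dest = new TreeMap<>();
        for (String name : listing) {
            // One copy per listed file: the total cost grows with the data, O(data).
            dest.put(name, source.get(name));
        }
        return dest;
    }

    public static void main(String[] args) {
        Map<String, String> taskOutput = new TreeMap<>();
        taskOutput.put("part-0000", "rows 0-99");
        taskOutput.put("part-0001", "rows 100-199");

        // An inconsistent listing omits a file that exists, so it is never copied.
        System.out.println(commit(taskOutput, Arrays.asList("part-0000")));
        // A consistent listing returns every file, so the copy is complete.
        System.out.println(commit(taskOutput, taskOutput.keySet()));
    }
}
```

The copy cost is the same in both calls per file copied; only the completeness 
of the listing decides whether the committed output is complete.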

We've had a preview version up for a while, with large Hive/LLAP tests in 
particular running happily against it, and my Spark & cloud testing has shown 
all is well (indeed, I can show how all isn't well if you enable the 
inconsistent FS client and *don't* turn S3Guard on).

After the initial merge there is more work to do, mostly around metrics, 
diagnostics, and the new committer work. One of the committers depends on 
consistent listings, but doesn't make *any* API calls into S3Guard itself. All 
it needs is a consistent S3 endpoint, be it AWS S3 with S3Guard, or something 
else like the WDC cloud store. The committer work is not going to be ready for 
beta1.

-Steve




Best,
Andrew

On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <ste...@hortonworks.com> wrote:

FYI, we're getting ready to merge the current S3Guard branch, HADOOP-13345, 
via a patch: https://issues.apache.org/jira/browse/HADOOP-13998

After that's done, we plan to have a second iteration: work on a 0-rename 
committer (HADOOP-13786), with all the other tuning and improvements. We'd add a 
new uber-JIRA & move stuff over, maybe branch, and/or do things patch-by-patch.

Anyway, now is a great time for people to download it and play with it:

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md

The guide to testing this:

https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md

The Inconsistent AWS Client is also something everyone is free to use for 
injecting inconsistencies (and soon faults) into their own apps by way of 2-3 
config options. Want to know how your code handles S3A being observably 
inconsistent? We'll let you do that.
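
As a rough picture of what injected listing inconsistency looks like to an 
application, here is a toy model, assuming that keys containing a chosen 
substring are hidden from listings for a fixed delay after they are written. 
Everything in it (class, fields, parameters) is hypothetical and it is not the 
Inconsistent AWS Client; the real option names are in the testing guide linked 
above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of delayed listing visibility; not the Inconsistent AWS Client. */
final class DelayedListingSketch {
    private final Map<String, Long> createdAtMsec = new TreeMap<>();
    private final String delayKeySubstring; // hypothetical: which keys to delay
    private final long delayMsec;           // hypothetical: how long to hide them

    DelayedListingSketch(String delayKeySubstring, long delayMsec) {
        this.delayKeySubstring = delayKeySubstring;
        this.delayMsec = delayMsec;
    }

    /** Records a write; the caller supplies the clock so behaviour is repeatable. */
    void put(String key, long nowMsec) {
        createdAtMsec.put(key, nowMsec);
    }

    /** Lists keys in sorted order, hiding matching keys that are still "too new". */
    List<String> list(long nowMsec) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<String, Long> e : createdAtMsec.entrySet()) {
            boolean delayed = e.getKey().contains(delayKeySubstring)
                    && nowMsec - e.getValue() < delayMsec;
            if (!delayed) {
                visible.add(e.getKey());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        DelayedListingSketch store = new DelayedListingSketch("DELAY_ME", 5000);
        store.put("data/DELAY_ME/part-0000", 0);
        store.put("data/part-0001", 0);
        System.out.println(store.list(100));   // the delayed key is missing
        System.out.println(store.list(5000));  // both keys are now listed
    }
}
```

Code that lists a directory right after writing to it can be run against a 
model like this to see whether it copes with the missing entry.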

-Steve



