On 16 Aug 2017, at 18:39, Andrew Wang <andrew.w...@cloudera.com> wrote:
> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The target date for beta1 is mid-September, so any large code movements make me nervous.

The code targets trunk, and its current state is ready to go in. I've also got it building & running against branch-2: all the code is Java 7, and the classpath problems were dealt with by Mingliang.

> Could you comment on testing and API stability of this branch? I'm trusting the judgement of the contributors involved, since there isn't much time to fix things before beta1.

This is all working in the S3 code, and it's something you have to explicitly enable; I'm confident that when disabled it doesn't cause problems.

There are two modes of use in production (as well as a local DynamoDB for testing):

* DynamoDB as cache, "non-authoritative"
* DynamoDB as store of record, "authoritative"

I'm fairly happy with non-auth; but as auth assumes that all clients are using S3Guard, it's the one with the most risks. That one I'd be cautious over. But it does deliver the best speedup. And it lets you use the v1/v2 algorithms to commit output, as now you get the consistent directory listings you need. There's still the O(data) COPY call, but at least the risk of incomplete listings -> incomplete copy operation is eliminated.

We've had a preview version up for a while, running large Hive/LLAP tests against it happily in particular, and my Spark & cloud testing has shown all is well (indeed, I can show how all isn't well if you enable the inconsistent FS client and *don't* turn S3Guard on).

After the initial merge, there is more work to do, but mostly around: metrics, diagnostics, and the new committer work, which depends on the consistent listings for one of the committers, but doesn't do *any* API calls into S3Guard itself. All it needs is a consistent S3 endpoint, be it AWS S3 & S3Guard, or something else like the WDC cloud store. That's not going to be ready for Beta 1.

-Steve

> Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <ste...@hortonworks.com> wrote:
>
>> FYI,
>>
>> We're getting ready to merge the current S3Guard branch, HADOOP-13345, via a patch:
>>
>> https://issues.apache.org/jira/browse/HADOOP-13998
>>
>> After that's done, we do plan to have a second iteration: work on a 0-rename committer (HADOOP-13786) with all the other tuning and improvements. We'd add a new uber-JIRA & move stuff over, maybe branch, and/or do things patch-by-patch.
>>
>> Anyway, now is a great time for people to download and play:
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>>
>> Testing this:
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>>
>> The Inconsistent AWS Client is also something everyone is free to use for injecting inconsistencies (and soon faults) into their own apps by way of 2-3 config options. Want to know how your code handles S3A being observably inconsistent? We'll let you do that.
>>
>> -Steve