[ https://issues.apache.org/jira/browse/IMPALA-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846943#comment-16846943 ]

ASF subversion and git services commented on IMPALA-8344:
---------------------------------------------------------

Commit 6b09612e763aace6ec3ec22031e4e960b9a41e3d in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6b09612 ]

IMPALA-8344: Add support for running the minicluster with S3Guard

Some tests can fail on S3 because some S3 operations are eventually
consistent. S3Guard stores extra metadata in a DynamoDB table to
solve several consistency issues.

This adds support for running the minicluster on S3 with S3Guard.
S3Guard is configured by the following environment variables:
S3GUARD_ENABLED: defaults to false, set to true to enable S3Guard
S3GUARD_DYNAMODB_TABLE: name of the DynamoDB table to use. This must
  be exclusively owned by this minicluster. The dataload scripts
  initialize this table and will purge entries if the table already
  exists. The table should be in the same region as the S3_BUCKET
  for the minicluster.
S3GUARD_DYNAMODB_REGION: AWS region for S3GUARD_DYNAMODB_TABLE
These environment variables only impact S3 configurations.
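
As a rough illustration of how these variables could translate into
core-site.xml parameters, here is a minimal Python sketch. The function
name is hypothetical and is not taken from the commit. The property
names are the standard Hadoop S3A S3Guard settings; the commit itself
may set a different or larger set of parameters.

```python
import os


def s3guard_core_site_properties(env=None):
    """Hypothetical helper: map S3GUARD_* variables to S3A properties.

    Returns an empty dict when S3Guard is disabled (the default).
    """
    env = os.environ if env is None else env

    # S3GUARD_ENABLED defaults to false; only "true" enables S3Guard.
    if env.get("S3GUARD_ENABLED", "false").lower() != "true":
        return {}

    table = env.get("S3GUARD_DYNAMODB_TABLE")
    region = env.get("S3GUARD_DYNAMODB_REGION")
    if not table or not region:
        raise ValueError(
            "S3GUARD_DYNAMODB_TABLE and S3GUARD_DYNAMODB_REGION must be "
            "set when S3GUARD_ENABLED is true")

    # Standard Hadoop S3A property names (assumed to be what is wanted).
    return {
        "fs.s3a.metadatastore.impl":
            "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore",
        "fs.s3a.s3guard.ddb.table": table,
        "fs.s3a.s3guard.ddb.region": region,
    }
```

Calling the function with no S3GUARD_* variables set yields no extra
properties, which matches the "defaults to false" behavior described
above.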

The support comes from three pieces:
1. Configuration changes in core-site.xml to add the appropriate
   parameters.
2. Updating dataload to initialize/purge the s3guard dynamodb table
   and import data appropriately.
3. Updating tests to manipulate files through the HDFS command line
   rather than through s3 utilities. This takes the filesystem
   utility code for ABFS (which actually calls HDFS command line),
   makes it generic, and uses it for S3Guard.
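
The third piece can be pictured with a small Python sketch of a
filesystem helper that shells out to the HDFS command line instead of
talking to S3 directly, so that every change also passes through
S3Guard. The class and method names below are hypothetical and do not
come from the Impala test utilities; the command runner is injectable
only so the sketch can be exercised without a Hadoop installation.

```python
import subprocess


class HadoopCliFilesystem:
    """Hypothetical helper: file operations via the `hdfs dfs` command line."""

    def __init__(self, runner=subprocess.run):
        # runner is injectable so the class can be tested without Hadoop.
        self._runner = runner

    def _run(self, *args):
        # Every operation goes through `hdfs dfs`, which uses the S3A
        # connector (and therefore S3Guard) for s3a:// paths.
        cmd = ["hdfs", "dfs", *args]
        result = self._runner(cmd, capture_output=True, text=True)
        return result.returncode == 0

    def make_dir(self, path):
        return self._run("-mkdir", "-p", path)

    def copy_from_local(self, local_path, remote_path):
        # -f overwrites the destination if it already exists.
        return self._run("-put", "-f", local_path, remote_path)

    def delete(self, path, recursive=False):
        flags = ["-rm", "-skipTrash"]
        if recursive:
            flags.insert(1, "-r")
        return self._run(*flags, path)
```

Because the helper only builds `hdfs dfs` commands, the same code would
work for any filesystem the Hadoop client is configured for, which is
the sense in which the ABFS utility code is described as being made
generic.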

Testing:
 - Ran multiple rounds of s3 tests
 - Aborted tests in the middle and restarted the s3 tests (to test
   the s3guard reinitialization code)

Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Reviewed-on: http://gerrit.cloudera.org:8080/13020
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Sahil Takiar <stak...@cloudera.com>


> Add support for running S3 tests with S3Guard
> ---------------------------------------------
>
>                 Key: IMPALA-8344
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8344
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>    Affects Versions: Impala 3.3.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> Impala s3 tests can encounter failures due to s3's consistency model. S3Guard 
> adds consistency to s3 operations to avoid these types of failures:
> [https://blog.cloudera.com/blog/2017/08/introducing-s3guard-s3-consistency-for-apache-hadoop/]
> Adding support to run tests with S3Guard provides a way to limit flakiness, 
> and it provides coverage for users that would want to use Impala on S3 with 
> S3Guard.
> Support will involve adding the appropriate configuration to core-site.xml. 
> In order to maintain the S3Guard index appropriately, file modifications 
> should go through HDFS commands rather than boto s3 commands. Finally, to 
> reduce costs, Impala may want to have a script to purge S3Guard's dynamodb. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
