Repository: sqoop
Updated Branches:
  refs/heads/trunk 15097756c -> 9e328a53e


SQOOP-3390: Document S3Guard usage with Sqoop

(Boglarka Egyed via Szabolcs Vasas)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/9e328a53
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/9e328a53
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/9e328a53

Branch: refs/heads/trunk
Commit: 9e328a53e1740ca1ed85861311281f8ea5846ecf
Parents: 1509775
Author: Szabolcs Vasas <[email protected]>
Authored: Wed Oct 24 16:44:10 2018 +0200
Committer: Szabolcs Vasas <[email protected]>
Committed: Wed Oct 24 16:44:10 2018 +0200

----------------------------------------------------------------------
 src/docs/user/s3.txt | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/9e328a53/src/docs/user/s3.txt
----------------------------------------------------------------------
diff --git a/src/docs/user/s3.txt b/src/docs/user/s3.txt
index c54b26b..52ab6ac 100644
--- a/src/docs/user/s3.txt
+++ b/src/docs/user/s3.txt
@@ -161,4 +161,34 @@ $ sqoop import \
   --external-table-dir s3a://example-bucket/external-directory
 ----
 
-Data from RDBMS can be imported into an external Hive table backed by S3 as Parquet file format too.
\ No newline at end of file
+Data from RDBMS can be imported into an external Hive table backed by S3 as Parquet file format too.
+
+Hadoop S3Guard usage with Sqoop
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Amazon S3 offers eventual consistency for PUTS and DELETES in all regions, which means that a file is not guaranteed
+to be visible within a specific time after its creation. As a result, the imported data may not be visible
+immediately after a sqoop import completes. To learn more about the core concepts of Amazon S3,
+please see the official documentation at https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#CoreConcepts.
+
+S3Guard is an experimental feature for the S3A client in Hadoop which can use a database as a store of metadata about
+objects in an S3 bucket. For learning more about S3Guard please see the Hadoop documentation at
+https://hadoop.apache.org/docs/r3.0.3/hadoop-aws/tools/hadoop-aws/s3guard.html.
+
+S3Guard can be enabled during sqoop imports by setting the properties described in the linked documentation.
+
+Example usage with S3Guard enabled:
+
+----
+$ sqoop import \
+  -Dfs.s3a.access.key=$AWS_ACCESS_KEY \
+  -Dfs.s3a.secret.key=$AWS_SECRET_KEY \
+  -Dfs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
+  -Dfs.s3a.s3guard.ddb.region=$BUCKET_REGION \
+  -Dfs.s3a.s3guard.ddb.table.create=true \
+  --connect $CONN \
+  --username $USER \
+  --password $PWD \
+  --table $TABLENAME \
+  --target-dir s3a://example-bucket/target-directory
+----
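
As a minimal sketch that is not part of this commit, the S3Guard properties from the example above could be collected in a small shell helper so that they are defined in one place. The function name `s3guard_opts` is hypothetical; the property names and values are the ones used in the diff. The sketch assumes that none of the values contain whitespace, and it relies on Sqoop expecting generic Hadoop `-D` arguments directly after the tool name, before tool-specific arguments such as `--connect`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Sqoop): prints the S3Guard-related
# -D options from the documented example, one per line.
# $1 is the AWS region of the DynamoDB table, e.g. the value of $BUCKET_REGION.
s3guard_opts() {
  : "${1:?usage: s3guard_opts REGION}"
  printf '%s\n' \
    "-Dfs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore" \
    "-Dfs.s3a.s3guard.ddb.region=$1" \
    "-Dfs.s3a.s3guard.ddb.table.create=true"
}

# Possible usage (left as a comment so that sourcing this file runs nothing):
#   sqoop import \
#     -Dfs.s3a.access.key=$AWS_ACCESS_KEY \
#     -Dfs.s3a.secret.key=$AWS_SECRET_KEY \
#     $(s3guard_opts "$BUCKET_REGION") \
#     --connect $CONN \
#     --table $TABLENAME \
#     --target-dir s3a://example-bucket/target-directory
```

The command substitution is deliberately left unquoted in the usage comment so that the shell splits the helper's output into separate arguments.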
