[ 
https://issues.apache.org/jira/browse/HADOOP-17409?focusedWorklogId=677039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-677039
 ]

ASF GitHub Bot logged work on HADOOP-17409:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Nov/21 13:30
            Start Date: 05/Nov/21 13:30
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on a change in pull request 
#3534:
URL: https://github.com/apache/hadoop/pull/3534#discussion_r743662102



##########
File path: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
##########
@@ -84,705 +89,31 @@ Once you are confident that all applications have been 
restarted, _Delete the Dy
 This is to avoid paying for a database you no longer need.
 This is best done from the AWS GUI.
 
-## Setting up S3Guard
-
-### S3A to warn or fail if S3Guard is disabled
-A seemingly recurrent problem with S3Guard is that people think S3Guard is
-turned on but it isn't.
-You can set `org.apache.hadoop.fs.s3a.s3guard.disabled.warn.level`
-to avoid this. The property sets what to do when an S3A FS is instantiated
-without S3Guard. The following values are available:
-
-* `SILENT`: Do nothing.
-* `INFORM`: Log at info level that FS is instantiated without S3Guard.
-* `WARN`: Warn that data may be at risk in workflows.
-* `FAIL`: S3AFileSystem instantiation will fail.
-
-The default setting is `SILENT`. The setting is case insensitive.
-The required level can be set in the `core-site.xml`.
-
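For example, a minimal `core-site.xml` sketch, here choosing the `WARN` level:

```xml
<property>
  <!-- warn whenever an S3A filesystem is instantiated without S3Guard -->
  <name>org.apache.hadoop.fs.s3a.s3guard.disabled.warn.level</name>
  <value>WARN</value>
</property>
```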
----
-The latest configuration parameters are defined in `core-default.xml`.  You
-should consult that file for full information, but a summary is provided here.
-
+## Removing S3Guard Configurations
 
-### 1. Choose the Database
+The `fs.s3a.metadatastore.impl` option must be deleted, set to the empty
+string "", or set to the "Null" Metadata store
+`org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore`.
 
-A core concept of S3Guard is that the directory listing data of the object
-store, *the metadata*, is replicated in a higher-performance, consistent
-database. In S3Guard, this database is called *the Metadata Store*.
-
-By default, S3Guard is not enabled.
-
-The Metadata Store to use in production is bonded to Amazon's DynamoDB
-database service.  The following setting will enable this Metadata Store:
 
 ```xml
 <property>
     <name>fs.s3a.metadatastore.impl</name>
-    <value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
+    <value></value>
 </property>
 ```
 
-Note that the `NullMetadataStore` store can be explicitly requested if desired.
-This offers no metadata storage, and effectively disables S3Guard.
-
 ```xml
 <property>
     <name>fs.s3a.metadatastore.impl</name>
     <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
 </property>
 ```
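
If S3Guard was enabled through per-bucket overrides, those settings need clearing too. A sketch, assuming a hypothetical bucket named `example-bucket` and using S3A's per-bucket configuration mechanism:

```xml
<property>
  <!-- hypothetical bucket name: clear any per-bucket S3Guard override as well -->
  <name>fs.s3a.bucket.example-bucket.metadatastore.impl</name>
  <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
</property>
```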
 
-### 2. Configure S3Guard Settings
-
-More settings may be added in the future.
-Currently the only Metadata Store-independent settings, besides the
-implementation class above, are the *allow authoritative* and *fail-on-error*
-flags.
-
-#### <a name="authoritative"></a>  Authoritative S3Guard
-
-Authoritative S3Guard is a complicated configuration which delivers performance
-at the expense of being unsafe for other applications to use the same directory
-tree/bucket unless configured consistently.
-
-It can also be used to support
-[directory marker retention](directory_markers.html)
-in higher-performance but non-backwards-compatible modes.
-
-Most deployments do not use this setting; it is only used in deployments where
-specific parts of a bucket (e.g. Apache Hive managed tables) are known to
-have exclusive access by a single application (Hive) and other
-tools/applications from exactly the same Hadoop release.
-
-The _authoritative_ concept in S3Guard is present in two different layers, for
-two different reasons:
-
-* Authoritative S3Guard
-    * S3Guard can be set as authoritative, which means that an S3A client will
-    avoid round-trips to S3 when **getting file metadata**, and **getting
-    directory listings** if there is a fully cached version of the directory
-    stored in metadata store.
-    * This mode can be set as a configuration property
-    `fs.s3a.metadatastore.authoritative`
-    * It can also be set only on specific directories by setting
-    `fs.s3a.authoritative.path` to one or more prefixes, for example
-    `s3a://bucket/path` or "/auth1,/auth2".
-    * All interactions with the S3 bucket(s) must be through S3A clients
-    sharing the same metadata store.
-    * This is independent of which metadata store implementation is used.
-    * In authoritative mode the TTL metadata expiry is not effective.
-    This means that the metadata entries won't expire on authoritative paths.
-
-* Authoritative directory listings (isAuthoritative bit)
-    * Tells if the stored directory listing metadata is complete.
-    * This is set by the FileSystem client (e.g. s3a) via the
-    `DirListingMetadata` class (`org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata`).
-    (The MetadataStore only knows what the FS client tells it.)
-    * If set to `TRUE`, we know that the directory listing
-    (`DirListingMetadata`) is full, and complete.
-    * If set to `FALSE` the listing may not be complete.
-    * The metadata store may persist the isAuthoritative bit.
-    * Currently the `org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore` and
-    `org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore` implementations
-    support the authoritative bit.
-
-More on Authoritative S3Guard:
-
-* This setting is about treating the MetadataStore (e.g. dynamodb) as the
- source of truth in general, and also to short-circuit S3 list objects and serve
- listings from the MetadataStore in some circumstances.
-* For S3A to skip S3's get object metadata, and serve it directly from the
-MetadataStore, the following things must all be true:
-    1. The S3A client is configured to allow the MetadataStore to be the
-    authoritative source of a file's metadata (`fs.s3a.metadatastore.authoritative=true`).
-    1. The MetadataStore has the file metadata for the path stored in it.
-* For S3A to skip S3's list objects on some path, and serve it directly from
-the MetadataStore, the following things must all be true:
-    1. The MetadataStore implementation persists the bit
-    `DirListingMetadata.isAuthoritative` set when calling
-    `MetadataStore#put` (`DirListingMetadata`).
-    1. The S3A client is configured to allow the MetadataStore to be the
-    authoritative source of a directory listing (`fs.s3a.metadatastore.authoritative=true`).
-    1. The MetadataStore has a **full listing for the path** stored in it. This only
-    happens if the FS client (s3a) explicitly has stored a full directory
-    listing with `DirListingMetadata.isAuthoritative=true` before the said
-    listing request happens.
-
-This configuration only enables authoritative mode in the client layer. It is
-recommended that you leave the default setting here:
-
-```xml
-<property>
-    <name>fs.s3a.metadatastore.authoritative</name>
-    <value>false</value>
-</property>
-```
-
-Note that a MetadataStore MAY persist this bit in the directory listings. (Not
-MUST).
-
-Note that if this is set to true, it may exacerbate or persist existing race
-conditions around multiple concurrent modifications and listings of a given
-directory tree.
-
-In particular: **If the Metadata Store is declared as authoritative,
-all interactions with the S3 bucket(s) must be through S3A clients sharing
-the same Metadata Store**
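
As a sketch, authoritative treatment can be limited to specific paths rather than the whole bucket; the `/auth1,/auth2` prefixes below are the illustrative values from the text above:

```xml
<property>
  <!-- global authoritative mode stays off -->
  <name>fs.s3a.metadatastore.authoritative</name>
  <value>false</value>
</property>

<property>
  <!-- illustrative prefixes: only these paths are treated as authoritative -->
  <name>fs.s3a.authoritative.path</name>
  <value>/auth1,/auth2</value>
</property>
```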
-
-#### TTL metadata expiry
-
-You can configure how long an entry is valid in the MetadataStore
-**if the authoritative mode is turned off**, or the path is not
-configured to be authoritative.
-If `((lastUpdated + ttl) <= now)` is true for an entry, the entry is
-considered expired, so the S3 bucket will be queried for fresh metadata.
-For example, with a 15 minute TTL, an entry last updated at 10:00 is treated
-as stale from 10:15 onwards.
-The time for expiry of metadata can be set as the following:
-
-```xml
-<property>
-    <name>fs.s3a.metadatastore.metadata.ttl</name>
-    <value>15m</value>
-</property>
-```
-
-#### Fail on Error
-
-By default, S3AFileSystem write operations will fail when updates to
-S3Guard metadata fail. S3AFileSystem first writes the file to S3 and then
-updates the metadata in S3Guard. If the metadata write fails,
-`MetadataPersistenceException` is thrown.  The file in S3 **is not** rolled
-back.
-
-If the write operation cannot be programmatically retried, the S3Guard metadata
-for the given file can be corrected with a command like the following:
-
-```bash
-hadoop s3guard import [-meta URI] s3a://my-bucket/file-with-bad-metadata
-```
-
-Programmatic retries of the original operation would require overwrite=true.
-Suppose the original operation was `FileSystem.create(myFile, overwrite=false)`.
-If this operation failed with `MetadataPersistenceException` a repeat of the
-same operation would result in `FileAlreadyExistsException` since the original
-operation successfully created the file in S3 and only failed in writing the
-metadata to S3Guard.
-
-Metadata update failures can be downgraded to ERROR logging instead of an
-exception by setting the following configuration:
-
-```xml
-<property>
-    <name>fs.s3a.metadatastore.fail.on.write.error</name>
-    <value>false</value>
-</property>
-```
-
-Setting this false is dangerous as it could result in the type of issue S3Guard
-is designed to avoid. For example, a reader may see an inconsistent listing
-after a recent write since S3Guard may not contain metadata about the recently
-written file due to a metadata write error.
-
-As with the default setting, the new/updated file is still in S3 and **is not**
-rolled back. The S3Guard metadata is likely to be out of sync.
-
-### 3. Configure the Metadata Store.
-
-Here are the `DynamoDBMetadataStore` settings.  Other Metadata Store
-implementations will have their own configuration parameters.
-
-
-### 4. Name Your Table
-
-First, choose the name of the table you wish to use for the S3Guard metadata
-storage in your DynamoDB instance.  If you leave it unset/empty, a
-separate table will be created for each S3 bucket you access, and that
-bucket's name will be used for the name of the DynamoDB table.  For example,
-this sets the table name to `my-ddb-table-name`
-
-```xml
-<property>
-  <name>fs.s3a.s3guard.ddb.table</name>
-  <value>my-ddb-table-name</value>
-  <description>
-    The DynamoDB table name to operate. Without this property, the respective
-    S3 bucket names will be used.
-  </description>
-</property>
-```
-
-It is good to share a table across multiple buckets for multiple reasons,
-especially if you are *not* using on-demand DynamoDB tables, and instead
-prepaying for provisioned I/O capacity.
-
-1. You are billed for the provisioned I/O capacity allocated to the table,
-*even when the table is not used*. Sharing capacity can reduce costs.
-
-1. You can share the "provision burden" across the buckets. That is, rather
-than allocating for the peak load on a single bucket, you can allocate for
-the peak load *across all the buckets*, which is likely to be significantly
-lower.
-
-1. It's easier to measure and tune the load requirements and cost of
-S3Guard, because there is only one table to review and configure in the
-AWS management console.
-
-1. When you don't grant users the permission to create DynamoDB tables,
-a single pre-created table for all buckets avoids the need for an
-administrator to create one for every bucket.
-
-When wouldn't you want to share a table?
-
-1. When you are using on-demand DynamoDB and want to keep each table isolated.
-1. When you do explicitly want to provision I/O capacity to a specific bucket
-and table, isolated from others.
-
-1. When you are using separate billing for specific buckets allocated
-to specific projects.
-
-1. When different users/roles have different access rights to different buckets.
-As S3Guard requires all users to have R/W access to the table, all users will
-be able to list the metadata in all buckets, even those to which they lack
-read access.
-
-### 5. Locate your Table
-
-You may also wish to specify the region to use for DynamoDB.  If a region
-is not configured, S3A will assume that it is in the same region as the S3
-bucket. A list of regions for the DynamoDB service can be found in
-[Amazon's documentation](http://docs.aws.amazon.com/general/latest/gr/rande.html#ddb_region).
-In this example, to use the US West 2 region:
-
-```xml
-<property>
-  <name>fs.s3a.s3guard.ddb.region</name>
-  <value>us-west-2</value>
-</property>
-```
-
-When working with S3Guard-managed buckets from EC2 VMs running in AWS
-infrastructure, using a local DynamoDB region ensures the lowest latency
-and highest reliability, as well as avoiding all long-haul network charges.
-The S3Guard tables, and indeed, the S3 buckets, should all be in the same
-region as the VMs.
-
-### 6. Optional: Create your Table
-
-Next, you can choose whether or not the table will be automatically created
-(if it doesn't already exist).  If you want this feature, set the
-`fs.s3a.s3guard.ddb.table.create` option to `true`.
-
-```xml
-<property>
-  <name>fs.s3a.s3guard.ddb.table.create</name>
-  <value>true</value>
-  <description>
-    If true, the S3A client will create the table if it does not already exist.
-  </description>
-</property>
-```
-
-### 7. If creating a table: Choose your billing mode (and perhaps I/O Capacity)
-
-Next, you need to decide whether to use On-Demand DynamoDB and its
-pay-per-request billing (recommended), or to explicitly request a
-provisioned IO capacity.
-
-Before AWS offered pay-per-request billing, the sole billing mechanism
-was "provisioned capacity". This mechanism requires you to choose
-the DynamoDB read and write throughput requirements you
-expect to need for your expected uses of the S3Guard table.
-Setting higher values costs you more money, *even when the table is idle*.
-*Note* that these settings only affect table creation when
-`fs.s3a.s3guard.ddb.table.create` is enabled. To change the throughput for
-an existing table, use the AWS console or CLI tool.
-
-For more details on DynamoDB capacity units, see the AWS page on
-[Capacity Unit Calculations](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithTables.html#CapacityUnitCalculations).
-
-Provisioned IO capacity is billed per hour for the life of the table, *even
-when the table and the underlying S3 buckets are not being used*.
-
-There are also charges incurred for data storage and for data I/O outside of
-the region of the DynamoDB instance. S3Guard only stores metadata in DynamoDB:
-path names and summary details of objects; the actual data is stored in S3, so
-it is billed at S3 rates.
-
-With provisioned I/O capacity, attempting to perform more I/O than the capacity
-requested throttles the operation and may result in operations failing.
-Larger I/O capacities cost more.
-
-With the introduction of On-Demand DynamoDB, you can now avoid paying for
-provisioned capacity by creating an on-demand table.
-With an on-demand table you are not throttled if your DynamoDB requests exceed
-any pre-provisioned limit, nor do you pay per hour even when a table is idle.
-
-You do, however, pay more per DynamoDB operation.
-Even so, the ability to cope with sudden bursts of read or write requests,
-combined with the elimination of charges for idle tables, suits the use
-patterns made of S3Guard tables by applications interacting with S3. That is:
-periods when the table is rarely used, with intermittent high-load operations
-when directory trees are scanned (query planning and similar) or updated
-(rename and delete operations).
-
-
-We recommend using On-Demand DynamoDB for maximum performance in operations
-such as query planning, and lowest cost when S3 buckets are not being accessed.
-
-This is the default, as configured in the default configuration options.
-
-```xml
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.read</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for read operations in terms of
-    capacity units for the DynamoDB table. This config value will only be used
-    when creating a new DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request"
-    capacity.
-    If a positive integer is provided for this and the write capacity, then
-    a table with "provisioned capacity" will be created.
-    You can change the capacity of an existing provisioned-capacity table
-    through the "s3guard set-capacity" command.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.capacity.write</name>
-  <value>0</value>
-  <description>
-    Provisioned throughput requirements for write operations in terms of
-    capacity units for the DynamoDB table.
-    If set to 0 (the default), new tables are created with "per-request"
-    capacity.
-    Refer to related configuration option
-    fs.s3a.s3guard.ddb.table.capacity.read
-  </description>
-</property>
-```
-
-### 8.  If creating a table: Enable server side encryption (SSE)
-
-Encryption at rest can help you protect sensitive data in your DynamoDB table.
-When creating a new table, you can set server side encryption on the table
-using the default AWS owned customer master key (CMK), AWS managed CMK, or
-customer managed CMK. S3Guard code accessing the table is all the same whether
-SSE is enabled or not. For more details on DynamoDB table server side
-encryption, see the AWS page on
-[Encryption at Rest: How It Works](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/encryption.howitworks.html).
-
-These are the default configuration options, as configured in
-`core-default.xml`.
-
-```xml
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.enabled</name>
-  <value>false</value>
-  <description>
-    Whether server-side encryption (SSE) is enabled or disabled on the table.
-    By default it's disabled, meaning SSE is set to AWS owned CMK.
-  </description>
-</property>
-
-<property>
-  <name>fs.s3a.s3guard.ddb.table.sse.cmk</name>
-  <value/>
-  <description>
-    The KMS Customer Master Key (CMK) used for the KMS encryption on the table.
-    To specify a CMK, this config value can be its key ID, Amazon Resource Name
-    (ARN), alias name, or alias ARN. Users only need to provide this config if
-    the key is different from the default DynamoDB KMS Master Key, which is
-    alias/aws/dynamodb.
-  </description>
-</property>
-```
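
As a sketch, turning SSE on with a customer managed CMK might look like this; the alias is a placeholder, not a real key:

```xml
<property>
  <name>fs.s3a.s3guard.ddb.table.sse.enabled</name>
  <value>true</value>
</property>

<property>
  <!-- placeholder alias: substitute your key ID, ARN, alias name or alias ARN -->
  <name>fs.s3a.s3guard.ddb.table.sse.cmk</name>
  <value>alias/example/s3guard-table-key</value>
</property>
```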
-
-## Authenticating with S3Guard
-
-The DynamoDB metadata store takes advantage of the fact that the DynamoDB
-service uses the same authentication mechanisms as S3. S3Guard
-gets all its credentials from the S3A client that is using it.
-
-All existing S3 authentication mechanisms can be used.
-
-## Per-bucket S3Guard configuration
-
-In production, it is likely only some buckets will have S3Guard enabled;
-those which are read-only may have it disabled, for example. Equally importantly,
-buckets in different regions should have different tables, each
-in the relevant region.
-
-These options can be managed through S3A's [per-bucket configuration
-mechanism](./index.html#Configuring_different_S3_buckets).
-All options set under `fs.s3a.bucket.BUCKETNAME.KEY` are propagated
-to the option `fs.s3a.KEY` *for that bucket only*.
-
-As an example, here is a configuration to use different metadata stores
-and tables for different buckets.
-
-First, we define shortcuts for the metadata store classnames:
-
-
-```xml
-<property>
-  <name>s3guard.null</name>
-  <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
-</property>
-
-<property>
-  <name>s3guard.dynamo</name>
-  <value>org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore</value>
-</property>
-```
-
-Next, Amazon's public landsat database is configured with no
-metadata store:
-
-```xml
-<property>
-  <name>fs.s3a.bucket.landsat-pds.metadatastore.impl</name>
-  <value>${s3guard.null}</value>
-  <description>The read-only landsat-pds repository isn't
-  managed by S3Guard</description>
-</property>
-```
-
-Next the `ireland-2` and `ireland-offline` buckets are configured with
-DynamoDB as the store, and a shared table `production-table`:
-
-
-```xml
-<property>
-  <name>fs.s3a.bucket.ireland-2.metadatastore.impl</name>
-  <value>${s3guard.dynamo}</value>
-</property>
-
-<property>
-  <name>fs.s3a.bucket.ireland-offline.metadatastore.impl</name>
-  <value>${s3guard.dynamo}</value>
-</property>
-
-<property>
-  <name>fs.s3a.bucket.ireland-2.s3guard.ddb.table</name>
-  <value>production-table</value>
-</property>
-```
-
-The region of this table is automatically set to be that of the buckets,
-here `eu-west-1`; the same table name may actually be used in different
-regions.
-
-Together then, this configuration enables the DynamoDB Metadata Store
-for two buckets with a shared table, while disabling it for the public
-bucket.
-
-
-### Out-of-band operations with S3Guard
-
-We call an operation out-of-band (OOB) when a bucket is used by a client with
- S3Guard, and another client runs a write (e.g. delete, move, rename,
- overwrite) operation on an object in the same bucket without S3Guard.
-
-The behaviour of S3AFileSystem/MetadataStore in the case of OOB operations is
-defined below, where:
-* A is a client with S3Guard
-* B is a client without S3Guard (writing directly to S3)
-
-
-* OOB OVERWRITE, authoritative mode:
-  * A client creates F1 file
-  * B client overwrites F1 file with F2 (Same, or different file size)
-  * A client's getFileStatus returns F1 metadata
-
-* OOB OVERWRITE, NOT authoritative mode:
-  * A client creates F1 file
-  * B client overwrites F1 file with F2 (Same, or different file size)
-  * A client's getFileStatus returns F2 metadata. In non-authoritative mode we
- check S3 for the file. If the modification time of the file in S3 is greater
- than in S3Guard, we can safely return the S3 file metadata and update the
- cache.
-
-* OOB DELETE, authoritative mode:
-  * A client creates F file
-  * B client deletes F file
-  * A client's getFileStatus returns that the file is still there
-
-* OOB DELETE, NOT authoritative mode:
-  * A client creates F file
-  * B client deletes F file
-  * A client's getFileStatus returns that the file is still there
-
-Note: authoritative and non-authoritative modes behave the same in the
-OOB DELETE case.
-
-The behaviour when getting directory listings:
-* File status entries in the metadata store are updated during the listing in
-the same way as in getFileStatus.
-
 
 ## S3Guard Command Line Interface (CLI)
 
-Note that in some cases an AWS region or `s3a://` URI can be provided.
-
-Metadata store URIs include a scheme that designates the backing store, for
-example `dynamodb://table_name`. As documented above, the
-AWS region can be inferred if the URI to an existing bucket is provided.
-
-
-The S3A URI must also be provided for per-bucket configuration options
-to be picked up. That is: when an s3a URL is provided on the command line,
-all its "resolved" per-bucket settings are used to connect to, authenticate
-with and configure the S3Guard table. If no such URL is provided, then
-the base settings are picked up.
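
For example, a sketch using the `diff` command documented below; the table and bucket names are the illustrative ones used elsewhere in this page:

```bash
# explicit metadata store URI; the region is inferred from the bucket
hadoop s3guard diff -meta dynamodb://production-table s3a://ireland-1

# no -meta option: the metadata store is resolved from the bucket's per-bucket settings
hadoop s3guard diff s3a://ireland-1
```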
-
-
-### Create a table: `s3guard init`
-
-```bash
-hadoop s3guard init -meta URI ( -region REGION | s3a://BUCKET )
-```
-
-Creates and initializes an empty metadata store.
-
-A DynamoDB metadata store can be initialized with additional parameters
-pertaining to capacity. 
-
-If these values are both zero, then an on-demand DynamoDB table is created;
-if they are positive then they set the
-[Provisioned Throughput](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html)
-of the table.
-
-
-```bash
-[-write PROVISIONED_WRITES] [-read PROVISIONED_READS]
-```
-
-Server side encryption (SSE) can be enabled with an AWS managed customer master
-key (CMK), or a customer managed CMK. By default the DynamoDB table will be
-encrypted with an AWS owned CMK. To use a customer managed CMK, you can specify
-its KMS key ID, ARN, alias name, or alias ARN. If not specified, the default
-AWS managed CMK for DynamoDB, "alias/aws/dynamodb", will be used.
-
-```bash
-[-sse [-cmk KMS_CMK_ID]]
-```
-
-A tag argument can be added with a key=value list of tags. The table for the
-metadata store will be created with these tags in DynamoDB.
-
-```bash
-[-tag key=value;]
-```
-
-
-Example 1
-
-```bash
-hadoop s3guard init -meta dynamodb://ireland-team -write 0 -read 0 s3a://ireland-1
-```
-
-Creates an on-demand table "ireland-team",
-in the same location as the S3 bucket "ireland-1".
-
-
-Example 2
-
-```bash
-hadoop s3guard init -meta dynamodb://ireland-team -region eu-west-1 --read 0 --write 0
-```
-
-Creates a table "ireland-team" in the region "eu-west-1.amazonaws.com"
-
-
-Example 3
-
-```bash
-hadoop s3guard init -meta dynamodb://ireland-team -tag tag1=first;tag2=second;
-```
-
-Creates a table "ireland-team" with tags "first" and "second". The read and
-write capacity will be those of the site configuration's values of
-`fs.s3a.s3guard.ddb.table.capacity.read` and `fs.s3a.s3guard.ddb.table.capacity.write`;
-if these are both zero then it will be an on-demand table.
-
-
-Example 4
-
-```bash
-hadoop s3guard init -meta dynamodb://ireland-team -sse
-```
-
-Creates a table "ireland-team" with server side encryption enabled. The CMK 
will
-be using the default AWS managed "alias/aws/dynamodb".
-
-
-### Import a bucket: `s3guard import`
-
-```bash
-hadoop s3guard import [-meta URI] [-authoritative] [-verbose] s3a://PATH
-```
-
-Pre-populates a metadata store according to the current contents of an S3
-bucket/path. If the `-meta` option is omitted, the binding information is taken
-from the `core-site.xml` configuration.
-
-Usage
+The `h`
 
-```
-hadoop s3guard import
-
-import [OPTIONS] [s3a://PATH]
-    import metadata from existing S3 data
-
-Common options:
-  -authoritative - Mark imported directory data as authoritative.
-  -verbose - Verbose Output.
-  -meta URL - Metadata repository details (implementation-specific)
-
-Amazon DynamoDB-specific options:
-  -region REGION - Service region for connections
-
-  URLs for Amazon DynamoDB are of the form dynamodb://TABLE_NAME.
-  Specifying both the -region option and an S3A path
-  is not supported.
-```
-
-Example
-
-Import all files and directories in a bucket into the S3Guard table.
-
-```bash
-hadoop s3guard import s3a://ireland-1
-```
-
-Import a directory tree, marking directories as authoritative.
-
-```bash
-hadoop s3guard import -authoritative -verbose s3a://ireland-1/fork-0008
-
-2020-01-03 12:05:18,321 [main] INFO - Metadata store DynamoDBMetadataStore{region=eu-west-1,
- tableName=s3guard-metadata, tableArn=arn:aws:dynamodb:eu-west-1:980678866538:table/s3guard-metadata} is initialized.
-2020-01-03 12:05:18,324 [main] INFO - Starting: Importing s3a://ireland-1/fork-0008
-2020-01-03 12:05:18,324 [main] INFO - Importing directory s3a://ireland-1/fork-0008
-2020-01-03 12:05:18,537 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-0-0-0-false
-2020-01-03 12:05:18,630 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-0-0-0-true
-2020-01-03 12:05:19,142 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-2-0-0-false/dir-0
-2020-01-03 12:05:19,191 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-2-0-0-false/dir-1
-2020-01-03 12:05:19,240 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-2-0-0-true/dir-0
-2020-01-03 12:05:19,289 [main] INFO - Dir  s3a://ireland-1/fork-0008/test/doTestListFiles-2-0-0-true/dir-1
-2020-01-03 12:05:19,314 [main] INFO - Updated S3Guard with 0 files and 6 directory entries
-2020-01-03 12:05:19,315 [main] INFO - Marking directory tree s3a://ireland-1/fork-0008 as authoritative
-2020-01-03 12:05:19,342 [main] INFO - Importing s3a://ireland-1/fork-0008: duration 0:01.018s
-Inserted 6 items into Metadata Store
-```
-
-### Compare a S3Guard table and the S3 Store: `s3guard diff`
-
-```bash
-hadoop s3guard diff [-meta URI] s3a://BUCKET
-```
-
-Lists discrepancies between a metadata store and bucket. Note that depending on
-how S3Guard is used, certain discrepancies are to be expected.
-
-Example
-
-```bash
-hadoop s3guard diff s3a://ireland-1
 ```

Review comment:
       did a big review/cull of the doc again




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 677039)
    Time Spent: 1h 20m  (was: 1h 10m)

> Remove S3Guard - no longer needed
> ---------------------------------
>
>                 Key: HADOOP-17409
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17409
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> With Consistent S3, S3Guard is superfluous. 
> stop developing it and wean people off it as soon as they can.
> Then we can worry about what to do in the code. It has gradually insinuated 
> its way through the layers, especially things like multi-object delete 
> handling (see HADOOP-17244). Things would be a lot simpler without it
> This work is being done in the feature branch HADOOP-17409-remove-s3guard



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
