Aaron Fabbri created HADOOP-15780:
-------------------------------------
Summary: S3Guard: document how to deal with non-S3Guard processes
writing data to S3Guarded buckets
Key: HADOOP-15780
URL: https://issues.apache.org/jira/browse/HADOOP-15780
Project: Hadoop Common
Issue Type: Sub-task
Affects Versions: 3.2.0
Reporter: Aaron Fabbri
Our general policy for S3Guard is this: all modifiers of a bucket that is
configured for use with S3Guard must use S3Guard. Otherwise, the MetadataStore
will not be updated as the S3 bucket changes, and its view of the bucket will
diverge from what is actually in S3.
There are limited circumstances in which it may be safe to have an external
(non-S3Guard) process writing data. There are also scenarios where it
definitely breaks things.
I think we should start by documenting the cases where this works and the
cases where it does not. Once we've enumerated those, we can suggest
enhancements as needed to make this sort of configuration easier to use.
To get the ball rolling, some things that do not work:
- Deleting a path *p* with S3Guard, then writing a new file at path *p* without
S3Guard. The delete marker for *p* is still in S3Guard, so the file appears
deleted to S3Guard clients even though it is visible in S3, which looks like a
false "eventual consistency" problem (as [[email protected]] and I have
discussed).
- When fs.s3a.metadatastore.authoritative is true, adding files to a directory
without S3Guard and then listing that directory with S3Guard. The listing may
exclude the externally written files.
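
To make the two failure cases above concrete, here is a minimal toy model in
Python. It is only a sketch: ToyBucket, ToyMetadataStore, GuardedClient and
UnguardedClient are hypothetical names invented for illustration, not S3A or
S3Guard classes, and the "authoritative" flag is simplified to mean "trust the
store's listing and never consult the bucket".

```python
# Toy model only. None of these classes exist in Hadoop; they are
# hypothetical stand-ins used to illustrate the two cases listed above.

class ToyBucket:
    """Stands in for the S3 bucket: just a set of object keys."""
    def __init__(self):
        self.keys = set()


class ToyMetadataStore:
    """Stands in for the S3Guard MetadataStore."""
    def __init__(self):
        self.entries = {}  # path -> "file" or "tombstone" (delete marker)


class GuardedClient:
    """A writer/reader that updates the store on every change."""
    def __init__(self, bucket, store, authoritative=False):
        self.bucket = bucket
        self.store = store
        self.authoritative = authoritative  # simplified assumption, see lead-in

    def create(self, path):
        self.bucket.keys.add(path)
        self.store.entries[path] = "file"

    def delete(self, path):
        self.bucket.keys.discard(path)
        self.store.entries[path] = "tombstone"

    def exists(self, path):
        state = self.store.entries.get(path)
        if state == "tombstone":
            return False  # the delete marker wins over whatever is in the bucket
        if state == "file":
            return True
        return path in self.bucket.keys

    def list_dir(self, prefix):
        entries = self.store.entries
        from_store = {p for p, s in entries.items()
                      if p.startswith(prefix) and s == "file"}
        if self.authoritative:
            return sorted(from_store)  # the bucket is never consulted
        tombstones = {p for p, s in entries.items()
                      if p.startswith(prefix) and s == "tombstone"}
        from_bucket = {k for k in self.bucket.keys if k.startswith(prefix)}
        return sorted((from_store | from_bucket) - tombstones)


class UnguardedClient:
    """An external process that writes to the bucket and skips the store."""
    def __init__(self, bucket):
        self.bucket = bucket

    def create(self, path):
        self.bucket.keys.add(path)


def demo():
    bucket, store = ToyBucket(), ToyMetadataStore()
    guarded = GuardedClient(bucket, store, authoritative=True)
    external = UnguardedClient(bucket)

    # Case 1: delete with the guarded client, then recreate externally.
    guarded.create("dir/a")
    guarded.delete("dir/a")
    external.create("dir/a")
    case1 = ("dir/a" in bucket.keys, guarded.exists("dir/a"))

    # Case 2: an external write into a directory listed authoritatively.
    guarded.create("dir/b")
    external.create("dir/c")
    case2 = guarded.list_dir("dir/")
    return case1, case2
```

In this model, demo() returns (True, False) for the first case (the object is
in the bucket but hidden by the stale delete marker) and ["dir/b"] for the
second (the externally written "dir/c" never shows up in the listing).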
(Note: there are also S3A interop issues with non-S3A clients even without
S3Guard, due to the unique way S3A interprets empty directory markers.)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)