[ 
https://issues.apache.org/jira/browse/HADOOP-15999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16719468#comment-16719468
 ] 

Sean Mackrory edited comment on HADOOP-15999 at 12/12/18 9:35 PM:
------------------------------------------------------------------

{quote}We might just want to make that configurable (separate config knob 
probably). If we are in "check both MS and S3" mode, we probably want a 
configurable or pluggable conflict policy.{quote}

Yeah - I also considered addressing the out-of-band delete problem with a 
config option (or two) that governs whether we create and/or honor tombstones. 
But that exposes more complexity to users and isn't very elegant. If we can 
start comparing modification times relatively easily, then we can fix all of 
these use cases and offer two basic modes:

- S3Guard with authoritative mode, in which the MetadataStore is the source of 
truth and we can assume All The Things.
- S3Guard without authoritative mode, in which S3 is the source of truth. We 
will always be at least as up to date as S3 appears, and will fix list 
consistency as long as S3 doesn't give us evidence to the contrary (i.e. older 
modification times or the lack of an update entirely).
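
To make the two modes concrete, here is a rough Java sketch of the comparison, not the actual S3AFileSystem code. The names StatusResolver, Entry and resolve are hypothetical. It also assumes that a tombstone carries the time of the delete as its modification time, and that an authoritative lookup with no MetadataStore entry falls back to S3.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class StatusResolver {

    /** Simplified view of one path's metadata. */
    public static final class Entry {
        public final long length;       // file length in bytes
        public final long modTime;      // epoch millis, as reported by S3
        public final boolean tombstone; // true if the MetadataStore recorded a delete

        public Entry(long length, long modTime, boolean tombstone) {
            this.length = length;
            this.modTime = modTime;
            this.tombstone = tombstone;
        }
    }

    /**
     * Picks the entry to trust for a single path.
     * s3Lookup is a Supplier so that the S3 round trip is skipped entirely
     * when the MetadataStore answers in authoritative mode.
     */
    public static Optional<Entry> resolve(Optional<Entry> fromStore,
                                          Supplier<Optional<Entry>> s3Lookup,
                                          boolean authoritative) {
        if (!fromStore.isPresent()) {
            // Assumption: with no MetadataStore entry, S3 is all we have.
            return s3Lookup.get();
        }
        Entry ms = fromStore.get();
        if (!authoritative) {
            // S3 is the source of truth: a strictly newer S3 object wins,
            // even over a tombstone (file re-created out of band).
            Optional<Entry> s3 = s3Lookup.get();
            if (s3.isPresent() && s3.get().modTime > ms.modTime) {
                return s3;
            }
        }
        // S3 showed nothing newer (or was never asked): keep the MetadataStore view.
        return ms.tombstone ? Optional.<Entry>empty() : fromStore;
    }
}
```

With this shape, a tombstone at time 100 and an S3 object at time 200 resolve to the S3 object, while an S3 object at time 50 is treated as delete inconsistency and the path stays deleted.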

I feel very uncomfortable with the idea of some middle ground where S3Guard 
can't be the source of truth, but we still try to treat it as one in some 
cases. It either has all the context or it doesn't, and if it doesn't, we're 
trading away correctness for some performance, which I think is the wrong 
trade-off.


> [s3a] Better support for out-of-band operations
> -----------------------------------------------
>
>                 Key: HADOOP-15999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15999
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Sean Mackrory
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: out-of-band-operations.patch
>
>
> S3Guard was initially designed on the premise that a new MetadataStore would 
> be the source of truth, and that it wouldn't provide guarantees if updates 
> were made without going through S3Guard.
> I've been seeing increased demand for better support for scenarios where 
> operations are done on the data that can't reasonably be done with S3Guard 
> involved. For example:
> * A file is deleted using S3Guard, and replaced by some other tool. S3Guard 
> can't tell the difference between the new file and delete / list 
> inconsistency and continues to treat the file as deleted.
> * An S3Guard-ed file is overwritten with a longer file by some other tool. 
> When reading the file, only as many bytes as the original file's length are 
> read.
> We could possibly have smarter behavior here by querying both S3 and the 
> MetadataStore (even in cases where we currently query only the 
> MetadataStore in getFileStatus) and using whichever one has the later 
> modification time.
> This kills the performance boost we currently get in some workloads with the 
> short-circuited getFileStatus, but we could keep it with authoritative mode 
> which should give a larger performance boost. At least we'd get more 
> correctness without authoritative mode and a clear declaration of when we can 
> make the assumptions required to short-circuit the process. If we can't 
> consider S3Guard the source of truth, we need to defer to S3 more.
> We'd need to be extra sure of any locality / time zone issues if we start 
> relying on mod_time more directly, but currently we're tracking the 
> modification time as returned by S3 anyway.


