[
https://issues.apache.org/jira/browse/HADOOP-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15394565#comment-15394565
]
Aaron Fabbri commented on HADOOP-13345:
---------------------------------------
Thanks [~cnauroth]. Really cool to compare two independently designed
solutions. Thanks for the GCS link, I'll check that out.
I agree we should proceed and collaborate on this. Feature branch sounds good.
{quote}
The main difference I see is that my work focused more on consistency, with the
S3 bucket still treated as source of truth, and your work focused more on
performance. I hadn't tried anything with the DynamoDB lookup completely
short-circuiting the S3 lookup. I think we can reconcile this though.
{quote}
We tried to make the {{MetadataStore}} interface expressive enough to let
implementations (both the {{MetadataStore}} implementation and the client code
that uses it) decide whether the {{MetadataStore}} can be the source of truth
for directory listings:
- Our {{MetadataStore#listStatus(Path)}} returns a {{CachedDirectory}} which
contains a flag {{isFullyCached}}. Implementations may always set that flag
to false, indicating that the client needs to consult the backing storage as
well.
- If a client connector wishes to take advantage of the performance benefits,
it can publish full directory listings to the {{MetadataStore}} via
{{putListStatus()}} with {{isFullyCached=true}}, and also note the
{{isFullyCached}} flags on the return values from {{listStatus()}}. If a
client connector does not want to deal with two possible sources of truth (e.g.
to simplify failure cases), it can choose not to publish full listings to the
{{MetadataStore}}, and to ignore any {{isFullyCached}} flags that are set on
return from {{MetadataStore#listStatus()}}.
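
To make the two modes concrete, here is a rough sketch of how a client could
use the flag. It is only an illustration, not the code from either patch: the
names {{MetadataStore}}, {{CachedDirectory}}, {{isFullyCached}},
{{listStatus()}} and {{putListStatus()}} come from the description above, but
the signatures, the String paths, the in-memory store, the {{Client}} class,
the {{trustStore}} switch and the merge of cached and backing entries are all
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch; signatures and helper types are assumed, not taken from the patches.
class S3GuardListingSketch {

    /** A directory listing plus a flag saying whether the listing is complete. */
    static final class CachedDirectory {
        final List<String> entries;
        final boolean isFullyCached;

        CachedDirectory(List<String> entries, boolean isFullyCached) {
            this.entries = new ArrayList<>(entries);
            this.isFullyCached = isFullyCached;
        }
    }

    /** Assumed minimal shape of the store; paths are plain Strings for brevity. */
    interface MetadataStore {
        CachedDirectory listStatus(String path);          // null if nothing is cached
        void putListStatus(String path, CachedDirectory listing);
    }

    /** Toy in-memory store standing in for a strongly consistent external store. */
    static final class InMemoryMetadataStore implements MetadataStore {
        private final Map<String, CachedDirectory> dirs = new HashMap<>();

        @Override
        public CachedDirectory listStatus(String path) {
            return dirs.get(path);
        }

        @Override
        public void putListStatus(String path, CachedDirectory listing) {
            dirs.put(path, listing);
        }
    }

    /** Client connector; trustStore selects between the two modes described above. */
    static final class Client {
        final MetadataStore store;
        final Function<String, List<String>> backingList; // stands in for the S3 listing call
        final boolean trustStore;
        int backingCalls = 0;

        Client(MetadataStore store, Function<String, List<String>> backingList, boolean trustStore) {
            this.store = store;
            this.backingList = backingList;
            this.trustStore = trustStore;
        }

        List<String> list(String path) {
            CachedDirectory cached = store.listStatus(path);
            // Performance mode: a fully cached listing short-circuits the backing store.
            if (trustStore && cached != null && cached.isFullyCached) {
                return new ArrayList<>(cached.entries);
            }
            // Otherwise consult the backing store and merge in whatever the store knows (assumed policy).
            backingCalls++;
            TreeSet<String> merged = new TreeSet<>(backingList.apply(path));
            if (cached != null) {
                merged.addAll(cached.entries);
            }
            List<String> result = new ArrayList<>(merged);
            // Only a client that opted in publishes the listing as complete.
            if (trustStore) {
                store.putListStatus(path, new CachedDirectory(result, true));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new InMemoryMetadataStore();
        Client client = new Client(store, p -> Arrays.asList("b", "a"), true);
        System.out.println(client.list("/dir") + " backing calls: " + client.backingCalls);
        System.out.println(client.list("/dir") + " backing calls: " + client.backingCalls);
    }
}
```

With {{trustStore=false}} the same client never publishes a full listing and
ignores {{isFullyCached}}, so every call goes to the backing store and the
bucket stays the single source of truth.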
> S3Guard: Improved Consistency for S3A
> -------------------------------------
>
> Key: HADOOP-13345
> URL: https://issues.apache.org/jira/browse/HADOOP-13345
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13345.prototype1.patch,
> S3C-ConsistentListingonS3-Design.pdf, S3GuardImprovedConsistencyforS3A.pdf,
> s3c.001.patch
>
>
> This issue proposes S3Guard, a new feature of S3A, to provide an option for a
> stronger consistency model than what is currently offered. The solution
> coordinates with a strongly consistent external store to resolve
> inconsistencies caused by the S3 eventual consistency model.