[ https://issues.apache.org/jira/browse/HADOOP-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran reassigned HADOOP-16456:
---------------------------------------
Assignee: Steve Loughran
> Refactor the S3A codebase into a more maintainable and testable form
> --------------------------------------------------------------------
>
> Key: HADOOP-16456
> URL: https://issues.apache.org/jira/browse/HADOOP-16456
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> The S3A codebase has become too complex to maintain. In particular:
> * the lack of layering in the S3AFileSystem class means that all
> subcomponents (delegation, DynamoDB, block output stream, etc.) are given a
> back reference to it and make arbitrary calls into it.
> * We can't test components in isolation. While integration tests are the most
> rigorous testing we have, they are slow, make it hard to inject failures, and
> cannot target isolated parts of the code.
> * Code within S3AFileSystem calls the top-level API methods internally,
> mixing the public interface with implementation details.
> * We are adding context to S3Guard calls for consistency, performance
> and recovery; we can't do that without a clean split between the public API
> and the internals.
> Proposed:
> # we carefully break up the S3AFileSystem into a layered design
> # with a "StoreContext" to bind components of the connector to it
> # and some form of operation context to be passed in with each request to
> represent the active operation and its state (including that for S3Guard
> BulkOperations)
> See [refactoring S3A|https://github.com/steveloughran/engineering-proposals/blob/master/refactoring-s3a.md]
> I've already started using some of this design in the HADOOP-15183 work,
> both to add the S3Guard bulk operations and to add a medium-lived
> "RenameOperation". The proposal document reviews that experience and
> discusses improvements.
> As noted, this needs to be done with care. We still need to maintain the
> existing codebase; the more radically we change the code, the greater the risk
> of the changes being wrong and the harder backporting becomes. But we can't
> sustain the current design.
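A minimal Java sketch of the proposed layering follows. It assumes a design in which each subcomponent receives a narrow StoreContext interface plus a per-request operation context, rather than a back reference to S3AFileSystem. StoreContext and RenameOperation are names taken from the proposal. Every member and every other type here (OperationContext, InMemoryStore, objectExists, copyObject, setFailDeletes, and so on) is hypothetical and is not the Hadoop API. The sketch only illustrates the shape of the design.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo entry point, declared first so single-file `java` execution runs it.
class RenameSketch {
  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore("example-bucket");
    store.put("data/part-0000", "payload");
    OperationContext op = new OperationContext("rename");
    new RenameOperation(store, op).execute("data/part-0000", "out/part-0000");
    System.out.println(store.objectExists("out/part-0000")); // true
    System.out.println(op.pathsTouched()); // [out/part-0000, data/part-0000]
  }
}

// Hypothetical: the narrow set of store services a subcomponent is given,
// in place of a back reference to the whole S3AFileSystem.
// Method names are illustrative, not Hadoop's actual StoreContext API.
interface StoreContext {
  String bucket();
  boolean objectExists(String key);
  void copyObject(String srcKey, String destKey);
  void deleteObject(String key);
}

// Hypothetical per-request context: the active operation and its state,
// passed in with each call rather than held as filesystem-wide fields.
final class OperationContext {
  private final String name;
  private final List<String> pathsTouched = new ArrayList<>();

  OperationContext(String name) { this.name = name; }

  String name() { return name; }

  void recordPath(String key) { pathsTouched.add(key); }

  List<String> pathsTouched() { return Collections.unmodifiableList(pathsTouched); }
}

// Depends only on StoreContext and OperationContext, so it can be unit
// tested without an S3AFileSystem instance or a live bucket.
final class RenameOperation {
  private final StoreContext store;
  private final OperationContext op;

  RenameOperation(StoreContext store, OperationContext op) {
    this.store = store;
    this.op = op;
  }

  // Copy then delete, since S3 has no rename operation.
  // Handles a single object only, for brevity.
  void execute(String srcKey, String destKey) {
    if (!store.objectExists(srcKey)) {
      throw new IllegalArgumentException("No such object: " + srcKey);
    }
    store.copyObject(srcKey, destKey);
    op.recordPath(destKey);
    store.deleteObject(srcKey);
    op.recordPath(srcKey);
  }
}

// Hypothetical test double: an in-memory store into which failures can be
// injected, something that is hard to do against a real S3 endpoint.
final class InMemoryStore implements StoreContext {
  private final String bucket;
  private final Map<String, String> objects = new HashMap<>();
  private boolean failDeletes; // when true, deleteObject throws

  InMemoryStore(String bucket) { this.bucket = bucket; }

  void put(String key, String data) { objects.put(key, data); }

  void setFailDeletes(boolean fail) { this.failDeletes = fail; }

  @Override public String bucket() { return bucket; }

  @Override public boolean objectExists(String key) { return objects.containsKey(key); }

  @Override public void copyObject(String srcKey, String destKey) {
    objects.put(destKey, objects.get(srcKey));
  }

  @Override public void deleteObject(String key) {
    if (failDeletes) {
      throw new IllegalStateException("Injected failure deleting " + key);
    }
    objects.remove(key);
  }
}
```

Because RenameOperation sees only the interface, a unit test can swap in InMemoryStore and call setFailDeletes(true) to check what state a rename leaves behind when the delete step fails. The operation context then records exactly which paths were touched before the failure, and no live bucket is needed.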
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)