[ https://issues.apache.org/jira/browse/HADOOP-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran reassigned HADOOP-16456:
---------------------------------------

    Assignee: Steve Loughran

> Refactor the S3A codebase into a more maintainable and testable form
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16456
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16456
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The S3A codebase has become too complex to maintain. In particular:
> * The lack of layering in the S3AFileSystem class means that every 
> subcomponent (delegation, DynamoDB, block output stream, etc.) is given a 
> back reference to it and makes arbitrary calls into it.
> * We can't test in isolation. Integration tests are the most rigorous 
> testing we can have, but they are slow, hard to inject failures into, and 
> do not work on isolated parts of the code.
> * The code within S3AFileSystem invokes the top-level API calls internally, 
> mixing the public interface with implementation details.
> * We are adding context through S3Guard calls for consistency, performance 
> and recovery; we can't do that without a clean split between the public API 
> and the internals.
> Proposed: 
> # we carefully break up the S3AFileSystem into a layered design
> # with a "StoreContext" to bind components of the connector to it
> # and some form of operation context to be passed in with each request to 
> represent the active operation and its state (including that for S3Guard 
> BulkOperations)
> See [refactoring S3A|https://github.com/steveloughran/engineering-proposals/blob/master/refactoring-s3a.md]
> I've already started using some of this design in the HADOOP-15183 component, 
> for the addition of those S3Guard bulk operations, and to add a medium-life 
> "RenameOperation". The proposal document reviews that experience and 
> discusses improvements.
> As noted: this needs to be done with care. We still need to maintain the 
> existing codebase; the more radically we change the code, the greater the 
> risk that the changes are wrong and the harder backporting becomes. But we 
> can't sustain the current design.
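
The layering proposed above might look roughly like the following. This is a minimal sketch, not the design in the linked proposal or in the Hadoop source: only the name "StoreContext" comes from the description, and every other type, field and method (StoreOperations, OperationContext, BulkDelete, FailingStore) is hypothetical. The point it illustrates is that a subcomponent holding a narrow context rather than a back reference to the filesystem can be tested in isolation, with failures injected through a stub.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch only: every name here except "StoreContext" is hypothetical
 * and is not taken from the Hadoop codebase.
 */
final class StoreContextSketch {

  /** Narrow view of the store: the only calls a subcomponent may make. */
  interface StoreOperations {
    void deleteObject(String key) throws IOException;
  }

  /** Immutable binding handed to subcomponents instead of the filesystem. */
  static final class StoreContext {
    final String bucket;
    final StoreOperations store;

    StoreContext(String bucket, StoreOperations store) {
      this.bucket = bucket;
      this.store = store;
    }
  }

  /** Per-request state: names the active operation and tracks its progress. */
  static final class OperationContext {
    final String operationName;
    final List<String> deletedKeys = new ArrayList<>();

    OperationContext(String operationName) {
      this.operationName = operationName;
    }
  }

  /** A subcomponent that depends on the two contexts only. */
  static final class BulkDelete {
    private final StoreContext context;

    BulkDelete(StoreContext context) {
      this.context = context;
    }

    void execute(OperationContext op, List<String> keys) throws IOException {
      for (String key : keys) {
        context.store.deleteObject(key);
        op.deletedKeys.add(key); // record progress so a caller could recover
      }
    }
  }

  /** In-memory stub that fails on one chosen key, for failure injection. */
  static final class FailingStore implements StoreOperations {
    private final String failOn;

    FailingStore(String failOn) {
      this.failOn = failOn;
    }

    @Override
    public void deleteObject(String key) throws IOException {
      if (key.equals(failOn)) {
        throw new IOException("injected failure on " + key);
      }
    }
  }

  public static void main(String[] args) {
    StoreContext ctx = new StoreContext("example-bucket", new FailingStore("b"));
    OperationContext op = new OperationContext("rename");
    try {
      new BulkDelete(ctx).execute(op, Arrays.asList("a", "b", "c"));
    } catch (IOException e) {
      System.out.println(op.operationName + " failed after "
          + op.deletedKeys + ": " + e.getMessage());
    }
  }
}
```

Because BulkDelete never sees the filesystem class, the failure path runs without any network access or integration test setup, and the operation context shows how far the operation got before the injected failure.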



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
