[ 
https://issues.apache.org/jira/browse/HADOOP-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415482#comment-15415482
 ] 

Chris Nauroth edited comment on HADOOP-13447 at 8/10/16 3:57 PM:
-----------------------------------------------------------------

I'm attaching patch v001 to demonstrate what I have in mind.  The test code 
refactoring in HADOOP-13446 is a prerequisite for this patch.

There are at least 2 more things I want to do with this patch before it's ready:

# I want to write a true unit test that mocks S3 client interactions, to prove 
that the patch does in fact set us up to be able to mock the S3 calls 
effectively (and therefore simulate eventual consistency).
# I have introduced a test failure in 
{{ITestS3AFileContextStatistics#testStatistics}}.  The root cause is that the 
scope and lifetime of the {{FileSystem.Statistics}} instance are a bit funky 
when it is handled through {{DelegateToFileSystem}}.  I haven't found the best 
fix yet.  All other existing tests are passing.

Here is a summary of changes broken down by significant classes:
* {{S3AFileSystem}}: This is now a much smaller class.  It will be responsible 
for initializing an {{S3Store}}, which encapsulates the S3 calls, and a 
concrete subclass of {{AbstractS3AccessPolicy}}, which will control how client 
calls coordinate with S3 and optionally other external metadata repositories.
* {{S3ClientFactory}}: This is a factory for construction of the S3 client 
instance.  Note that its return type is defined as {{AmazonS3}} (an interface 
from the AWS SDK), not {{AmazonS3Client}} (the concrete implementation that 
issues HTTP calls to the S3 back-end).  This is the indirection that will allow 
us to mock the S3 calls.  Tests will be able to configure a different factory 
to return a mock client.  The default implementation is 
{{DefaultS3ClientFactory}}, and all pre-existing configuration logic related to 
the S3 client has moved here.
* {{S3Store}}: Much of the existing code of {{S3AFileSystem}} has moved here.  
This class encapsulates how client calls translate to S3 calls.  This layer 
uses {{Configuration}} to look up the desired {{S3ClientFactory}} implementation.
* {{AbstractS3AccessPolicy}} / {{DirectS3AccessPolicy}}: Policy classes define 
how client calls coordinate between S3 calls (the {{S3Store}}) and optionally 
other external metadata repositories.  Currently, the only concrete 
implementation just delegates directly to S3, which provides the same semantics 
as the existing S3A codebase.  The scope of the various "implement access 
policy" sub-tasks is to implement other subclasses that provide different 
semantics: caching, cross-validation for strong consistency, etc.
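
To make the layering above concrete, here is a minimal sketch of the factory 
indirection and the direct policy.  It is not code from the patch.  Every type 
name, the configuration key, and the use of a plain {{Map}} in place of 
{{Configuration}} are assumptions made so the example compiles without Hadoop 
or the AWS SDK.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the AWS SDK AmazonS3 interface (one call only).
interface SketchClient {
  boolean doesObjectExist(String bucket, String key);
}

// Hypothetical factory, modeled on the S3ClientFactory role described above.
interface SketchClientFactory {
  SketchClient createClient();
}

// Plays the role of DefaultS3ClientFactory. A real one would build an
// HTTP-backed client; this sketch has no network, so every call fails fast.
class DefaultSketchClientFactory implements SketchClientFactory {
  @Override
  public SketchClient createClient() {
    return (bucket, key) -> {
      throw new UnsupportedOperationException("no S3 access in this sketch");
    };
  }
}

// Test factory: returns an in-memory fake instead of a real client.
class MockSketchClientFactory implements SketchClientFactory {
  @Override
  public SketchClient createClient() {
    return (bucket, key) -> key.startsWith("present/");
  }
}

// Plays the role of S3Store: picks the factory by class name from config.
class SketchStore {
  static final String FACTORY_KEY = "sketch.client.factory.impl"; // made-up key
  private final SketchClient client;

  SketchStore(Map<String, String> conf) {
    String name = conf.getOrDefault(
        FACTORY_KEY, DefaultSketchClientFactory.class.getName());
    SketchClientFactory factory;
    try {
      factory = (SketchClientFactory)
          Class.forName(name).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot create factory " + name, e);
    }
    this.client = factory.createClient();
  }

  boolean exists(String bucket, String key) {
    return client.doesObjectExist(bucket, key);
  }
}

// Plays the role of AbstractS3AccessPolicy: decides how calls reach the store.
abstract class SketchAccessPolicy {
  protected final SketchStore store;

  SketchAccessPolicy(SketchStore store) {
    this.store = store;
  }

  abstract boolean exists(String bucket, String key);
}

// Plays the role of DirectS3AccessPolicy: delegates straight to the store.
class DirectSketchAccessPolicy extends SketchAccessPolicy {
  DirectSketchAccessPolicy(SketchStore store) {
    super(store);
  }

  @Override
  boolean exists(String bucket, String key) {
    return store.exists(bucket, key);
  }
}

class FactorySketchDemo {
  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(SketchStore.FACTORY_KEY, MockSketchClientFactory.class.getName());
    SketchAccessPolicy policy =
        new DirectSketchAccessPolicy(new SketchStore(conf));
    System.out.println(policy.exists("bucket", "present/file"));
    System.out.println(policy.exists("bucket", "absent/file"));
  }
}
```

Because the store only ever sees the client interface, a test swaps the 
implementation by changing one configuration value.  A caching or 
cross-validating policy would be another subclass next to the direct one.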



> S3Guard: Refactor S3AFileSystem to support introduction of separate metadata 
> repository and tests.
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13447
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13447
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13447-HADOOP-13446.001.patch
>
>
> The scope of this issue is to refactor the existing {{S3AFileSystem}} into 
> multiple coordinating classes.  The goal of this refactoring is to separate 
> the {{FileSystem}} API binding from the AWS SDK integration, make code 
> maintenance easier while we're making changes for S3Guard, and make it easier 
> to mock some implementation details so that tests can simulate eventual 
> consistency behavior in a deterministic way.
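
As an illustration of what "simulate eventual consistency behavior in a 
deterministic way" could look like once the client is mockable, here is a 
hedged sketch.  The class below is hypothetical and is not part of the patch or 
the AWS SDK.  It assumes a logical clock that the test advances by hand, so a 
newly written key stays invisible for a fixed number of ticks.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory fake: a put becomes visible only after `lag` ticks
// of a logical clock that the test controls, so results never depend on timing.
class DelayedVisibilityClient {
  private final int lag;
  private int now = 0;
  // Maps each key to the logical time at which it becomes visible.
  private final Map<String, Integer> visibleAt = new HashMap<>();

  DelayedVisibilityClient(int lag) {
    this.lag = lag;
  }

  void putObject(String key) {
    visibleAt.put(key, now + lag);
  }

  boolean doesObjectExist(String key) {
    Integer time = visibleAt.get(key);
    return time != null && now >= time;
  }

  // Advances the logical clock by one step.
  void tick() {
    now++;
  }
}

class ConsistencySketchDemo {
  public static void main(String[] args) {
    DelayedVisibilityClient client = new DelayedVisibilityClient(2);
    client.putObject("dir/file");
    System.out.println(client.doesObjectExist("dir/file")); // not yet visible
    client.tick();
    client.tick();
    System.out.println(client.doesObjectExist("dir/file")); // visible after 2 ticks
  }
}
```

A test written against a fake like this can assert both the stale read and the 
later consistent read without sleeping or retrying.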



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
