[https://issues.apache.org/jira/browse/HADOOP-13447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415482#comment-15415482]
Chris Nauroth edited comment on HADOOP-13447 at 8/10/16 3:57 PM:
-----------------------------------------------------------------
I'm attaching patch v001 to demonstrate what I have in mind. The test code
refactoring in HADOOP-13446 is a prerequisite for this patch.
There are at least two more things I want to do with this patch before it's ready:
# I want to write a true unit test that mocks S3 client interactions, to prove
that the patch really does let us mock the S3 calls effectively (and therefore
simulate eventual consistency).
# I have introduced a test failure in
{{ITestS3AFileContextStatistics#testStatistics}}. The root cause is that the
handling of {{FileSystem.Statistics}} through {{DelegateToFileSystem}} is
awkward with respect to the scope and lifetime of that statistics instance. I
haven't found the best fix yet. All other existing tests pass.
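The first item calls for a unit test that mocks S3 client interactions to simulate eventual consistency. Below is a minimal sketch of the kind of fake such a test could use. It is not part of the patch: {{ObjectStoreClient}} is a hypothetical, heavily reduced stand-in for the AWS SDK's {{AmazonS3}} interface, and the delay model (a newly written key stays out of listings for a fixed number of list calls) is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// HYPOTHETICAL: a tiny stand-in for the AWS SDK's AmazonS3 interface,
// reduced to the two calls this example needs.
interface ObjectStoreClient {
  void putObject(String key, String content);
  List<String> listKeys(String prefix);
}

// Deterministic fake that simulates eventually consistent listings:
// a newly created key is hidden from the first `listDelay` listing calls
// made after it was written, then appears in every later listing.
class EventuallyConsistentFakeClient implements ObjectStoreClient {
  private final int listDelay;
  // Sorted so that listings come back in a stable key order.
  private final Map<String, String> objects = new TreeMap<>();
  // Key -> number of listing calls that must still happen before it is listed.
  private final Map<String, Integer> pendingListings = new HashMap<>();

  EventuallyConsistentFakeClient(int listDelay) {
    this.listDelay = listDelay;
  }

  @Override
  public void putObject(String key, String content) {
    // Only brand-new keys are delayed; overwriting a key keeps it listed.
    boolean isNew = objects.put(key, content) == null;
    if (isNew && listDelay > 0) {
      pendingListings.put(key, listDelay);
    }
  }

  @Override
  public List<String> listKeys(String prefix) {
    List<String> visible = new ArrayList<>();
    for (String key : objects.keySet()) {
      if (key.startsWith(prefix) && !pendingListings.containsKey(key)) {
        visible.add(key);
      }
    }
    // Each listing call is one "tick" of the simulated clock.
    pendingListings.replaceAll((k, remaining) -> remaining - 1);
    pendingListings.values().removeIf(remaining -> remaining <= 0);
    return visible;
  }
}
```

Because visibility is driven by a count of listing calls rather than by wall-clock time, a test built on a fake like this sees the same sequence of results on every run.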
Here is a summary of changes broken down by significant classes:
* {{S3AFileSystem}}: This is now a much smaller class. It will be responsible
for initializing an {{S3Store}}, which encapsulates the S3 calls, and a
concrete subclass of {{AbstractS3AccessPolicy}}, which will control how client
calls coordinate with S3 and optionally other external metadata repositories.
* {{S3ClientFactory}}: This factory constructs the S3 client instance. Note
that its return type is the {{AmazonS3}} interface from the AWS SDK, not
{{AmazonS3Client}} (the concrete implementation that issues HTTP calls to the
S3 back-end). This indirection is what will allow us to mock the S3 calls: tests
will be able to configure a different factory that returns a mock client. The
default implementation is {{DefaultS3ClientFactory}}, and all pre-existing
configuration logic related to the S3 client has moved there.
* {{S3Store}}: Much of the existing code of {{S3AFileSystem}} has moved here.
This class encapsulates how client calls translate to S3 calls. This layer
uses {{Configuration}} to look up the desired {{S3ClientFactory}} implementation.
* {{AbstractS3AccessPolicy}} / {{DirectS3AccessPolicy}}: Policy classes define
how client calls coordinate between S3 calls (the {{S3Store}}) and optionally
other external metadata repositories. Currently, the only concrete
implementation just delegates directly to S3, which provides the same semantics
as the existing S3A codebase. The scope of the various "implement access
policy" sub-tasks is to implement other subclasses that provide different
semantics, such as caching or cross-validation for strong consistency.
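The layering described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patch's code: {{S3Client}} is a hypothetical one-method stand-in for {{AmazonS3}}, a plain {{Map}} stands in for Hadoop's {{Configuration}}, the configuration key {{example.s3.client.factory.impl}} and {{MockS3ClientFactory}} are made up for the example, and the method signatures on the classes named in the list are assumptions.

```java
import java.util.Map;

// HYPOTHETICAL: one-method stand-in for the AWS SDK's AmazonS3 interface.
interface S3Client {
  boolean doesObjectExist(String bucket, String key);
}

// Role of S3ClientFactory: the return type is the interface, not a concrete
// HTTP client, so tests can plug in a fake. Signature is illustrative only.
interface S3ClientFactory {
  S3Client createS3Client(String bucket);
}

// Role of DefaultS3ClientFactory: a real version would build an HTTP-backed
// client from the S3A configuration; this sketch does not talk to S3.
class DefaultS3ClientFactory implements S3ClientFactory {
  @Override
  public S3Client createS3Client(String bucket) {
    throw new UnsupportedOperationException("real S3 client not available in this sketch");
  }
}

// HYPOTHETICAL test-only factory returning an in-memory client.
class MockS3ClientFactory implements S3ClientFactory {
  @Override
  public S3Client createS3Client(String bucket) {
    // Pretend that only keys under "present/" exist.
    return (b, key) -> key.startsWith("present/");
  }
}

// Role of S3Store: looks up the factory class in configuration and
// translates client calls into S3 calls.
class S3Store {
  // HYPOTHETICAL key name, for illustration only.
  static final String CLIENT_FACTORY_KEY = "example.s3.client.factory.impl";

  private final String bucket;
  private final S3Client client;

  S3Store(String bucket, Map<String, String> conf) {
    String factoryClass =
        conf.getOrDefault(CLIENT_FACTORY_KEY, DefaultS3ClientFactory.class.getName());
    S3ClientFactory factory;
    try {
      // Instantiate whichever factory the configuration names.
      factory = (S3ClientFactory) Class.forName(factoryClass)
          .getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot create " + factoryClass, e);
    }
    this.bucket = bucket;
    this.client = factory.createS3Client(bucket);
  }

  boolean objectExists(String key) {
    return client.doesObjectExist(bucket, key);
  }
}

// Role of AbstractS3AccessPolicy: decides how a client call is served.
abstract class AbstractS3AccessPolicy {
  protected final S3Store store;

  AbstractS3AccessPolicy(S3Store store) {
    this.store = store;
  }

  abstract boolean exists(String key);
}

// Role of DirectS3AccessPolicy: delegate straight to S3 (current semantics).
class DirectS3AccessPolicy extends AbstractS3AccessPolicy {
  DirectS3AccessPolicy(S3Store store) {
    super(store);
  }

  @Override
  boolean exists(String key) {
    return store.objectExists(key);
  }
}
```

Selecting the mock only changes a configuration value; {{S3Store}} and the policy classes stay untouched, which is what returning the interface type from the factory buys. A caching or cross-validating policy would be another {{AbstractS3AccessPolicy}} subclass that consults a metadata repository in addition to calling {{S3Store}}.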
> S3Guard: Refactor S3AFileSystem to support introduction of separate metadata
> repository and tests.
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-13447
> URL: https://issues.apache.org/jira/browse/HADOOP-13447
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13447-HADOOP-13446.001.patch
>
>
> The scope of this issue is to refactor the existing {{S3AFileSystem}} into
> multiple coordinating classes. The goal of this refactoring is to separate
> the {{FileSystem}} API binding from the AWS SDK integration, make code
> maintenance easier while we're making changes for S3Guard, and make it easier
> to mock some implementation details so that tests can simulate eventual
> consistency behavior in a deterministic way.