[
https://issues.apache.org/jira/browse/HDFS-12090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16378955#comment-16378955
]
Rakesh R commented on HDFS-12090:
---------------------------------
[~ehiggs] Sorry for the delay in my response. I intended to reply after
finishing the major code changes to support *external SPS*; two more sub-tasks
remain, HDFS-13165 and HDFS-13166. I'm trying to push these ASAP.
{quote}Agreed. The SPS underwent considerable changes in the past few months so
it would not have been possible to track it.
{quote}
Yes, I understand the pain of having to learn the branch code multiple times.
Actually, the review comments asking us to support *external SPS* came quite
late, and that caused a lot of refactoring effort over the past few months. Now
the {{Context}} interface is clearly defined to separate the internal and
external SPS. Please let me know if you need any help understanding the SPS
code flow. I believe the major code refactoring is completed and committed to
the branch. Presently, [~daryn] has offered to take another round of reviews,
and we are waiting on a few clarifications from him.
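Roughly, the split looks like this (just an illustrative sketch with made-up
names and signatures, not the exact branch code): one {{Context}} contract,
with an internal implementation that reads Namenode structures directly and an
external implementation backed by client RPCs.
{code:java}
// Hypothetical sketch of the internal/external Context split; names and
// signatures here are illustrative, not the actual HDFS-10285 branch code.
interface Context {
  boolean isRunning();              // is SPS currently active?
  boolean isFileExist(long fileId); // does the file still exist?
  void removeSPSHint(long fileId);  // clear the "satisfy" xattr once done
}

// Internal flavour: would read Namenode structures directly (stubbed here).
class InternalSPSContext implements Context {
  public boolean isRunning() { return true; }
  public boolean isFileExist(long fileId) { return true; }
  public void removeSPSHint(long fileId) { /* direct FSNamesystem call */ }
}

// External flavour: same contract, but served from a standalone process
// outside the Namenode via client RPCs.
class ExternalSPSContext implements Context {
  public boolean isRunning() { return true; }
  public boolean isFileExist(long fileId) { return true; }
  public void removeSPSHint(long fileId) { /* DFS client call */ }
}
{code}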
{quote}Are there hooks in the tracker for multinode-multipart uploads to
perform the init and complete?
{quote}
No, there is no such hook today. You could probably add one for provided writes.
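For example, a provided-write hook could take a shape like this (purely
hypothetical; nothing like it exists in the branch today):
{code:java}
// Hypothetical hook for provided writes, sketching where init/complete
// callbacks for a multinode-multipart upload could live.
interface ProvidedWriteHook {
  void initMultipartUpload(long blockCollectionId);     // before the first part
  void completeMultipartUpload(long blockCollectionId); // after all parts land
}
{code}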
{quote}The two main interfaces we need to work with are scanAndCollectFileIds
and submitMoveTask. These are very generic so we could potentially implement
them
{quote}
One of his review comments was to use the existing {{datatransfer}} protocol
rather than introducing a new command for {{submitMoveTask}}. I've replied to
that with the SPS requirements. I think a separate command is very much
required to do multinode-multipart uploads for provided store writes. How about
adding your use case to the HDFS-10285 jira, so that we can discuss it
together, define a basic move command first, and later extend it to support
provided store writes?
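To make the two touch points concrete (signatures approximated for
illustration; the branch code may differ, and {{BlockMovingInfo}} here is my
own minimal stand-in):
{code:java}
// Approximate shape of the two extension points discussed above.
interface FileIdCollector {
  // Walk a directory tree and queue the file ids whose storage policy
  // is not yet satisfied.
  void scanAndCollectFileIds(Long rootINodeId) throws java.io.IOException;
}

interface BlockMoveTaskHandler {
  // Hand one src->target block move to whatever executes it: an internal
  // Datanode command today, potentially a provided-store upload later.
  void submitMoveTask(BlockMovingInfo moveInfo) throws java.io.IOException;
}

// Minimal carrier for one scheduled movement (src-target pair).
class BlockMovingInfo {
  long blockId;
  String source; // source Datanode
  String target; // target Datanode, or a provided store endpoint later
}
{code}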
{quote}I'm not sure if this would mean the Context API would need to change or
if it can be done with the existing APIs since ExternalSPSFileIDCollector has
access to a DistributedFileSystem which can perform the snapshotting and
provide diffs.
{quote}
SPS is not based on snapshot diffs; it just scans the file's blocks and
schedules the block movements (src-target pairs). It is possible to extend the
_ExternalSPSFileIDCollector_ interface and make the necessary modifications to
support your case. In any case, {{Context}} and the other supporting interfaces
are private and open to internal changes.
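As a minimal sketch of that scan-and-schedule idea (my own illustration on top
of {{DistributedFileSystem}}, not branch code), an extended external collector
could recursively walk a subtree and hand each file to the movement scheduler:
{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Illustrative subtree scanner; in real SPS this would enqueue file ids for
// the block-movement scheduler instead of printing.
class SubtreeScanner {
  private final DistributedFileSystem dfs;

  SubtreeScanner(DistributedFileSystem dfs) {
    this.dfs = dfs;
  }

  // Depth-first scan over the subtree rooted at dir.
  void scan(Path dir) throws IOException {
    RemoteIterator<FileStatus> it = dfs.listStatusIterator(dir);
    while (it.hasNext()) {
      FileStatus st = it.next();
      if (st.isDirectory()) {
        scan(st.getPath());
      } else {
        System.out.println("candidate for block movement: " + st.getPath());
      }
    }
  }
}
{code}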
> Handling writes from HDFS to Provided storages
> ----------------------------------------------
>
> Key: HDFS-12090
> URL: https://issues.apache.org/jira/browse/HDFS-12090
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Virajith Jalaparti
> Priority: Major
> Attachments: HDFS-12090-Functional-Specification.001.pdf,
> HDFS-12090-Functional-Specification.002.pdf,
> HDFS-12090-Functional-Specification.003.pdf, HDFS-12090-design.001.pdf,
> HDFS-12090.0000.patch, HDFS-12090.0001.patch
>
>
> HDFS-9806 introduces the concept of {{PROVIDED}} storage, which makes data in
> external storage systems accessible through HDFS. However, HDFS-9806 is
> limited to data being read through HDFS. This JIRA will deal with how data
> can be written to such {{PROVIDED}} storages from HDFS.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]