[
https://issues.apache.org/jira/browse/HDFS-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532066#comment-16532066
]
Aaron Fabbri commented on HDFS-13186:
-------------------------------------
Hey folks. Catching up on stuff after vacation. Took a quick look at this.
Couple comments:
# Thanks for keeping the API backend-agnostic (good layering)
# What is the motivation for this? Even if not part of FileSystem it is more
surface area we need to deal with.
Looks like you have the basic idea of how to update S3Guard. Think of the
S3Guard MetadataStore as a "trailing log of metadata changes made to the
underlying bucket".
> [PROVIDED Phase 2] Multipart Uploader API
> -----------------------------------------
>
> Key: HDFS-13186
> URL: https://issues.apache.org/jira/browse/HDFS-13186
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ewan Higgs
> Assignee: Ewan Higgs
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: HDFS-13186.001.patch, HDFS-13186.002.patch,
> HDFS-13186.003.patch, HDFS-13186.004.patch, HDFS-13186.005.patch,
> HDFS-13186.006.patch, HDFS-13186.007.patch, HDFS-13186.008.patch,
> HDFS-13186.009.patch, HDFS-13186.010.patch
>
>
> To write files in parallel to an external storage system as in HDFS-12090,
> there are two approaches:
> # Naive approach: use a single datanode per file that copies blocks locally
> as it streams data to the external service. This requires a copy for each
> block inside the HDFS system and then a copy for the block to be sent to the
> external system.
> # Better approach: Single point (e.g. Namenode or SPS style external client)
> and Datanodes coordinate in a multipart - multinode upload.
> This system needs to work with multiple back ends and needs to coordinate
> across the network. So we propose an API that resembles the following:
> {code:java}
> public UploadHandle multipartInit(Path filePath) throws IOException;
> public PartHandle multipartPutPart(InputStream inputStream,
>     int partNumber, UploadHandle uploadId) throws IOException;
> public void multipartComplete(Path filePath,
>     List<Pair<Integer, PartHandle>> handles,
>     UploadHandle multipartUploadId) throws IOException;{code}
> Here, UploadHandle and PartHandle are opaque handles in the vein of
> PathHandle, so they can be serialized and deserialized in the hadoop-hdfs
> project without knowledge of how to deserialize e.g. S3A's version of an
> UploadHandle and PartHandle.
> In an object store such as S3A, the implementation is straightforward. In
> the case of writing multipart/multinode to HDFS, we can write each block as a
> file part. The complete call will perform a concat on the blocks.
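
For illustration only, and not taken from any of the attached patches: the
init / put-part / complete flow described above could be sketched with an
in-memory stand-in like the one below. Everything in it is an assumption made
for the sketch: the class name InMemoryMultipartUploader, the read() helper,
String paths in place of Hadoop's Path, Map.Entry in place of Pair, and
handles that simply wrap a UUID string. A real back end (S3A, HDFS) would
differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in; NOT the Hadoop or S3A implementation.
class InMemoryMultipartUploader {

    // Opaque handle: callers only see bytes they can ship between nodes.
    static class Handle {
        private final byte[] bytes;
        Handle(String id) { this.bytes = id.getBytes(StandardCharsets.UTF_8); }
        byte[] toByteArray() { return bytes.clone(); }
        String id() { return new String(bytes, StandardCharsets.UTF_8); }
    }
    static final class UploadHandle extends Handle { UploadHandle(String id) { super(id); } }
    static final class PartHandle extends Handle { PartHandle(String id) { super(id); } }

    // uploadId -> (partId -> data) for uploads that are still open.
    private final Map<String, Map<String, byte[]>> pending = new HashMap<>();
    // Completed files, keyed by path (String stands in for Hadoop's Path).
    private final Map<String, byte[]> files = new HashMap<>();

    synchronized UploadHandle multipartInit(String filePath) {
        String uploadId = UUID.randomUUID().toString();
        pending.put(uploadId, new HashMap<>());
        return new UploadHandle(uploadId);
    }

    synchronized PartHandle multipartPutPart(InputStream in, int partNumber,
            UploadHandle upload) throws IOException {
        Map<String, byte[]> parts = pending.get(upload.id());
        if (parts == null) throw new IOException("Unknown upload " + upload.id());
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) buf.write(chunk, 0, n);
        String partId = partNumber + ":" + UUID.randomUUID();
        parts.put(partId, buf.toByteArray());
        return new PartHandle(partId);
    }

    synchronized void multipartComplete(String filePath,
            List<Map.Entry<Integer, PartHandle>> handles,
            UploadHandle upload) throws IOException {
        Map<String, byte[]> parts = pending.get(upload.id());
        if (parts == null) throw new IOException("Unknown upload " + upload.id());
        // Concatenate parts in part-number order, whatever order they arrived in.
        List<Map.Entry<Integer, PartHandle>> ordered = new ArrayList<>(handles);
        ordered.sort(Map.Entry.comparingByKey());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, PartHandle> e : ordered) {
            byte[] data = parts.get(e.getValue().id());
            if (data == null) throw new IOException("Missing part " + e.getKey());
            out.write(data, 0, data.length);
        }
        files.put(filePath, out.toByteArray());
        pending.remove(upload.id()); // the upload is closed once completed
    }

    // Hypothetical helper so the sketch can show the result.
    synchronized byte[] read(String filePath) { return files.get(filePath); }

    static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        InMemoryMultipartUploader fs = new InMemoryMultipartUploader();
        UploadHandle upload = fs.multipartInit("/tmp/out");
        // Parts may be written out of order, e.g. by different datanodes.
        PartHandle p2 = fs.multipartPutPart(stream("world"), 2, upload);
        PartHandle p1 = fs.multipartPutPart(stream("hello "), 1, upload);
        List<Map.Entry<Integer, PartHandle>> handles = new ArrayList<>();
        handles.add(new SimpleEntry<>(2, p2));
        handles.add(new SimpleEntry<>(1, p1));
        fs.multipartComplete("/tmp/out", handles, upload);
        System.out.println(new String(fs.read("/tmp/out"), StandardCharsets.UTF_8)); // prints "hello world"
    }
}
```

The point of the sketch is only that the coordinator needs nothing but the
serialized handles and part numbers to finish the upload; the part data itself
never passes through it.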