[ https://issues.apache.org/jira/browse/HDFS-13186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456519#comment-16456519 ]

Ewan Higgs commented on HDFS-13186:
-----------------------------------

[~chris.douglas], thanks for the feedback.
{quote}The current impl doesn't define a default for {{FileSystem}} 
implementations, which could be a serial copy. Instead, it throws an exception. 
Utilities (like {{FsShell}} or YARN) need to implement some boilerplate for 
both paths, rather than using a single path that falls back to a serial upload.
{quote}
MPU is distinct from a serial copy since we can upload the data out of order. I 
don't see the use case for a version that breaks this. In this case, I think 
the boilerplate is correct: try to make an MPU; if we can't, then fall back to 
{{FileSystem::write}}.
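Roughly the caller-side shape I mean, as a sketch only: {{resolveUploader}} is a 
hypothetical lookup (not in the patch) that returns null when the target 
{{FileSystem}} has no MPU support, and {{MultipartUploader}} stands for the API 
proposed in this JIRA, not a committed interface.
{code:java}
// Sketch only: resolveUploader() is hypothetical, and MultipartUploader here
// refers to the API proposed in this JIRA, not an existing Hadoop interface.
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UploadWithFallback {
  static void upload(FileSystem fs, Path dst, InputStream in, Configuration conf)
      throws IOException {
    MultipartUploader mpu = resolveUploader(fs, dst);  // hypothetical lookup
    if (mpu == null) {
      // No MPU support for this FileSystem: fall back to a plain serial write.
      IOUtils.copyBytes(in, fs.create(dst), conf, true);
      return;
    }
    UploadHandle id = mpu.multipartInit(dst);
    // ... fan out multipartPutPart(stream, partNumber, id) to workers,
    // collect the PartHandles, then call multipartComplete(dst, handles, id).
  }

  // Placeholder so the sketch reads top to bottom; the real lookup is TBD.
  static MultipartUploader resolveUploader(FileSystem fs, Path dst) {
    return null;
  }
}
{code}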
{quote}Some implementations might benefit from an explicit 
{{MultipartUploader::abort}}, which may clean up the partial upload. Clearly it 
can't be guaranteed, but we'd like the property that an {{UploadHandle}} 
persisted to a WAL could be used for cleanup.
{quote}
Absolutely.
{quote}The {{PartHandle}} could retain its ID, rather than providing a 
{{Pair<Integer,PartHandle>}} to {{commit}}. This might make repartitioning 
difficult i.e., splitting a slow {{PartHandle}}, but implementations could 
implement custom handling if that's important. It would be sufficient for the 
{{PartHandle}} to be {{Comparable}}, though equality should be treated either 
as a duplicate or an error at {{complete}} by the {{MultipartUploader}}.
{quote}
This is an interesting idea. So the numbering of the parts would be internal to 
the part handle, recovered by downcasting to the implementation-specific type. 
This means we could no longer rely on the fact that it's just a String, and 
would then have to introduce all the protobuf machinery. That's possible; I 
just wanted to highlight it before I go away and implement it. (e.g. the 
hadoop-aws project currently doesn't depend directly on any protobuf code 
generation.)
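For S3A, this is how I read the suggestion (field names are illustrative only, 
not what the patch would necessarily ship):
{code:java}
// Illustrative sketch of a PartHandle that carries its own part number.
import java.util.Objects;

public class S3APartHandle implements Comparable<S3APartHandle> {
  private final int partNumber;  // ordering key, previously the Integer in the Pair
  private final String etag;     // the opaque String S3 returns per uploaded part

  public S3APartHandle(int partNumber, String etag) {
    this.partNumber = partNumber;
    this.etag = etag;
  }

  @Override
  public int compareTo(S3APartHandle other) {
    return Integer.compare(partNumber, other.partNumber);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof S3APartHandle)) {
      return false;
    }
    S3APartHandle p = (S3APartHandle) o;
    // Equal part numbers would be treated as duplicates (or an error) at complete().
    return partNumber == p.partNumber && etag.equals(p.etag);
  }

  @Override
  public int hashCode() {
    return Objects.hash(partNumber, etag);
  }

  // Once the handle carries structure like this, sending it across the wire
  // means an explicit serialization (protobuf) rather than encoding a String.
}
{code}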
{quote}Do {{putPart}} and {{complete}} need the {{filePath}} parameter?
{quote}
They are required for S3A at the very least. But they could be encoded in the 
UploadHandle.
{quote}Does the {{UploadHandle}} init ever vary, depending on the src? Intra-FS 
copies?
{quote}
No. It's all based on the target (it only ever sees the target FS).
{quote}Right now, the utility doesn't offer an API to partition a file, or to 
create (bounded) {{InputStream}} args to {{putPart}}.
{quote}
I figured the caller can use {{FileInputStream}} and call {{skip}} to position 
each part.
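Something like the following, say with commons-io's {{BoundedInputStream}} to cap 
each part at its length (the helper name is just for illustration):
{code:java}
// Sketch: one FileInputStream per part, skip() to the offset, bound to length.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.input.BoundedInputStream;

public class PartStreams {
  static InputStream openPart(String file, long offset, long length)
      throws IOException {
    FileInputStream in = new FileInputStream(file);
    long skipped = 0;
    while (skipped < offset) {          // skip() may return short counts
      long n = in.skip(offset - skipped);
      if (n <= 0) {
        in.close();
        throw new IOException("Could not skip to offset " + offset);
      }
      skipped += n;
    }
    return new BoundedInputStream(in, length);
  }
}
{code}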
{quote}Do {{putPart}} and {{complete}} need the {{filePath}} 
parameter?{quote}
Yes. It's required by the S3 API, for one. And the filePath needs to follow the 
upload around because the various nodes calling {{putPart}} will need to know 
what type of {{MultipartUploader}} to make.
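In other words, the scheme of the filePath is what a remote node would key on. A 
sketch of that lookup; the config key and the return type here are made up, not 
part of the patch:
{code:java}
// Hypothetical resolution of the uploader implementation from the path's scheme.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploaderResolver {
  static String uploaderClassFor(Path filePath, Configuration conf)
      throws IOException {
    FileSystem fs = filePath.getFileSystem(conf);
    // e.g. "fs.multipart.uploader.s3a.class" -> the S3A uploader implementation.
    // The key is illustrative only.
    return conf.get("fs.multipart.uploader." + fs.getScheme() + ".class");
  }
}
{code}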

 
{quote}Is this to support {{Serializable}}?
{quote}
No. If we need to serialize anything, I think we should have it be explicitly 
done through protobuf (schema, schema evolution, etc.). As it happens, this is 
just a String being encoded, so we don't need to formally specify it as 
protobuf just yet. (See above.)
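Concretely, "just a String being encoded" means roughly this today (names are 
illustrative; the S3A handle is the upload id serialized as UTF-8 bytes and 
nothing more):
{code:java}
// Illustrative: an opaque handle that is nothing but an encoded String.
import java.nio.charset.StandardCharsets;

public class S3AUploadHandle {
  private final byte[] bytes;  // opaque to hadoop-hdfs; S3A knows it's an upload id

  public S3AUploadHandle(String uploadId) {
    this.bytes = uploadId.getBytes(StandardCharsets.UTF_8);
  }

  public byte[] toByteArray() {
    return bytes.clone();
  }

  public String toUploadId() {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
{code}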

 

> [PROVIDED Phase 2] Multipart Multinode uploader API + Implementations
> ---------------------------------------------------------------------
>
>                 Key: HDFS-13186
>                 URL: https://issues.apache.org/jira/browse/HDFS-13186
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ewan Higgs
>            Assignee: Ewan Higgs
>            Priority: Major
>         Attachments: HDFS-13186.001.patch, HDFS-13186.002.patch, 
> HDFS-13186.003.patch
>
>
> To write files in parallel to an external storage system as in HDFS-12090, 
> there are two approaches:
>  # Naive approach: use a single datanode per file that copies blocks locally 
> as it streams data to the external service. This requires a copy for each 
> block inside the HDFS system and then a copy for the block to be sent to the 
> external system.
>  # Better approach: a single point (e.g. the Namenode or an SPS-style external 
> client) and Datanodes coordinate in a multipart, multinode upload.
> This system needs to work with multiple back ends and needs to coordinate 
> across the network. So we propose an API that resembles the following:
> {code:java}
> public UploadHandle multipartInit(Path filePath) throws IOException;
> public PartHandle multipartPutPart(InputStream inputStream,
>     int partNumber, UploadHandle uploadId) throws IOException;
> public void multipartComplete(Path filePath,
>     List<Pair<Integer, PartHandle>> handles, 
>     UploadHandle multipartUploadId) throws IOException;{code}
> Here, UploadHandle and PartHandle are opaque handles in the vein of 
> PathHandle so they can be serialized and deserialized in the hadoop-hdfs 
> project without knowledge of how to deserialize e.g. S3A's version of an 
> UploadHandle and PartHandle.
> In an object store such as S3A, the implementation is straightforward. In 
> the case of writing multipart/multinode to HDFS, we can write each block as a 
> file part. The complete call will perform a concat on the blocks.


