[
https://issues.apache.org/jira/browse/HDFS-13713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16651665#comment-16651665
]
Ewan Higgs commented on HDFS-13713:
-----------------------------------
HDFS-13713.008.patch (using the HDFS prefix, not the HADOOP prefix, even though
this also concerns S3AFileSystem)
008
- Allow concurrent uploads for Local file system and HDFS.
- Reorder uploads in the concurrent case.
- Finalization methods (complete, abort) are not idempotent on HDFS (upload IDs
are consumed), but they are briefly idempotent on S3 (a server-side GC reaps
the upload IDs at a later time). Added an implementation-dependent boolean to
determine which behaviour is expected for repeated completes and aborts that
use an already burned upload ID.
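
To make the last bullet concrete, here is a toy model of the two behaviours. It
is only a sketch: UploadStore and consumes_id_immediately are hypothetical
names for illustration, not the API or the flag name used in the patch.

```python
class UploadStore:
    """Toy model of an upload-ID table. Hypothetical, not the Hadoop API."""

    def __init__(self, consumes_id_immediately):
        # True models HDFS (the ID is burned on finalization).
        # False models S3 (the ID lingers until a server-side GC reaps it).
        self.consumes_id_immediately = consumes_id_immediately
        self.active = set()
        self.finalized = set()

    def initialize(self, upload_id):
        self.active.add(upload_id)

    def complete(self, upload_id):
        if upload_id in self.active:
            self.active.remove(upload_id)
            if not self.consumes_id_immediately:
                # Remember the ID so a repeated complete is still accepted.
                self.finalized.add(upload_id)
            return "completed"
        if upload_id in self.finalized:
            return "repeat accepted"
        raise KeyError("unknown upload ID: %r" % (upload_id,))

    def gc(self):
        # Models the later server-side reaping of finalized upload IDs.
        self.finalized.clear()
```

A contract test could then branch on the boolean: with
consumes_id_immediately=True a second complete() is expected to fail, and with
False it is expected to succeed until gc() runs.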
{quote}We could maybe be vague about what happens, i.e. {quote}
We may need to leave this open because S3 has a behaviour that is not
consistent with HDFS, and it's not obvious that we would prefer one over the
other. Let me explain:
1. In the contract tests it becomes obvious that in S3 the last-started
successful upload is 'the winner'.
Example: given upload1 and upload2:
init 1
init 2 <-- last started upload
putpart 1
putpart 2
complete 2 <-- last started upload is complete - 'the winner'
complete 1 <-- never to be seen unless versioning is enabled
2. In HDFS the last completed upload is 'the winner'.
Example: given upload1 and upload2:
init 1
init 2
putpart 1
putpart 2
complete 2 <-- concat and copy into place - visible until complete1
complete 1 <-- concat and copy into place - 'the winner'
3. I don't know what WASB or GCS do so specifying based on S3 behaviour at this
time could be undesirable.
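
The two orderings above can be replayed with a small model. This is a sketch of
the observed behaviour only; winner() and the rule names "last_started" and
"last_completed" are made up for illustration and are not part of any
filesystem API.

```python
def winner(events, rule):
    """Return the upload ID whose data is visible after all events.

    events: list of (op, upload_id) tuples in the order they happen.
    rule:   "last_started" (S3-like) or "last_completed" (HDFS-like).
    """
    if rule not in ("last_started", "last_completed"):
        raise ValueError("unknown rule: %r" % (rule,))
    init_step = {}   # upload_id -> position of its init in the event list
    visible = None   # upload currently visible at the destination path
    for step, (op, upload_id) in enumerate(events):
        if op == "init":
            init_step[upload_id] = step
        elif op == "complete":
            if rule == "last_completed":
                # HDFS-like: every complete overwrites the destination.
                visible = upload_id
            elif visible is None or init_step[upload_id] > init_step[visible]:
                # S3-like: a complete only wins over a later-started upload
                # if that upload has not completed.
                visible = upload_id
    return visible


events = [
    ("init", 1), ("init", 2),
    ("putpart", 1), ("putpart", 2),
    ("complete", 2), ("complete", 1),
]
print(winner(events, "last_started"))    # 2
print(winner(events, "last_completed"))  # 1
```

The same event sequence gives a different winner under each rule, which is why
the spec may have to leave this case open.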
> Add specification of Multipart Upload API to FS specification, with contract
> tests
> ----------------------------------------------------------------------------------
>
> Key: HDFS-13713
> URL: https://issues.apache.org/jira/browse/HDFS-13713
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: fs, test
> Affects Versions: 3.2.0
> Reporter: Steve Loughran
> Assignee: Ewan Higgs
> Priority: Blocker
> Attachments: HADOOP-13713-004.patch, HADOOP-13713-004.patch,
> HADOOP-13713-005.patch, HADOOP-13713-006.patch, HADOOP-13713-007.patch,
> HDFS-13713.001.patch, HDFS-13713.002.patch, HDFS-13713.003.patch,
> HDFS-13713.008.patch, multipartuploader.md
>
>
> There's nothing in the FS spec covering the new API. Add it in a new .md file
> * add FS model with the notion of a function mapping (uploadID -> Upload),
> the operations (list, commit, abort). The [TLA+
> model|https://issues.apache.org/jira/secure/attachment/12865161/objectstore.pdf]
> of HADOOP-13786 shows how to do this.
> * Contract tests of not just the successful path, but all the invalid ones.
> * implementations of the contract tests of all FSs which support the new API.