[ 
https://issues.apache.org/jira/browse/SLING-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455539#comment-15455539
 ] 

Ian Boston edited comment on SLING-6027 at 9/1/16 2:31 PM:
-----------------------------------------------------------

The current Chunked upload mechanism has no way of isolating different 
sessions performing different uploads. It assumes that only one client will be 
uploading to an asset at any one time, and returns a 500 error if any other 
client tries to start an upload. The only way for that other client to recover 
is to abandon the current upload and start again, which fails the first 
client. Since this is the only recovery route, it will be used. Other systems, 
including the AWS S3 Multipart Upload specification referenced from the Sling 
specification, use a session ID.
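
To illustrate the isolation argued for above, here is a minimal in-memory 
sketch (hypothetical class and method names, not the actual Sling servlet 
API) of keying chunk storage by an upload session ID, so that two clients 
uploading to the same asset path never see each other's chunks:

```java
import java.util.*;

// Hypothetical model: chunks are namespaced by (assetPath, sessionId)
// instead of by assetPath alone, so concurrent uploads stay isolated.
public class ChunkedUploadSessions {

    private final Map<String, List<byte[]>> sessions = new HashMap<>();

    // Start a new upload session; the returned ID namespaces all chunks.
    public String start(String assetPath) {
        String sessionId = UUID.randomUUID().toString();
        sessions.put(assetPath + "#" + sessionId, new ArrayList<>());
        return sessionId;
    }

    // Append a chunk to one session only; other sessions are untouched.
    public void addChunk(String assetPath, String sessionId, byte[] chunk) {
        sessions.get(assetPath + "#" + sessionId).add(chunk);
    }

    // Complete the upload: concatenate only this session's chunks.
    public byte[] complete(String assetPath, String sessionId) {
        List<byte[]> chunks = sessions.remove(assetPath + "#" + sessionId);
        int total = 0;
        for (byte[] c : chunks) {
            total += c.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] c : chunks) {
            System.arraycopy(c, 0, out, pos, c.length);
            pos += c.length;
        }
        return out;
    }
}
```

With this shape, two clients uploading different versions of 
/content/reports/2014_results.pdf each receive a distinct session ID at 
start(), so completing one upload can never merge in the other client's 
chunks, and neither client needs to abandon its upload.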

Note also that the S3 Multipart Upload specification aggregates the parts on 
completion inside S3, presumably at a low level without consuming disk IO.


was (Author: ianeboston):
The current Chunked upload mechanism has no way of isolating different 
sessions performing different uploads. It assumes that only one client will be 
uploading to an asset at any one time. If two clients are uploading 
concurrently there is a significant risk that the result will be corrupted. 
Take two clients each performing a chunked upload to a folder 
/content/reports, both with the file name 2014_results.pdf, each with a 
different version of the file. Both will create chunks under 2014_results.pdf, 
and when one completes, the chunks will be merged, resulting in a corrupted 
file. Other systems establish an upload session ID before starting the upload, 
or on the first body part, so that body parts from different sessions are not 
mixed up. Subsequent upload operations from the same client use the same 
session ID.

> Support existing Chunked upload functionality in streaming mode.
> ----------------------------------------------------------------
>
>                 Key: SLING-6027
>                 URL: https://issues.apache.org/jira/browse/SLING-6027
>             Project: Sling
>          Issue Type: Bug
>          Components: Servlets
>    Affects Versions: Servlets Post 2.3.12
>            Reporter: Ian Boston
>
> The non streaming uploads support a partial upload protocol implemented in 
> request parameters that is known in Sling terms as "Chunked" upload and 
> documented at 
> https://cwiki.apache.org/confluence/display/SLING/Chunked+File+Upload+Support 
>  (not to be confused with Chunked Transfer-Encoding or the use of HTTP Range 
> headers).
> Sling Chunked uploading sends a sequence of POSTs containing multiple parts 
> of a file upload. When all the parts are uploaded, a final request is sent 
> that causes all the parts to be merged into a single file in the JCR. From a 
> streaming point of view, each part can be streamed with the streaming 
> implementation supported by SLING-5948. Some additional code will be required 
> to set the file name and the node structure appropriately.
> However, when the upload is completed, Sling must merge all the parts. To 
> maintain the streaming nature of the upload, this must be achieved without 
> incurring any local IO, otherwise the benefits of a streamed upload are lost.
> I am not certain how to achieve the merge given the limitations of the JCR 
> API other than by transferring all the body parts through the local JVM. That 
> won't incur local disk IO but will multiply the overall IO requirement by 3x.
> If JCR/Oak had the ability to concatenate Binaries it could do this more 
> efficiently, depending on the DataStore implementation. If JCR/Oak exposed a 
> seekable OutputStream, the application could avoid saving the uploaded parts 
> to the JCR as individual files. If JCR/Oak allowed an update to a binary to 
> start at a known offset, this could likewise be avoided.
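
The merge-through-the-JVM route described above can be sketched with a 
SequenceInputStream: each stored chunk is read back and the chained stream is 
written out as one new binary (in a JCR-backed implementation the resulting 
stream would be handed to javax.jcr.ValueFactory#createBinary(InputStream)). 
This is a simplified, standalone illustration of that data path, not the 
Sling implementation; the ByteArrayInputStreams stand in for the per-chunk 
Binary streams:

```java
import java.io.*;
import java.util.*;

// Standalone sketch: merge stored chunks into a single stream without
// buffering them on local disk, by chaining the chunk InputStreams.
// Bytes still pass through the JVM once more, which is the 3x overall
// IO cost noted in the issue description.
public class ChunkMerge {

    public static InputStream merge(List<byte[]> chunks) {
        List<InputStream> streams = new ArrayList<>();
        for (byte[] c : chunks) {
            // Stands in for Binary.getStream() on each stored chunk node.
            streams.add(new ByteArrayInputStream(c));
        }
        // SequenceInputStream reads each chunk stream to exhaustion, in order.
        return new SequenceInputStream(Collections.enumeration(streams));
    }

    // Drain the merged stream; in real code this drain would happen inside
    // ValueFactory.createBinary(InputStream) rather than into a byte array.
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```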



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
