[ https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550887#comment-16550887 ]

Matt Ryan edited comment on JCR-4335 at 7/20/18 5:40 PM:
---------------------------------------------------------

{quote} - do we really need to parametrize sizes and number of parts? I 
understand that the implementation doing the upload needs this, but why does it 
appear in the API?{quote}
I think they are necessary.  There are a few reasons for stating the number of 
parts, but they mostly center on the potential impact of a very large list of 
URIs on, for example, a resulting JSON document.

Assume a JavaScript browser client is interacting with a web endpoint that, in 
turn, is invoking this API.  The JavaScript client wants to upload a binary 
directly, so it is requesting instructions on how to do that from the web 
endpoint.  The web endpoint would then call this API and obtain a 
{{BinaryUpload}} object that it then converts into a JSON document to return to 
the JavaScript client.  The JavaScript client or the web endpoint may have 
limitations on the size of the JSON document that it can support.
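
A rough sketch of what that web endpoint might look like, assuming the method 
and accessor names from the attached patch ({{initiateBinaryUpload}}, 
{{getUploadURIs}}, {{getMinPartSize}}, {{getMaxPartSize}}); those names may 
still change, so treat this as illustrative only:

{code:java}
// Types and method names follow the proposed jackrabbit-api patch on this
// issue and may differ in the final API.
import java.net.URI;
import javax.json.Json;
import javax.json.JsonArrayBuilder;
import javax.json.JsonObject;

public class UploadInstructionsEndpoint {

    // expectedSize and maxParts come from the JavaScript client's request, so
    // the client can bound the size of the JSON document it gets back.
    public JsonObject getUploadInstructions(JackrabbitValueFactory valueFactory,
                                            long expectedSize, int maxParts) {
        BinaryUpload upload = valueFactory.initiateBinaryUpload(expectedSize, maxParts);
        if (upload == null) {
            // Direct upload is not possible under these constraints; the client
            // will have to upload through the repository the traditional way.
            return Json.createObjectBuilder().add("directUpload", false).build();
        }
        JsonArrayBuilder uris = Json.createArrayBuilder();
        for (URI uri : upload.getUploadURIs()) {
            uris.add(uri.toString());
        }
        return Json.createObjectBuilder()
                .add("directUpload", true)
                .add("minPartSize", upload.getMinPartSize())
                .add("maxPartSize", upload.getMaxPartSize())
                .add("uploadURIs", uris)
                .build();
    }
}
{code}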

IIRC, S3 allows up to 10,000 upload parts in a multi-part upload; Azure is 
even higher at 50,000.  In my testing, I've seen signed URIs over 500
characters long.  If a client were unable to specify the number of parts, a 
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a 
JSON document just for the list of URIs itself.  This may or may not be a 
problem; only the client would know whether accepting a document that large is 
problematic.
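
As a back-of-the-envelope check (assuming single-byte characters and ignoring 
the JSON quoting, commas, and the rest of the payload):

{noformat}
10,000 URIs x 500 characters per URI = 5,000,000 characters, i.e. roughly 5MB
of URI text alone before any JSON overhead is added.
{noformat}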

The expected size of the upload is also needed for similar reasons, based on 
the service provider's capabilities.  Some service providers require
multi-part uploads for binaries above a certain size.  Some do not allow 
multi-part uploads of binaries smaller than a certain size.  Both Azure and S3 
have limits as to the maximum size of a binary that can be uploaded.

If the implementation knows the expected upload size and the number of parts 
the client can accept, then it can determine whether it is possible to perform 
this upload directly or whether the client will need to try to upload it 
through the repository as has been done traditionally.  For example, if the 
client wants to upload a 300MB binary but does not support multi-part 
uploading, and the service provider requires multi-part uploading above 250MB, 
then the upload request will fail and the client cannot upload this binary 
directly to storage.  However, the Oak backend may be able to handle this 
upload without problems, so it could still be uploaded the traditional way.
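
To make the 300MB example concrete, here is a minimal sketch of the kind of 
feasibility check the implementation can only make when it knows both values 
up front.  The constants are invented for illustration and are not actual 
Azure or S3 limits:

{code:java}
public class DirectUploadFeasibility {

    // Illustrative provider constraints only -- not real Azure or S3 numbers.
    private static final long SINGLE_PUT_LIMIT   = 250L * 1024 * 1024;       // multi-part required above this
    private static final long MAX_PART_SIZE      = 100L * 1024 * 1024;       // largest allowed single part
    private static final long MAX_BINARY_SIZE    = 5L * 1024 * 1024 * 1024;  // largest binary the provider stores
    private static final int  MAX_PROVIDER_PARTS = 10_000;                   // most parts the provider accepts

    /**
     * Returns true if a binary of the given size can be uploaded directly to
     * storage, given how many upload URIs (parts) the client can accept.
     * A client with no multi-part support passes maxClientParts = 1.
     */
    public static boolean canUploadDirectly(long expectedSize, int maxClientParts) {
        if (expectedSize > MAX_BINARY_SIZE) {
            return false;                             // provider cannot store it at all
        }
        if (expectedSize <= SINGLE_PUT_LIMIT) {
            return true;                              // a single PUT is enough
        }
        // Above the single-PUT limit the provider requires a multi-part upload,
        // so the client must accept more than one URI and each part must fit.
        int parts = Math.min(maxClientParts, MAX_PROVIDER_PARTS);
        if (parts <= 1) {
            return false;
        }
        long partSize = (expectedSize + parts - 1) / parts;   // ceiling division
        return partSize <= MAX_PART_SIZE;
    }
}
{code}

With these made-up numbers, {{canUploadDirectly(300L * 1024 * 1024, 1)}} returns 
false: 300MB is above the single-PUT limit and the client only accepts one URI, 
so the endpoint would tell the client to fall back to the traditional upload 
through the repository, which Oak can still handle.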


> API for direct binary access
> ----------------------------
>
>                 Key: JCR-4335
>                 URL: https://issues.apache.org/jira/browse/JCR-4335
>             Project: Jackrabbit Content Repository
>          Issue Type: New Feature
>          Components: jackrabbit-api
>            Reporter: Marcel Reutegger
>            Assignee: Marcel Reutegger
>            Priority: Major
>         Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the 
> repository. One part of the proposal is to expose this new capability in the 
> Jackrabbit API. For details see OAK-7569 and OAK-7589.


