[
https://issues.apache.org/jira/browse/JCR-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550887#comment-16550887
]
Matt Ryan edited comment on JCR-4335 at 7/20/18 5:40 PM:
---------------------------------------------------------
{quote} - do we really need to parametrize sizes and number of parts? I
understand that the implementation doing the upload needs this, but why does it
appear in the API?{quote}
I think they are necessary. There are a few reasons for stating the number of
parts, but they mostly center on the potential impact of a very large list of
URIs on, for example, a resulting JSON document.
Assume a JavaScript browser client is interacting with a web endpoint that, in
turn, is invoking this API. The JavaScript client wants to upload a binary
directly, so it is requesting instructions on how to do that from the web
endpoint. The web endpoint would then call this API and obtain a
{{BinaryUpload}} object that it then converts into a JSON document to return to
the JavaScript client. The JavaScript client or the web endpoint may have
limitations on the size of the JSON document that it can support.
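To make that concrete, here is a rough sketch of what such an endpoint could look
like (using the API shape from the attached patch; the cast and the method names
{{initiateBinaryUpload}}, {{getUploadToken}} and {{getUploadURIs}} are my reading
of it and may differ from the final version):
{code:java}
// Rough sketch only: a web endpoint turning the client's request into upload
// instructions. Method names follow the attached patch and may change.
import java.net.URI;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import org.apache.jackrabbit.api.JackrabbitValueFactory;
import org.apache.jackrabbit.api.binary.BinaryUpload;

public class UploadInstructionEndpoint {

    /**
     * @param session  JCR session of the authenticated client
     * @param size     expected size of the binary, as reported by the client
     * @param maxParts maximum number of upload URIs the client can handle
     * @return a JSON document for the JavaScript client, or null if a direct
     *         upload is not possible and the binary must go through the repository
     */
    public String getUploadInstructions(Session session, long size, int maxParts)
            throws RepositoryException {
        JackrabbitValueFactory vf = (JackrabbitValueFactory) session.getValueFactory();
        BinaryUpload upload = vf.initiateBinaryUpload(size, maxParts);
        if (upload == null) {
            return null; // fall back to a traditional upload through the repository
        }
        StringBuilder json = new StringBuilder("{\"uploadToken\":\"")
                .append(upload.getUploadToken()).append("\",\"uploadURIs\":[");
        String sep = "";
        for (URI uri : upload.getUploadURIs()) {
            json.append(sep).append('"').append(uri).append('"');
            sep = ",";
        }
        return json.append("]}").toString();
    }
}
{code}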
IIRC, S3 allows up to 10,000 upload parts in a multi-part upload; Azure is
even higher at 50,000. In my testing, I've seen signed URIs over 500
characters long. If a client were unable to specify the number of parts, a
list of 10,000 upload URIs of over 500 characters each would exceed 5MB in a
JSON document just for the list of URIs itself. This may or may not be a
problem; only the client would know whether accepting a document that large is
problematic.
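Back-of-the-envelope, assuming the 500-character URIs I've seen and the worst-case
part count:
{code:java}
// Rough estimate of the JSON payload size for the URI list alone,
// using the assumptions above (10,000 parts, ~500 characters per signed URI).
public class UriListSizeEstimate {
    public static void main(String[] args) {
        long parts = 10_000;
        long charsPerUri = 500;
        long chars = parts * charsPerUri; // 5,000,000 characters
        // about 5MB before quotes, commas and the rest of the JSON document are counted
        System.out.printf("~%.1f MB just for the URI strings%n", chars / 1_000_000.0);
    }
}
{code}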
The expected size of the upload is also needed for similar reasons, based on the
service provider's capabilities. Some service providers require multi-part
uploads for binaries above a certain size. Some do not allow multi-part uploads
of binaries smaller than a certain size. Both Azure and S3 also limit the
maximum size of a binary that can be uploaded.
If the implementation knows the expected upload size and the number of parts
the client can accept, then it can determine whether it is possible to perform
this upload directly or whether the client will need to try to upload it
through the repository as has been done traditionally. For example, suppose the
client wants to upload a 300MB binary but does not support multi-part uploading,
while the service provider requires multi-part uploading above 250MB. That
upload request will fail, so the client cannot upload the binary directly to
storage. However, the Oak backend may be able to handle this upload without
problems, so it could still be uploaded the traditional way.
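To illustrate the kind of check this enables (the limits and names below are made
up for illustration, not the actual Oak data store logic):
{code:java}
// Purely illustrative: the feasibility check that becomes possible once the
// expected size and the client's part limit are known.
public class DirectUploadCheck {

    // Hypothetical service provider constraints
    static final long MAX_SINGLE_PUT_SIZE = 250L * 1024 * 1024; // multi-part required above this
    static final long MAX_PART_SIZE       = 100L * 1024 * 1024; // largest allowed single part
    static final int  MAX_PART_COUNT      = 10_000;             // most parts per multi-part upload

    /** Returns true if the binary can be sent directly to storage. */
    static boolean canUploadDirectly(long expectedSize, int clientMaxParts) {
        if (expectedSize <= MAX_SINGLE_PUT_SIZE) {
            return true; // fits in a single-part upload
        }
        // Multi-part is required; does the client's limit cover enough parts?
        long partsNeeded = (expectedSize + MAX_PART_SIZE - 1) / MAX_PART_SIZE;
        return clientMaxParts > 1
                && partsNeeded <= Math.min(clientMaxParts, MAX_PART_COUNT);
    }

    public static void main(String[] args) {
        // The 300MB example from above: the client cannot do multi-part uploads
        // (maxParts = 1), so the direct upload is rejected and the client falls
        // back to uploading through the repository.
        System.out.println(canUploadDirectly(300L * 1024 * 1024, 1)); // false
    }
}
{code}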
> API for direct binary access
> ----------------------------
>
> Key: JCR-4335
> URL: https://issues.apache.org/jira/browse/JCR-4335
> Project: Jackrabbit Content Repository
> Issue Type: New Feature
> Components: jackrabbit-api
> Reporter: Marcel Reutegger
> Assignee: Marcel Reutegger
> Priority: Major
> Attachments: JCR-4335.patch, JCR-4335.patch
>
>
> Jackrabbit Oak proposes to add a new direct binary access capability to the
> repository. One part of the proposal is to expose this new capability in the
> Jackrabbit API. For details see OAK-7569 and OAK-7589.