Tracked as https://issues.apache.org/jira/browse/JCLOUDS-1521
On 2019/10/23 13:34:13, Alexander Tsvetkov <[email protected]> wrote:
> Hi,
>
> I have a REST API that allows upload of potentially large files (up to 4 GB).
> Due to the size, I cannot load these files in memory, as that could quickly
> crash my application. I also don't want to store them as temporary files,
> since that could fill up my disk if a lot of people decide to upload at the
> same time.
>
> Instead, I want to process the incoming files as InputStreams and forward
> them to the S3 object store. I understand that this is not possible directly,
> since S3 requires the content length to be known before the upload. However,
> I saw on StackOverflow
> (https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header)
> that it's possible to work around this problem by reading the InputStream into
> memory in chunks of 5 (or more) MB and uploading these chunks via the S3
> multipart upload API. As a result, I assume that I'll be able to upload a 4 GB
> file while having no more than 5 MB of its content stored in memory at any
> given time.
>
> I tried to do so with JClouds (version 2.1.1), but I've hit a problem. I have
> the following code:
>
>     Blob blob = blobStore.blobBuilder(name)
>         .payload(inputStream)
>         ...
>         .build();
>     blobStore.putBlob(container, blob, PutOptions.Builder.multipart());
>
> If I run it like this, I get a NullPointerException, because I didn't specify
> the content's length:
>
>     java.lang.NullPointerException: while trying to invoke the method
>     java.lang.Long.longValue() of a null object returned from
>     org.jclouds.io.MutableContentMetadata.getContentLength()
>       at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356)
>       at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:347)
>       at org.jclouds.aws.s3.blobstore.AWSS3BlobStore.putBlob(AWSS3BlobStore.java:79)
>
> I think it would be possible for JClouds to compute the size of the
> InputStream dynamically:
> 1. Slice the stream into chunks of X MB and store the chunks in memory
>    (where X has a default value but is also configurable).
> 2. Upload the chunks sequentially - the content length header can be set to
>    X MB.
> 3. Finalize the multipart upload.
>
> That way, no more than X MB will be stored in memory for any given upload.
>
> Would you accept a pull request for this?
>
> Best regards,
> Alexander
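
Until jclouds does this internally, the approach Alexander describes can be built in application code on top of the existing low-level multipart methods on BlobStore (initiateMultipartUpload, uploadMultipartPart, completeMultipartUpload, abortMultipartUpload). The following is a rough, untested sketch of that idea; the class name StreamingMultipartUpload and the chunkSize parameter are made up for illustration, and roughly one chunk (plus a copy of it) is held in memory per upload at any time:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.BlobMetadata;
    import org.jclouds.blobstore.domain.MultipartPart;
    import org.jclouds.blobstore.domain.MultipartUpload;
    import org.jclouds.blobstore.options.PutOptions;
    import org.jclouds.io.Payload;
    import org.jclouds.io.Payloads;

    import com.google.common.io.ByteStreams;

    public final class StreamingMultipartUpload {

        /**
         * Uploads an InputStream of unknown length by buffering at most
         * chunkSize bytes at a time and sending each buffer as one part of a
         * multipart upload. For S3, chunkSize must be at least 5 MB, since
         * every part except the last has a 5 MB minimum.
         */
        public static String upload(BlobStore blobStore, String container, String name,
                InputStream in, int chunkSize) throws IOException {
            // Metadata-only blob, used just to start the multipart upload.
            BlobMetadata metadata = blobStore.blobBuilder(name).build().getMetadata();
            MultipartUpload mpu = blobStore.initiateMultipartUpload(container, metadata, PutOptions.NONE);
            List<MultipartPart> parts = new ArrayList<>();
            byte[] buffer = new byte[chunkSize];
            try {
                int partNumber = 1;
                while (true) {
                    // Fills the buffer completely unless the stream ends first.
                    int read = ByteStreams.read(in, buffer, 0, chunkSize);
                    if (read == 0 && partNumber > 1) {
                        break; // stream ended exactly on a chunk boundary
                    }
                    // Copy so the reusable buffer is never shared with the payload.
                    byte[] chunk = Arrays.copyOf(buffer, read);
                    Payload payload = Payloads.newByteArrayPayload(chunk);
                    payload.getContentMetadata().setContentLength((long) read);
                    parts.add(blobStore.uploadMultipartPart(mpu, partNumber++, payload));
                    if (read < chunkSize) {
                        break; // last (possibly short) part has been sent
                    }
                }
                return blobStore.completeMultipartUpload(mpu, parts);
            } catch (IOException | RuntimeException e) {
                blobStore.abortMultipartUpload(mpu); // avoid leaving orphaned parts behind
                throw e;
            }
        }
    }

What this issue asks for is essentially that BaseBlobStore.putMultipartBlob perform this slicing itself when the payload has no content length, instead of dereferencing the null length and failing.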
