Tracked as https://issues.apache.org/jira/browse/JCLOUDS-1521
On 2019/10/23 13:34:13, Alexander Tsvetkov <[email protected]> wrote:
> Hi,
>
> I have a REST API that allows upload of potentially large files (up to 4 GB).
> Due to the size, I cannot load these files in memory, as that could quickly
> crash my application. I also don't want to store them as temporary files,
> since that could fill up my disk if a lot of people decide to upload at the
> same time.
>
> Instead, I want to process the incoming files as InputStreams and forward
> them to the S3 object store. I understand that this is not possible directly,
> since S3 requires the content length to be known before the upload. However,
> I saw on StackOverflow
> (https://stackoverflow.com/questions/8653146/can-i-stream-a-file-upload-to-s3-without-a-content-length-header)
> that it's possible to work around this problem by reading the InputStream into
> memory in chunks of 5 (or more) MB and uploading these chunks via the S3
> multipart upload API. As a result, I assume that I'll be able to upload a 4 GB
> file while having no more than 5 MB of its content stored in memory at any
> given time.
>
> I tried to do so with JClouds (version 2.1.1), but I've hit a problem. I have
> the following code:
>
>     Blob blob = blobStore.blobBuilder(name)
>         .payload(inputStream)
>         ...
>         .build();
>     blobStore.putBlob(container, blob, PutOptions.Builder.multipart());
>
> If I run it like this, I get a NullPointerException, because I didn't specify
> the content's length:
>
>     java.lang.NullPointerException: while trying to invoke the method
>     java.lang.Long.longValue() of a null object returned from
>     org.jclouds.io.MutableContentMetadata.getContentLength()
>       at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:356)
>       at org.jclouds.blobstore.internal.BaseBlobStore.putMultipartBlob(BaseBlobStore.java:347)
>       at org.jclouds.aws.s3.blobstore.AWSS3BlobStore.putBlob(AWSS3BlobStore.java:79)
>
> I think it would be possible for JClouds to compute the size of the
> InputStream dynamically:
> 1. Slice the stream into chunks of X MB and store the chunks in memory
>    (where X has a default value but is also configurable).
> 2. Upload the chunks sequentially - the content length header can be set to
>    X MB.
> 3. Finalize the multipart upload.
>
> That way, no more than X MB will be stored in memory for any given upload.
>
> Would you accept a pull request for this?
>
> Best regards,
> Alexander
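
Until jclouds does this internally, the approach Alexander describes can be built in application code on top of the existing low-level multipart methods on BlobStore (initiateMultipartUpload, uploadMultipartPart, completeMultipartUpload, abortMultipartUpload). The following is a rough, untested sketch of that idea; the class name StreamingMultipartUpload and the chunkSize parameter are made up for illustration, and roughly one chunk (plus a copy of it) is held in memory per upload at any time:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    import org.jclouds.blobstore.BlobStore;
    import org.jclouds.blobstore.domain.BlobMetadata;
    import org.jclouds.blobstore.domain.MultipartPart;
    import org.jclouds.blobstore.domain.MultipartUpload;
    import org.jclouds.blobstore.options.PutOptions;
    import org.jclouds.io.Payload;
    import org.jclouds.io.Payloads;

    import com.google.common.io.ByteStreams;

    public final class StreamingMultipartUpload {

        /**
         * Uploads an InputStream of unknown length by buffering at most
         * chunkSize bytes at a time and sending each buffer as one part of a
         * multipart upload. For S3, chunkSize must be at least 5 MB, since
         * every part except the last has a 5 MB minimum.
         */
        public static String upload(BlobStore blobStore, String container, String name,
                InputStream in, int chunkSize) throws IOException {
            // Metadata-only blob, used just to start the multipart upload.
            BlobMetadata metadata = blobStore.blobBuilder(name).build().getMetadata();
            MultipartUpload mpu = blobStore.initiateMultipartUpload(container, metadata, PutOptions.NONE);
            List<MultipartPart> parts = new ArrayList<>();
            byte[] buffer = new byte[chunkSize];
            try {
                int partNumber = 1;
                while (true) {
                    // Fills the buffer completely unless the stream ends first.
                    int read = ByteStreams.read(in, buffer, 0, chunkSize);
                    if (read == 0 && partNumber > 1) {
                        break; // stream ended exactly on a chunk boundary
                    }
                    // Copy so the reusable buffer is never shared with the payload.
                    byte[] chunk = Arrays.copyOf(buffer, read);
                    Payload payload = Payloads.newByteArrayPayload(chunk);
                    payload.getContentMetadata().setContentLength((long) read);
                    parts.add(blobStore.uploadMultipartPart(mpu, partNumber++, payload));
                    if (read < chunkSize) {
                        break; // last (possibly short) part has been sent
                    }
                }
                return blobStore.completeMultipartUpload(mpu, parts);
            } catch (IOException | RuntimeException e) {
                blobStore.abortMultipartUpload(mpu); // avoid leaving orphaned parts behind
                throw e;
            }
        }
    }

What this issue asks for is essentially that BaseBlobStore.putMultipartBlob perform this slicing itself when the payload has no content length, instead of dereferencing the null length and failing.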
