[ https://issues.apache.org/jira/browse/JCLOUDS-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750372#comment-17750372 ]
Jan Vermeulen commented on JCLOUDS-1608:
----------------------------------------

I have run and debugged the MpuPartitioningAlgorithmTest, and that leads to the following conclusion: MultipartUploadSlicingAlgorithm was conceived to calculate a division into x equal parts plus a remainder, where:

* there is _always_ a remainder: if the division leaves no remainder, the number of parts is reduced by one and a remainder the size of a full part is returned. If the length is equal to or below the default partSize, the number of parts is 0 and everything goes into the remainder.
* the maximumNumberOfParts applies only to the x equal parts and _disregards the remainder_: if you request a maximum of 100 parts, you get 99 equal parts plus a remainder as long as the length does not exceed maxPartSize*maxNumberOfParts. But if it does, you get 100 equal parts of maxPartSize plus a remainder holding whatever did not fit, so 101 parts in total.

Given this logic, the problem is not with the algorithm itself, but with how it is used in combination with GCS, which accepts only 32 parts in total (including the remainder). The maximumNumberOfParts defined in GoogleCloudStorageBlobStore is 32, which does not guarantee a total of 32 parts in all cases. The problem is that this is not accounted for anywhere in the blobstore: an overflow is logged by the algorithm,

{code:java}
logger.debug(" %d bytes partitioned in %d parts of part size: %d, remaining: %d%s", length, parts, chunkSize,
      remaining, remaining > maximumPartSize ? " overflow!" : "");
{code}

but apart from that, nothing is done to prevent the blobstore from passing 33 parts to GCS and causing a failure. Since the maxNumberOfParts is externally imposed, my patch handles that overflow inside the algorithm, using the headroom below maxPartSize to comply with the imposed maxNumberOfParts.
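To make the overflow concrete, here is a minimal standalone sketch (my own arithmetic, not the jclouds source) of what the capped-parts calculation produces for the 1.2GB+33 byte length used in the unit test quoted below; the variable names mirror those in the algorithm:

{code:java}
// Standalone sketch of the capped-parts arithmetic described above.
public class SlicingOverflowDemo {
   public static void main(String[] args) {
      long length = 1024L * 1024 * 1200 + 33; // 1,258,291,233 bytes
      int maximumNumberOfParts = 32;          // GCS compose limit

      // Once the default part size would produce too many parts, the count
      // is capped at the maximum and the part size is recalculated.
      int parts = maximumNumberOfParts;       // 32 equal parts
      long partSize = length / parts;         // 39,321,601 bytes
      long remaining = length % partSize;     // 1 byte left over

      // The leftover is uploaded as an extra part, so GCS receives 33 parts.
      int totalUploads = parts + (remaining > 0 ? 1 : 0);
      System.out.println(totalUploads + " uploads for " + parts + " parts + "
            + remaining + " remaining bytes"); // 33 > 32 -> GCS fails
   }
}
{code}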
The following patched code accounts for the logic that there is always a remainder, and handles the case where there is an overflow:

{code:java}
long remainder = length % unitPartSize;
if (remainder == 0 && parts > 0) {
   parts -= 1;
}

// SHB patch
// remainder should be distributed over parts if we are at the maximumNumberOfParts
// (if not, an additional part is uploaded to GCS, thus exceeding the maximum allowed parts)
if (remainder > 0 && parts == maximumNumberOfParts) {
   parts -= 1;
   partSize = length / parts;
}
// end of SHB patch
{code}

With this corrected patch, only one check in the unit test fails, namely the one where the length exceeds maxPartSize*maxNumberOfParts. If it is acceptable for the algorithm to handle the overflow in this way, the unit test would have to be adapted to match the new behavior. If it is not acceptable for the algorithm's logic to handle the overflow, another solution could be to redefine the maxNumberOfParts in GoogleCloudStorageBlobStore as 31, so the total never exceeds 32 including the remainder. That would, however, mean the overflow in the length is not evenly redistributed over the parts. It's not for me to decide what is best here; I only suggest a solution for a problem that (without a patch) prevents us from using the BlobStore implementation to upload large files to Google Cloud.
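For illustration, the same standalone arithmetic as in the sketch above, with the patched recalculation applied (again my own sketch for verification, not code from the patch itself), stays within the limit:

{code:java}
// Standalone check of the patched recalculation for the same blob length.
public class SlicingPatchedDemo {
   public static void main(String[] args) {
      long length = 1024L * 1024 * 1200 + 33;
      int maximumNumberOfParts = 32;

      int parts = maximumNumberOfParts;   // capped at 32, as before
      long partSize = length / parts;     // 39,321,601 bytes
      long remainder = length % partSize; // 1 byte -> would become a 33rd part

      // The patch drops one equal part and regrows the part size, so the
      // remainder's upload slot stays within the imposed maximum.
      if (remainder > 0 && parts == maximumNumberOfParts) {
         parts -= 1;                      // 31 equal parts
         partSize = length / parts;       // 40,590,039 bytes, still below maxPartSize
      }
      remainder = length % partSize;      // 24 bytes
      System.out.println((parts + 1) + " uploads for " + parts + " parts + "
            + remainder + " remaining bytes"); // 32 <= 32 -> OK
   }
}
{code}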
> Slicing of large files can lead to exceeding the 32 parts limit of GCS
> ----------------------------------------------------------------------
>
>                 Key: JCLOUDS-1608
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1608
>             Project: jclouds
>          Issue Type: Bug
>          Components: jclouds-blobstore
>    Affects Versions: 2.2.1, 2.5.0
>            Reporter: Jan Vermeulen
>            Priority: Major
>              Labels: google-cloud-storage
>             Fix For: 2.2.1, 2.6.0
>
> MultipartUploadSlicingAlgorithm calculates slices for a large file by first
> using the defaultPartSize. If that slicing yields parts that do not exceed
> the min/maxPartSizes (5MB and 5GB for GoogleCloudStorageBlobStore) but do
> exceed the maxNumberOfParts (32 for GoogleCloudStorageBlobStore), the
> algorithm sets the number of parts to 32 and recalculates the size of the
> parts. If there is any remainder after that, jclouds ends up uploading 33
> parts in total to GCS, causing the process to fail in
> completeMultipartUpload() when recomposing the original content from the
> parts.
> The following simple unit test proves the case:
> {code:java}
> public class AlgoritmTest extends TestCase {
>    public void testSlicing() {
>       // the L suffix keeps the 5MB/5GB constants from overflowing int arithmetic
>       MultipartUploadSlicingAlgorithm algorithm =
>             new MultipartUploadSlicingAlgorithm(1024L * 1024 * 5, 1024L * 1024 * 1024 * 5, 32);
>       algorithm.calculateChunkSize(1024L * 1024 * 1200 + 33);
>       assertTrue(algorithm.getParts() + ((algorithm.getRemaining() > 0) ? 1 : 0) <= 32);
>    }
> }
> {code}
> It simulates the slicing of a file of 1.2GB+33 bytes (to make sure there is
> a remainder).
> The following patch fixes the issue:
> {code:java}
>    ...
>    long remainder = length % unitPartSize;
>    // SHB patch
>    // remainder should be distributed over parts if we are at the maximumNumberOfParts
>    // (if not, an additional part is uploaded to GCS, thus exceeding the maximum allowed parts)
>    // if (remainder == 0 && parts > 0) {
>    //    parts -= 1;
>    // }
>    if (remainder > 0 && parts == maximumNumberOfParts) {
>       parts -= 1;
>       partSize = length / parts;
>    }
>    // end of SHB patch
>    ...
> {code}
> I also commented out the code that reduces the number of parts when there is
> no remainder, since that ends up creating a remaining part that is the same
> size as the others.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)