[ https://issues.apache.org/jira/browse/BEAM-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017481#comment-17017481 ]

Boyuan Zhang commented on BEAM-9078:
------------------------------------

Thanks Brad! Unfortunately, I don't think the issue will be marked as resolved 
automatically. If you think this issue has been addressed, please close it 
manually.
Thanks for your contribution!

> Large Tarball Artifacts Should Use GCS Resumable Upload
> -------------------------------------------------------
>
>                 Key: BEAM-9078
>                 URL: https://issues.apache.org/jira/browse/BEAM-9078
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.17.0
>            Reporter: Brad West
>            Assignee: Brad West
>            Priority: Major
>             Fix For: 2.19.0
>
>   Original Estimate: 1h
>          Time Spent: 40m
>  Remaining Estimate: 20m
>
> It's possible for the tarball uploaded to GCS to be quite large. An example 
> is a user vendoring multiple dependencies in their tarball so as to achieve a 
> more stable deployable artifact.
> Before this change, the GCS upload API call executed a multipart upload, which 
> the Google 
> [documentation](https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload) 
> states should be used only when the file is small enough to upload again if the 
> connection fails. For large tarballs, we hit 60-second socket timeouts before 
> the multipart upload completes. By passing `total_size`, apitools first checks 
> whether the size exceeds the resumable upload threshold, and if so executes the 
> more robust resumable upload rather than a multipart upload, avoiding the 
> socket timeouts.
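
For reference, a minimal sketch of the approach described above (not the exact
Beam patch): staging a local tarball to GCS via apitools and passing
`total_size` so the library can select the resumable upload strategy. The
function name, bucket, and object names here are hypothetical.

    import os

    from apitools.base.py import transfer
    from apache_beam.io.gcp.internal.clients import storage


    def stage_tarball(storage_client, local_path, bucket, object_name):
        # Size of the tarball on disk; forwarding this to apitools is what
        # lets it choose resumable over multipart for large payloads.
        total_size = os.path.getsize(local_path)
        request = storage.StorageObjectsInsertRequest(
            bucket=bucket, name=object_name)
        with open(local_path, 'rb') as stream:
            # With total_size set, apitools compares the payload against its
            # resumable-upload threshold and, for large files, uses the
            # chunked, retryable resumable protocol instead of a single
            # multipart request that can hit socket timeouts.
            upload = transfer.Upload(
                stream, 'application/octet-stream', total_size=total_size)
            return storage_client.objects.Insert(request, upload=upload)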



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
