[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446655#comment-16446655 ]
ASF subversion and git services commented on AIRFLOW-2222:
----------------------------------------------------------
Commit f520990fe0b7a70f80bec68cb5c3f0d41e3e984d in incubator-airflow's branch
refs/heads/master from [~b11c]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=f520990 ]
[AIRFLOW-2326][AIRFLOW-2222] remove contrib.gcs_copy_operator
Closes #3232 from berislavlopac/AIRFLOW-2326
> GoogleCloudStorageHook.copy fails for large files between locations
> -------------------------------------------------------------------
>
> Key: AIRFLOW-2222
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2222
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Berislav Lopac
> Assignee: Berislav Lopac
> Priority: Major
> Fix For: 2.0.0
>
>
> When copying large files (confirmed for around 3 GB) between buckets in
> different projects, the operation fails and the Google API returns error
> [413 Payload Too Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large].
> The documentation for the error says:
> {quote}The Cloud Storage JSON API supports up to 5 TB objects.
> This error may, alternatively, arise if copying objects between locations
> and/or storage classes can not complete within 30 seconds. In this case, use
> the
> [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite]
> method instead.{quote}
> The cause appears to be that {{GoogleCloudStorageHook.copy}} calls the JSON
> API {{copy}} method, which is subject to the 30-second limit described above.
> h3. Proposed Solution
> There are two potential solutions:
> # Implement a {{GoogleCloudStorageHook.rewrite}} method that operators and
> other callers can invoke directly. This is the more flexible option, but it
> requires changes both in the {{GoogleCloudStorageHook}} class and in every
> class that uses it for copying files, since each must explicitly call
> {{rewrite}} when needed.
> # Modify {{GoogleCloudStorageHook.copy}} to decide internally when to use
> {{rewrite}} instead of {{copy}}. This requires updating only the
> {{GoogleCloudStorageHook}} class, but the logic might not cover all edge
> cases and could be difficult to implement.
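
As a rough illustration of the first option, the JSON API {{rewrite}} method can be called in a loop until the response reports completion. The sketch below is not the code from the commit. It assumes that `service` is a Cloud Storage JSON API client of the kind built with `googleapiclient.discovery.build('storage', 'v1', ...)`. The function name `rewrite_object` and its parameters are hypothetical and chosen for illustration only.

```python
def rewrite_object(service, source_bucket, source_object,
                   destination_bucket, destination_object=None):
    """Copy an object with the JSON API ``objects.rewrite`` method.

    ``service`` is assumed to be a Cloud Storage JSON API client; this
    helper and its signature are hypothetical, not part of Airflow.
    """
    # Default to the same object name in the destination bucket.
    destination_object = destination_object or source_object
    rewrite_token = None

    while True:
        kwargs = dict(
            sourceBucket=source_bucket,
            sourceObject=source_object,
            destinationBucket=destination_bucket,
            destinationObject=destination_object,
            body={},
        )
        # After the first call, pass the token back so the server
        # resumes the same rewrite instead of starting over.
        if rewrite_token is not None:
            kwargs['rewriteToken'] = rewrite_token

        response = service.objects().rewrite(**kwargs).execute()

        # ``done`` becomes true once the whole object has been rewritten.
        if response['done']:
            return response
        rewrite_token = response['rewriteToken']
```

Each call is a separate request, so a large object is copied in several steps and no single request has to finish the whole transfer. The second option could wrap the same loop inside {{GoogleCloudStorageHook.copy}}, at the cost of deciding there when to use it.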