[
https://issues.apache.org/jira/browse/BEAM-11494?focusedWorklogId=542337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542337
]
ASF GitHub Bot logged work on BEAM-11494:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 26/Jan/21 19:00
Start Date: 26/Jan/21 19:00
Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #13558:
URL: https://github.com/apache/beam/pull/13558#issuecomment-767757877
Hi Reuven - I ran some experiments on various GCS buckets. Here's what I
found.

The experiments were done like so:
- A 1.6 MB file was copied with `rewrite`.
- Up to 1 MB was copied on every call to `rewrite` (this is configurable
  in the call).
- Source bucket: my bucket, without any special policies.
**Experiment 1** - Normal, successful workflow
- Destination bucket: Bucket in different region. No retention policy.
- After first `rewrite` call
  - Call is successful
  - Response contains a `rewriteToken`
  - `done: False` in response
  - File does **not** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- After second `rewrite` call
  - Call is successful
  - `done: True` in response
  - File **does** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
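The two-call flow in Experiment 1 can be sketched as a loop that keeps calling `rewrite` until the response reports `done`. This is only an illustration, not the code used in the experiments: `rewrite_once` is a hypothetical callable standing in for a single `rewrite` request, the response is assumed to be a dict carrying the `done` and `rewriteToken` fields seen above, and `make_fake_rewrite` is an in-memory stand-in for the service.

```python
from typing import Callable, Optional


def copy_with_rewrite(rewrite_once: Callable[[Optional[str]], dict],
                      max_calls: int = 1000) -> int:
    """Call rewrite repeatedly until the response reports done.

    rewrite_once is a hypothetical wrapper around one `rewrite` request. It
    takes the token from the previous response (None on the first call).
    Returns the number of calls that were made.
    """
    token = None
    for calls in range(1, max_calls + 1):
        response = rewrite_once(token)
        if response.get("done"):
            return calls
        # Not finished yet: pass the token back on the next call.
        token = response["rewriteToken"]
    raise RuntimeError("rewrite did not finish after %d calls" % max_calls)


def make_fake_rewrite(total_bytes: int, bytes_per_call: int):
    """In-memory stand-in for the service, for illustration only."""
    progress = {}  # token -> bytes copied so far

    def rewrite_once(token):
        copied = progress.get(token, 0) + bytes_per_call
        if copied >= total_bytes:
            return {"done": True}
        new_token = "token-%d" % (len(progress) + 1)
        progress[new_token] = copied
        return {"done": False, "rewriteToken": new_token}

    return rewrite_once
```

With a 1.6 MB file and 1 MB per call, as in the setup above, the fake needs two calls before it reports `done`.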
**Experiment 2** - Call has to be retried
- Destination bucket: Bucket in different region. No retention policy.
- After first `rewrite` call
  - Call is successful
  - Response contains a `rewriteToken`
  - `done: False` in response
  - File does **not** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- After running first `rewrite` call again
  - Call is successful
  - Response contains a **new `rewriteToken`**
  - `done: False` in response
  - File does **not** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- After second `rewrite` call
  - Call is successful
  - `done: True` in response
  - File **does** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
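Experiment 2 suggests that repeating a `rewrite` call is harmless: the repeated call returns a fresh token and nothing becomes visible until `done`. A hedged sketch of retrying an individual call follows. `rewrite_once` is a hypothetical wrapper around one `rewrite` request, and treating `ConnectionError` as the transient failure is an assumption made for illustration.

```python
from typing import Callable, Optional


def rewrite_with_retries(rewrite_once: Callable[[Optional[str]], dict],
                         max_attempts: int = 3) -> dict:
    """Drive a rewrite to completion, retrying each call on transient errors.

    A failed call is re-sent with the same token (None for the first call),
    mirroring "running first rewrite call again" in the experiment.
    """
    token = None
    while True:
        for attempt in range(1, max_attempts + 1):
            try:
                response = rewrite_once(token)
                break
            except ConnectionError:
                # Give up only after the last allowed attempt.
                if attempt == max_attempts:
                    raise
        if response.get("done"):
            return response
        token = response["rewriteToken"]
```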
**Experiment 3** - Call has to be retried - bucket has retention policy
- Destination bucket: Bucket in different region. Bucket has retention
  policy.
- After first `rewrite` call
  - Call is successful
  - Response contains a `rewriteToken`
  - `done: False` in response
  - File does **not** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- After running first `rewrite` call again
  - Call is successful
  - Response contains a **new `rewriteToken`**
  - `done: False` in response
  - File does **not** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- After second `rewrite` call
  - Call is successful
  - `done: True` in response
  - File **does** appear in the bucket when performing
    `gsutil ls gs://bucket/file`
- Trying to delete file
  - Call fails. Unable to delete file.
- Trying to overwrite file with new `rewrite` call
  - Call fails. Unable to delete file.
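Since Experiment 3 shows that a completed file in a bucket with a retention policy can be neither deleted nor overwritten, one possible way to make a retried copy safe is to skip it when the destination already holds the expected content. The sketch below is an assumption for illustration, not necessarily what this PR does: `stat` and `copy` are hypothetical callables, and comparing sizes is a simplification (a checksum comparison would be stronger).

```python
from typing import Callable, Optional


def copy_unless_present(stat: Callable[[str], Optional[int]],
                        copy: Callable[[str, str], None],
                        src: str, dst: str) -> bool:
    """Copy src to dst unless dst already exists with the same size.

    stat(path) is a hypothetical helper returning the object size in bytes,
    or None if the object does not exist. Returns True if a copy was issued.
    """
    src_size = stat(src)
    if src_size is None:
        raise FileNotFoundError(src)
    if stat(dst) == src_size:
        # A previous attempt already finished. Overwriting would fail under
        # a retention policy, so treat the copy as complete.
        return False
    copy(src, dst)
    return True
```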
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 542337)
Time Spent: 20m (was: 10m)
> FileIO.Write overwrites destination files on retries
> ----------------------------------------------------
>
> Key: BEAM-11494
> URL: https://issues.apache.org/jira/browse/BEAM-11494
> Project: Beam
> Issue Type: Improvement
> Components: io-java-files
> Reporter: Pablo Estrada
> Assignee: Pablo Estrada
> Priority: P2
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Users have reported cases of FileIO.Write becoming stuck or failing due to
> overwriting destination files.
> This happens because some buckets have strict retention policies that do
> not allow files to be deleted.