[ https://issues.apache.org/jira/browse/BEAM-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reuven Lax closed BEAM-2858.
----------------------------
    Resolution: Fixed

> temp file garbage collection in BigQuery sink should be in a separate DoFn
> --------------------------------------------------------------------------
>
>                 Key: BEAM-2858
>                 URL: https://issues.apache.org/jira/browse/BEAM-2858
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.1.0
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>             Fix For: 2.2.0
>
>         Attachments: delete_file_diff.txt
>
>
> Currently the WriteTables transform deletes the set of input files as soon as
> the load() job completes. However, this is incorrect: if the task fails
> partway through deleting files (e.g. if the worker crashes), the task will
> be retried, and the retry runs the load again against a partially deleted
> set of files. If the write disposition is WRITE_TRUNCATE, this can corrupt
> the destination table.
> The resulting behavior depends on what BQ does when one of the input files
> is missing (because we had previously deleted it). In the best case, BQ will
> fail the load. In this case the step will keep failing until the runner
> finally fails the entire job. If, however, BQ ignores the missing file, the
> load will overwrite the previously written table with only the remaining
> files and the job will succeed. This is the worst-case scenario, as it
> results in silent data loss.
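The retry hazard described above can be illustrated with a small, self-contained Java simulation. This is a sketch under stated assumptions, not Beam's WriteTables code or the BigQuery API: the class and method names (TempFileCleanupSim, loadTruncate, deleteFiles) are hypothetical, "storage" is an in-memory map, the load takes the worst-case behavior of silently ignoring missing files, and the separate-cleanup variant assumes the load's result is committed before cleanup runs, so a retry of the cleanup does not re-run the load.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical simulation (not Beam or BigQuery API) of the BEAM-2858 retry hazard.
 * Assumes the worst case from the issue: a WRITE_TRUNCATE load ignores missing files
 * and replaces the table with whatever files still exist.
 */
public class TempFileCleanupSim {

    /** Thrown to simulate a worker crash partway through deleting files. */
    static class WorkerCrash extends RuntimeException {}

    // WRITE_TRUNCATE-style load: table contents become the rows of the files that still exist.
    static List<String> loadTruncate(List<String> fileNames, Map<String, List<String>> storage) {
        List<String> table = new ArrayList<>();
        for (String name : fileNames) {
            List<String> rows = storage.get(name);
            if (rows != null) { // worst case: a missing file is silently ignored
                table.addAll(rows);
            }
        }
        return table;
    }

    // Deletes files in order; if crashAfter >= 0, "crashes" after that many deletions.
    static void deleteFiles(List<String> fileNames, Map<String, List<String>> storage, int crashAfter) {
        int deleted = 0;
        for (String name : fileNames) {
            if (deleted == crashAfter) {
                throw new WorkerCrash();
            }
            storage.remove(name); // idempotent: removing an absent file is a no-op
            deleted++;
        }
    }

    static Map<String, List<String>> newStorage() {
        Map<String, List<String>> storage = new LinkedHashMap<>();
        storage.put("file-0", List.of("a", "b"));
        storage.put("file-1", List.of("c"));
        storage.put("file-2", List.of("d", "e"));
        return storage;
    }

    // Load and delete fused in one retryable task (the behavior the issue calls incorrect).
    static List<String> fusedLoadAndDelete() {
        Map<String, List<String>> storage = newStorage();
        List<String> files = new ArrayList<>(storage.keySet());
        List<String> table = new ArrayList<>();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                table = loadTruncate(files, storage);
                // First attempt crashes after one deletion; the retry runs to completion.
                deleteFiles(files, storage, attempt == 0 ? 1 : -1);
                break;
            } catch (WorkerCrash e) {
                // The whole task is retried, including the load.
            }
        }
        return table;
    }

    // Cleanup in its own step: the load ran once and is not repeated when cleanup retries.
    static List<String> separateCleanupStep() {
        Map<String, List<String>> storage = newStorage();
        List<String> files = new ArrayList<>(storage.keySet());
        List<String> table = loadTruncate(files, storage); // assumed committed before cleanup
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                deleteFiles(files, storage, attempt == 0 ? 1 : -1);
                break;
            } catch (WorkerCrash e) {
                // Only the cleanup step is retried.
            }
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println("fused:    " + fusedLoadAndDelete());
        System.out.println("separate: " + separateCleanupStep());
    }
}
```

Running main prints "fused:    [c, d, e]" and "separate: [a, b, c, d, e]": in the fused variant the retried load silently drops the rows from the already-deleted file-0, while the separate-step variant keeps all five rows. The cleanup step is also safe to retry on its own because deleting an already-deleted file is a no-op.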



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
