[ https://issues.apache.org/jira/browse/BEAM-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157856#comment-16157856 ]
Reuven Lax commented on BEAM-2858:
----------------------------------
I asked the BigQuery team, and they said the load job should fail if one of its
input files is missing. How did you delete one of the files? These are the temp
files generated from within the Beam job.
> temp file garbage collection in BigQuery sink should be in a separate DoFn
> --------------------------------------------------------------------------
>
> Key: BEAM-2858
> URL: https://issues.apache.org/jira/browse/BEAM-2858
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-gcp
> Affects Versions: 2.1.0
> Reporter: Reuven Lax
> Assignee: Chamikara Jayalath
> Fix For: 2.2.0
>
>
> Currently the WriteTables transform deletes the set of input files as soon as
> the load() job completes. However, this is incorrect: if the task fails
> partway through deleting files (e.g. if the worker crashes), the task will
> be retried, and the retried load job will run after some of its input files
> are already gone. If the write disposition is WRITE_TRUNCATE, this can
> destroy data that was already written.
> The resulting behavior will depend on what BQ does if one of the input files
> is missing (because we had previously deleted it). In the best case, BQ will
> fail the load. In that case, the step will keep failing until the runner
> finally fails the entire job. If, however, BQ ignores the missing file, the
> load will overwrite the previously written table with the smaller set of
> files and the job will succeed. This is the worst-case scenario, as it will
> result in data loss.
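A minimal, hypothetical simulation in plain Java of the retry hazard described
above, and of the separate-cleanup-step fix named in the issue title. It does
not use Beam or BigQuery APIs: files are names in a Set, the table is a List of
rows, and a worker crash is an exception thrown on the first attempt. The names
lenientLoad, deleteAll, loadAndDeleteInOneStep and loadThenCleanupInSeparateSteps
are made up for illustration. It assumes the worst case from the report (the
load ignores missing files) and WRITE_TRUNCATE semantics (each load replaces
the whole table).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation of the BEAM-2858 retry hazard. This is NOT Beam's
// WriteTables code or the BigQuery API: "files" are names in a Set, a "table"
// is a List of rows, and a worker crash is an exception on the first attempt.
public class TempFileCleanupSketch {

    static class WorkerCrash extends RuntimeException {}

    // Assumed worst case from the report: the load silently skips missing files.
    // With WRITE_TRUNCATE, the returned rows replace the whole table.
    static List<String> lenientLoad(List<String> files, Set<String> storage) {
        List<String> rows = new ArrayList<>();
        for (String f : files) {
            if (storage.contains(f)) {
                rows.add("rows-from-" + f);
            }
        }
        return rows;
    }

    // Deletes the files, crashing after `crashAfter` deletions when `crash` is set.
    // Removing a file that is already gone is a no-op, so a retry is harmless.
    static void deleteAll(List<String> files, Set<String> storage, int crashAfter, boolean crash) {
        int deleted = 0;
        for (String f : files) {
            if (crash && deleted == crashAfter) {
                throw new WorkerCrash();
            }
            storage.remove(f);
            deleted++;
        }
    }

    // Buggy pattern: load and delete in one step, so a retry re-runs the load.
    static List<String> loadAndDeleteInOneStep(List<String> files, Set<String> storage, int crashAfter) {
        for (int attempt = 1; ; attempt++) {
            List<String> table = lenientLoad(files, storage);
            try {
                deleteAll(files, storage, crashAfter, attempt == 1);
                return table;
            } catch (WorkerCrash e) {
                // The runner retries the whole step, including the load.
            }
        }
    }

    // Fixed pattern: cleanup is a separate step that runs only after the load's
    // result is committed, so retrying cleanup never re-runs the load.
    static List<String> loadThenCleanupInSeparateSteps(List<String> files, Set<String> storage, int crashAfter) {
        List<String> table = lenientLoad(files, storage);
        for (int attempt = 1; ; attempt++) {
            try {
                deleteAll(files, storage, crashAfter, attempt == 1);
                return table;
            } catch (WorkerCrash e) {
                // Only the cleanup step is retried.
            }
        }
    }

    public static void main(String[] args) {
        List<String> files = List.of("a", "b", "c");
        // Prints [rows-from-b, rows-from-c]: the rows from "a" are lost.
        System.out.println(loadAndDeleteInOneStep(files, new HashSet<>(files), 1));
        // Prints [rows-from-a, rows-from-b, rows-from-c].
        System.out.println(loadThenCleanupInSeparateSteps(files, new HashSet<>(files), 1));
    }
}
```

The fix in this sketch relies on two properties it simply assumes: the load
step's result is committed before cleanup starts, so a cleanup retry cannot
trigger a reload, and deleting a file that is already gone is a no-op, so the
cleanup step can safely be retried.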