[ 
https://issues.apache.org/jira/browse/BEAM-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157354#comment-16157354
 ] 

Chamikara Jayalath commented on BEAM-2858:
------------------------------------------

I tried intentionally deleting one of the files before running a load job.

Dataflow job: 2017-09-07_10_47_31-9640870329732038724
BigQuery load job: f0953eede88e454bb3b01d2cdba3c10d_1a2e1a938a117f8e3ff2af5ee4f58b45_00001_00000-0

This Dataflow job succeeded, so it looks like this issue will manifest as 
data loss.

Reuven, feel free to grab this issue if you are hoping to produce a fix; 
otherwise I'll look into it early next week.

> temp file garbage collection in BigQuery sink should be in a separate DoFn
> --------------------------------------------------------------------------
>
>                 Key: BEAM-2858
>                 URL: https://issues.apache.org/jira/browse/BEAM-2858
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.1.0
>            Reporter: Reuven Lax
>            Assignee: Chamikara Jayalath
>             Fix For: 2.2.0
>
>
> Currently the WriteTables transform deletes the set of input files as soon as 
> the load() job completes. However, this is incorrect: if the task fails 
> partway through deleting files (e.g. if the worker crashes), the task will 
> be retried. If the write disposition is WRITE_TRUNCATE, bad things could 
> result.
> The resulting behavior will depend on what BQ does if one of the input files 
> is missing (because we had previously deleted it). In the best case, BQ will 
> fail the load. In this case the step will keep failing until the runner 
> finally fails the entire job. If, however, BQ ignores the missing file, the 
> load will overwrite the previously written table with the smaller set of 
> files and the job will succeed. This is the worst-case scenario, as it will 
> result in data loss.
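
The retry hazard described in the issue can be illustrated with a toy model. 
The sketch below is plain Python, not Beam or BigQuery code. Every name in it 
(load_truncate, delete_files, run_with_retry, and so on) is hypothetical, and 
it assumes the worst case from the description: a WRITE_TRUNCATE load that 
silently skips missing files. It also assumes the runner retries a whole step 
after a crash and that a completed step is not re-executed.

```python
# Toy model only: none of these names are Beam or BigQuery APIs.

class WorkerCrash(Exception):
    """Simulates a worker dying partway through a step."""


def load_truncate(table, storage, files):
    """WRITE_TRUNCATE-style load that silently skips missing files (assumed worst case)."""
    table.clear()
    for name in files:
        if name in storage:
            table.extend(storage[name])


def delete_files(storage, files, crash_after=None):
    """Delete temp files; optionally crash once `crash_after` files are gone."""
    for i, name in enumerate(files):
        if crash_after is not None and i == crash_after:
            raise WorkerCrash(name)
        storage.pop(name, None)


def run_with_retry(step, attempts=3):
    """Re-run step(attempt) until it succeeds, like a runner retrying a failed step."""
    for attempt in range(attempts):
        try:
            return step(attempt)
        except WorkerCrash:
            continue
    raise RuntimeError("step failed on every attempt")


def new_storage():
    return {"f0": [1, 2], "f1": [3, 4], "f2": [5, 6]}


def fused_pipeline():
    """Load and cleanup share one retried step (the behavior the issue describes)."""
    storage = new_storage()
    files = list(storage)
    table = []

    def step(attempt):
        load_truncate(table, storage, files)
        # Only the first attempt crashes, after one file has been deleted.
        delete_files(storage, files, crash_after=1 if attempt == 0 else None)

    run_with_retry(step)
    return table


def split_pipeline():
    """Cleanup runs as its own step, retried independently of the load."""
    storage = new_storage()
    files = list(storage)
    table = []

    run_with_retry(lambda attempt: load_truncate(table, storage, files))
    run_with_retry(
        lambda attempt: delete_files(
            storage, files, crash_after=1 if attempt == 0 else None
        )
    )
    return table


if __name__ == "__main__":
    print("fused:", fused_pipeline())
    print("split:", split_pipeline())
```

In the fused variant the retry reloads from a file set that is already missing 
f0, so the truncating load replaces the complete table with a smaller one and 
the step still reports success. In the split variant a crash during cleanup 
only re-runs the deletion, which is harmless to repeat. This only holds if the 
load step's result is durably committed before cleanup starts, which is an 
assumption of the sketch rather than something this thread confirms.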



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
