[ 
https://issues.apache.org/jira/browse/BEAM-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163529#comment-16163529
 ] 

Reuven Lax commented on BEAM-2858:
----------------------------------

I just reproduced this and verified it does not cause data loss. The load job 
fails (on query) with a 409. The message is:

Error encountered during job execution:
Not found: URI 
gs://bigquery_beam_testing_regional/temp/BigQueryWriteTemp/c7fb6a3d06fa4ceab662f83488cc6d31/c5db57f8-9cc0-4cad-9a7e-9c56cb572177

However, this is still a critical bug. Streaming jobs get blocked forever 
because the load job fails on every retry. Batch jobs retry it several times 
and then fail.
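
To illustrate the failure mode, here is a minimal sketch in plain Java. It is a 
simulation, not Beam or BigQuery code: the in-memory `storage` set, the `load` 
and `delete` helpers, and the retry loop are all hypothetical stand-ins. It 
assumes, as observed above, that a load fails when an input file is missing, 
and it compares deleting temp files inside the retried load step with deleting 
them in a separate step that is retried on its own.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TempFileCleanupSketch {
    /** Hypothetical stand-in for GCS: names of temp files that currently exist. */
    static final Set<String> storage = new HashSet<>();

    /** A unit of work that the (simulated) runner retries on failure. */
    interface Step { void run(int attempt); }

    /** Hypothetical stand-in for a load job: fails if any input file is missing. */
    static void load(List<String> files) {
        for (String f : files) {
            if (!storage.contains(f)) {
                throw new IllegalStateException("Not found: URI " + f);
            }
        }
    }

    /** Deletes files; simulates a worker crash after crashAfter deletions (-1 = never). */
    static void delete(List<String> files, int crashAfter) {
        int deleted = 0;
        for (String f : files) {
            if (deleted == crashAfter) {
                throw new RuntimeException("simulated worker crash");
            }
            storage.remove(f); // removing an already-deleted file is a no-op
            deleted++;
        }
    }

    /** Retries a step up to maxAttempts times; returns true if an attempt succeeds. */
    static boolean retry(Step step, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.run(attempt);
                return true;
            } catch (RuntimeException e) {
                // Swallow the failure and retry, like a failed bundle.
            }
        }
        return false;
    }

    /** Load and delete in one step: a crash mid-delete poisons every retry. */
    static boolean combined(List<String> files) {
        storage.clear();
        storage.addAll(files);
        return retry(attempt -> {
            load(files);
            delete(files, attempt == 1 ? 1 : -1); // crash only on the first attempt
        }, 3);
    }

    /** Load and delete as separate steps: only the idempotent delete is retried. */
    static boolean separated(List<String> files) {
        storage.clear();
        storage.addAll(files);
        boolean loaded = retry(attempt -> load(files), 3);
        boolean cleaned = retry(attempt -> delete(files, attempt == 1 ? 1 : -1), 3);
        return loaded && cleaned;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("file-a", "file-b", "file-c");
        System.out.println("combined succeeded:  " + combined(files));
        System.out.println("separated succeeded: " + separated(files));
    }
}
```

In this model the combined step never recovers, because each retry reruns the 
load against a file set that the first attempt partially deleted. The separated 
variant finishes, since the load has already succeeded by the time cleanup is 
retried. The sketch does not model the WRITE_TRUNCATE case in which a missing 
file is silently ignored.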

> temp file garbage collection in BigQuery sink should be in a separate DoFn
> --------------------------------------------------------------------------
>
>                 Key: BEAM-2858
>                 URL: https://issues.apache.org/jira/browse/BEAM-2858
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>    Affects Versions: 2.1.0
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>             Fix For: 2.2.0
>
>         Attachments: delete_file_diff.txt
>
>
> Currently the WriteTables transform deletes the set of input files as soon as 
> the load() job completes. However this is incorrect - if the task fails 
> partially through deleting files (e.g. if the worker crashes), the task will 
> be retried. If the write disposition is WRITE_TRUNCATE, bad things could 
> result.
> The resulting behavior will depend on what BQ does if one of the input files is 
> missing (because we had previously deleted it). In the best case, BQ will 
> fail the load. In this case the step will keep failing until the runner 
> finally fails the entire job. If however BQ ignores the missing file, the 
> load will overwrite the previously-written table with the smaller set of 
> files and the job will succeed. This is the worst-case scenario, as it will 
> result in data loss.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
