[ https://issues.apache.org/jira/browse/BEAM-12139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17549285#comment-17549285 ]

Danny McCormick commented on BEAM-12139:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/20891

> Suspected data loss (and/or duplicates) bug in BigQueryServicesImpl
> -------------------------------------------------------------------
>
>                 Key: BEAM-12139
>                 URL: https://issues.apache.org/jira/browse/BEAM-12139
>             Project: Beam
>          Issue Type: Test
>          Components: io-java-gcp
>            Reporter: Alex Amato
>            Priority: P3
>
> When the insertAll API returns errors for individual failed rows, the rows
> to retry are selected
> [here|https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L967]
> using the errorIndex returned with each error:
> retryRows.add(rowsToPublish.get(errorIndex));
> However, errorIndex is not a valid index into rowsToPublish, so it looks
> like the wrong rows are being selected for retry.
> *Why can't errorIndex be used to index rowsToPublish?*
> Because rowsToPublish contains all of the rows that were passed into
> insertAll.
> These rows are then split into smaller batches, each held in a list named
> ["rows"|https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L875],
> and a separate BigQuery API call is made for each batch.
> The indices in the returned errors refer to the list of rows passed into
> that particular call, so they are only valid indices into "rows", not into
> "rowsToPublish".
> Note: the two lists have different sizes: rowsToPublish.size() >
> rows.size()
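
A minimal, simplified Java sketch of the index mismatch described above. It assumes each insertAll call sends a contiguous slice of rowsToPublish (taken here with subList); the class RetryIndexSketch, the BatchInserter interface and the rejecting() helper are hypothetical stand-ins, not Beam's API. The fake inserter reports failures by index within the batch it received, as the description says BigQuery does. The "buggy" variant resolves that index against rowsToPublish; the "fixed" variant resolves it against the batch that was actually sent.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical model of the retry selection described in BEAM-12139.
 * This is NOT Beam's BigQueryServicesImpl code; it only reproduces the index mismatch.
 */
public class RetryIndexSketch {

  /** Stand-in for one insertAll call; returns failed-row indices relative to the given batch. */
  interface BatchInserter {
    List<Integer> insert(List<String> batch);
  }

  /** Fake inserter that fails every row equal to badRow, reporting batch-relative indices. */
  static BatchInserter rejecting(String badRow) {
    return batch -> {
      List<Integer> failed = new ArrayList<>();
      for (int i = 0; i < batch.size(); i++) {
        if (batch.get(i).equals(badRow)) {
          failed.add(i);
        }
      }
      return failed;
    };
  }

  /** Mirrors the suspected bug: a batch-relative errorIndex is used to index rowsToPublish. */
  static List<String> selectRetryRowsBuggy(
      List<String> rowsToPublish, int batchSize, BatchInserter inserter) {
    List<String> retryRows = new ArrayList<>();
    for (int start = 0; start < rowsToPublish.size(); start += batchSize) {
      // Assumption: each call sends a contiguous slice of rowsToPublish.
      List<String> rows =
          rowsToPublish.subList(start, Math.min(start + batchSize, rowsToPublish.size()));
      for (int errorIndex : inserter.insert(rows)) {
        retryRows.add(rowsToPublish.get(errorIndex)); // wrong list for this index
      }
    }
    return retryRows;
  }

  /** One possible fix: resolve errorIndex against the batch that was actually sent. */
  static List<String> selectRetryRowsFixed(
      List<String> rowsToPublish, int batchSize, BatchInserter inserter) {
    List<String> retryRows = new ArrayList<>();
    for (int start = 0; start < rowsToPublish.size(); start += batchSize) {
      List<String> rows =
          rowsToPublish.subList(start, Math.min(start + batchSize, rowsToPublish.size()));
      for (int errorIndex : inserter.insert(rows)) {
        retryRows.add(rows.get(errorIndex)); // same as rowsToPublish.get(start + errorIndex)
      }
    }
    return retryRows;
  }

  public static void main(String[] args) {
    List<String> rowsToPublish = List.of("r0", "r1", "r2", "r3", "r4");
    // Batches of 2: [r0, r1], [r2, r3], [r4]; "r3" fails as index 1 of the second batch.
    System.out.println(selectRetryRowsBuggy(rowsToPublish, 2, rejecting("r3"))); // [r1]
    System.out.println(selectRetryRowsFixed(rowsToPublish, 2, rejecting("r3"))); // [r3]
  }
}
```

With five rows, a batch size of 2 and only "r3" rejected, the failure comes back as index 1 of the second batch, so the buggy variant retries "r1" (which never failed) and never retries "r3". Resolving the index against the batch, or equivalently adding the batch's start offset to errorIndex, selects "r3". When everything fits in a single batch the two lists coincide and both variants agree, which may be why the mismatch is easy to miss.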



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
