[ https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=329268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-329268 ]
ASF GitHub Bot logged work on BEAM-8367: ---------------------------------------- Author: ASF GitHub Bot Created on: 16/Oct/19 17:11 Start Date: 16/Oct/19 17:11 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9797: [BEAM-8367] Using insertId for BQ streaming inserts URL: https://github.com/apache/beam/pull/9797#issuecomment-542801037 LGTM. Thanks! ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 329268) Time Spent: 1h 40m (was: 1.5h) > Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS > --------------------------------------------------------------------- > > Key: BEAM-8367 > URL: https://issues.apache.org/jira/browse/BEAM-8367 > Project: Beam > Issue Type: Bug > Components: sdk-py-core > Reporter: Chamikara Madhusanka Jayalath > Assignee: Pablo Estrada > Priority: Blocker > Fix For: 2.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Unique IDs ensure (best effort) that writes to BigQuery are idempotent, for > example, we don't write the same record twice in a VM failure. > > Currently Python BQ sink insert BQ IDs here but they'll be re-generated in a > VM failure resulting in data duplication. > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766] > > Correct fix is to do a Reshuffle to checkpoint unique IDs once they are > generated, similar to how Java BQ sink operates. > [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225] > > Pablo, can you do an initial assessment here ? > I think this is a relatively small fix but I might be wrong. -- This message was sent by Atlassian Jira (v8.3.4#803005)