reuvenlax commented on PR #28272:
URL: https://github.com/apache/beam/pull/28272#issuecomment-1714202499

   On Mon, Sep 11, 2023 at 9:02 AM Ahmed Abualsaud ***@***.***>
   wrote:
   
   > After a Reshuffle (we have one before single loads
   > 
<https://github.com/ahmedabu98/beam/blob/67e88d9fec9b4ac39a6d42a4bc1e64e1236bc309/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L806>
   > and temp loads
   > 
<https://github.com/ahmedabu98/beam/blob/67e88d9fec9b4ac39a6d42a4bc1e64e1236bc309/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L763>),
   > RunnerV2 doesn't propagate pane indices like legacy runner does. Instead of
   > an increasing pane index, RunnerV2 keeps it at 0.
   >
   
   This sounds like a bug in Runner V2. Pane indices are supposed to propagate
   through shuffles, and we have various things that depend on this.
   
   
   > After our reshuffles, we directly rely on pane index to construct a job ID
   > 
<https://github.com/ahmedabu98/beam/blob/67e88d9fec9b4ac39a6d42a4bc1e64e1236bc309/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L252-L254>.
   > In the RunnerV2 case the index stays at 0 so in some cases (e.g. single
   > loads) it produces the same job ID for each trigger. The resulting behavior
   > is that only the first load is successful and the rest are seen as
   > duplicates by BQ and ignored.
   >
   
   Should we simply file a bug against RunnerV2?
   
   
   > Yi's solution here is to save the pane index from WritePartitions (a stage
   > that is before Reshuffle) and use that information instead when creating
   > the Job ID in WriteTables.
   >
   > —
   > Reply to this email directly, view it on GitHub
   > <https://github.com/apache/beam/pull/28272#issuecomment-1714176376>, or
   > unsubscribe
   > 
<https://github.com/notifications/unsubscribe-auth/AFAYJVODHIPXMXMVXOIWXZDXZ4YZZANCNFSM6AAAAAA4GWNMFI>
   > .
   > You are receiving this because you were mentioned.Message ID:
   > ***@***.***>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to