[ https://issues.apache.org/jira/browse/BEAM-4824?focusedWorklogId=129406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-129406 ]
ASF GitHub Bot logged work on BEAM-4824:
----------------------------------------
Author: ASF GitHub Bot
Created on: 31/Jul/18 17:56
Start Date: 31/Jul/18 17:56
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #6055: [BEAM-4824] Batch
BigQueryIO returns job results
URL: https://github.com/apache/beam/pull/6055#issuecomment-409311627
Thanks! Sorry for the delay, I didn't see this review earlier.
Some initial thoughts:
1. Changing internal types of PCollections (e.g. PCollection<String> ->
PCollection<BigQueryWriteResult>) in common transforms is something we try to
avoid, as many users rely on being able to do in-place updates of their
pipelines, which is impossible when types change. Not a blocker, but we might
need to make the new behavior opt-in instead of the default.
2. The set of load jobs generated is kind of an internal detail of
BigQueryIO. It might split an insert into multiple load jobs (and then run a
copy job to merge them), or in the case of streaming it might keep generating
load jobs. In addition, some of these load jobs might simply be retries of
previously failed load jobs. I'm not sure that outputting per-load-job
information is going to give us what we want, when the logical model is per
record.
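
To illustrate point 2, here is a minimal sketch in plain Java of why raw
per-job output can mislead and what a collapsed, logical view might look like.
Everything in it is an assumption made for illustration: JobAttempt,
LoadJobSummary, summarize and the "part-N" labels are hypothetical names, not
Beam or BigQuery types, and this is not how BigQueryIO tracks its jobs. It
treats a chunk of an insert as loaded if any attempt for it succeeded, and a
table as loaded only if all of its chunks were.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LoadJobSummary {

    /** Hypothetical record of one load-job attempt; not a Beam or BigQuery type. */
    static final class JobAttempt {
        final String table;      // destination table
        final String partition;  // chunk of the logical insert this job covers
        final boolean succeeded; // outcome of this single attempt

        JobAttempt(String table, String partition, boolean succeeded) {
            this.table = table;
            this.partition = partition;
            this.succeeded = succeeded;
        }
    }

    /** Per destination table: did every chunk succeed in at least one attempt? */
    static Map<String, Boolean> summarize(List<JobAttempt> attempts) {
        // table -> (chunk -> true if any attempt for that chunk succeeded)
        Map<String, Map<String, Boolean>> byTable = new LinkedHashMap<>();
        for (JobAttempt a : attempts) {
            byTable.computeIfAbsent(a.table, t -> new LinkedHashMap<>())
                   .merge(a.partition, a.succeeded, Boolean::logicalOr);
        }
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Boolean>> e : byTable.entrySet()) {
            // A table counts as loaded only if no chunk is left without a success.
            result.put(e.getKey(), !e.getValue().containsValue(false));
        }
        return result;
    }

    public static void main(String[] args) {
        List<JobAttempt> attempts = new ArrayList<>();
        attempts.add(new JobAttempt("events", "part-0", false)); // first try failed
        attempts.add(new JobAttempt("events", "part-0", true));  // retry succeeded
        attempts.add(new JobAttempt("events", "part-1", true));
        attempts.add(new JobAttempt("users", "part-0", false));  // never succeeded
        System.out.println(summarize(attempts)); // prints {events=true, users=false}
    }
}
```

A consumer reading the four raw attempts would see two failures, even though
only the "users" load actually failed. That is the gap between per-job output
and the per-record logical model.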
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 129406)
Time Spent: 20m (was: 10m)
> Get BigQueryIO batch loads to return something actionable
> ---------------------------------------------------------
>
> Key: BEAM-4824
> URL: https://issues.apache.org/jira/browse/BEAM-4824
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Carlos Alonso
> Assignee: Carlos Alonso
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h
>
> At the moment, BigQueryIO batch loads return an empty collection that has no
> information about how the load job finished. It is even returned before the
> job finishes.
>
> Change it so that:
> # The returned PCollection only appears when the job has actually finished
> # The returned PCollection contains information about the job result
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)