[
https://issues.apache.org/jira/browse/BEAM-4824?focusedWorklogId=145452&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-145452
]
ASF GitHub Bot logged work on BEAM-4824:
----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Sep/18 19:43
Start Date: 18/Sep/18 19:43
Worklog Time Spent: 10m
Work Description: reuvenlax commented on issue #6055: [BEAM-4824] Batch
BigQueryIO returns job results
URL: https://github.com/apache/beam/pull/6055#issuecomment-422524409
BTW, I think my second comment still stands. BigQueryIO currently uses load
jobs as an implementation detail. It might create one load job per
table, or multiple load jobs per table (if the table
is very large). Collapsing those multiple jobs together might be very confusing.
I think making information about these jobs part of the public API would be
confusing, since the actual logical model is per record.
Another thing: the BigQuery API will change soon, and we
plan to remove load jobs from BigQueryIO entirely. If we make
information about load jobs part of the public API, it could become a problem
when we remove the load jobs.
Is this something that could be accomplished with better logging, or are
there concrete use cases for wanting the output in a PCollection?
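The comment's point about one load job per table versus several can be illustrated with a minimal plain-Java sketch. Everything in it is hypothetical: `LoadJobResult`, `TableSummary`, and `collapseByTable` are illustrative names, not BigQueryIO or Beam APIs. When a large table is written by several load jobs, a per-table view has to merge them, and a table where one job succeeded and another failed no longer has a single clear status.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT a Beam/BigQueryIO API: illustrates collapsing
// several load jobs that wrote to the same table into one per-table result.
class LoadJobCollapse {

    /** Hypothetical result of one finished load job. */
    record LoadJobResult(String jobId, String table, boolean succeeded, long rowsLoaded) {}

    /** Hypothetical per-table view built by merging every job for that table. */
    record TableSummary(String table, int jobCount, int failedJobs, long rowsLoaded) {
        // With several jobs per table, some may succeed while others fail.
        boolean partiallyFailed() {
            return failedJobs > 0 && failedJobs < jobCount;
        }
    }

    /** Groups job results by destination table, keeping first-seen table order. */
    static Map<String, TableSummary> collapseByTable(List<LoadJobResult> jobs) {
        Map<String, TableSummary> byTable = new LinkedHashMap<>();
        for (LoadJobResult job : jobs) {
            TableSummary prev = byTable.getOrDefault(
                job.table(), new TableSummary(job.table(), 0, 0, 0L));
            byTable.put(job.table(), new TableSummary(
                job.table(),
                prev.jobCount() + 1,
                prev.failedJobs() + (job.succeeded() ? 0 : 1),
                // Only rows from successful jobs count as loaded.
                prev.rowsLoaded() + (job.succeeded() ? job.rowsLoaded() : 0L)));
        }
        return byTable;
    }

    public static void main(String[] args) {
        List<LoadJobResult> jobs = List.of(
            new LoadJobResult("job_1", "proj:ds.small", true, 1_000),
            new LoadJobResult("job_2", "proj:ds.large", true, 5_000_000),
            new LoadJobResult("job_3", "proj:ds.large", false, 0));
        collapseByTable(jobs).values().forEach(s -> System.out.println(
            s.table() + " jobs=" + s.jobCount() + " failed=" + s.failedJobs()
                + " rows=" + s.rowsLoaded() + " partial=" + s.partiallyFailed()));
    }
}
```

Running `main` prints one line per table; `proj:ds.large` reports two jobs, one of them failed, which is the kind of merged result the comment warns could be confusing for users who think in terms of records rather than jobs.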
Issue Time Tracking
-------------------
Worklog Id: (was: 145452)
Time Spent: 1h 10m (was: 1h)
> Get BigQueryIO batch loads to return something actionable
> ---------------------------------------------------------
>
> Key: BEAM-4824
> URL: https://issues.apache.org/jira/browse/BEAM-4824
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Carlos Alonso
> Assignee: Carlos Alonso
> Priority: Minor
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> At the moment, BigQueryIO batch loads return an empty collection that carries no
> information about how the load job finished. It is even returned before the job
> finishes.
>
> Change it so that:
> # The returned PCollection only appears once the job has actually finished
> # The returned PCollection contains information about the job result
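The two requested behaviors can be sketched outside of Beam with `CompletableFuture`, assuming a hypothetical `submitAndPoll` helper that stands in for submitting a load job and waiting until it reaches a terminal state. `JobOutcome`, `JobState`, and `allOutcomes` are likewise illustrative names, not part of BigQueryIO. The combined result completes only after every job has finished, and it carries each job's outcome instead of being empty.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical sketch, NOT the BigQueryIO API: the result is produced only
// after every load job has finished, and it describes how each job ended.
class LoadResultsAfterCompletion {

    enum JobState { SUCCEEDED, FAILED }

    /** Hypothetical outcome of one load job. */
    record JobOutcome(String jobId, String table, JobState state, String error) {}

    /** Hypothetical stand-in for submitting a load job and polling until it finishes. */
    static CompletableFuture<JobOutcome> submitAndPoll(String jobId, String table, boolean fail) {
        return CompletableFuture.supplyAsync(() -> fail
            ? new JobOutcome(jobId, table, JobState.FAILED, "simulated load error")
            : new JobOutcome(jobId, table, JobState.SUCCEEDED, null));
    }

    /**
     * Completes only once all jobs are done (requirement 1) and yields
     * their outcomes (requirement 2), in submission order.
     */
    static CompletableFuture<List<JobOutcome>> allOutcomes(List<CompletableFuture<JobOutcome>> jobs) {
        return CompletableFuture.allOf(jobs.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> jobs.stream()
                .map(CompletableFuture::join) // non-blocking here: allOf means every job is done
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CompletableFuture<JobOutcome>> jobs = List.of(
            submitAndPoll("job_1", "proj:ds.a", false),
            submitAndPoll("job_2", "proj:ds.b", true));
        for (JobOutcome o : allOutcomes(jobs).join()) {
            System.out.println(o.jobId() + " " + o.table() + " " + o.state());
        }
    }
}
```

In a real pipeline the completion signal would be a `PCollection` rather than a future; the sketch only shows the ordering and content guarantees the issue asks for.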
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)