BTW, my second comment still stands, I think. BigQueryIO currently uses load jobs as an implementation detail: it might create one load job per table, or multiple load jobs per table (if the table is very large), so collapsing those multiple jobs together could be confusing. More generally, making information about these jobs part of the public API seems confusing when the actual logical model is per record.
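To illustrate the per-record model being referred to (a minimal sketch, not code from this PR; the class name and table reference are placeholders): the user hands BigQueryIO a PCollection of rows, and how many load jobs the sink issues per table is an internal detail of the FILE_LOADS path.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class PerRecordWriteSketch {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    // The user-facing model: a PCollection of individual rows.
    PCollection<TableRow> rows =
        p.apply(
            Create.of(
                    new TableRow().set("name", "alice"),
                    new TableRow().set("name", "bob"))
                .withCoder(TableRowJsonCoder.of()));

    // The sink decides internally whether one or several load jobs are
    // needed for the destination table; that is not part of this API surface.
    WriteResult result =
        rows.apply(
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // hypothetical table
                .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

    p.run().waitUntilFinish();
  }
}
```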
Another thing: there are upcoming changes to the BigQuery API, and we plan to get rid of load jobs entirely in BigQueryIO. If we make information about load jobs part of the public API, that could become problematic when we remove them. Is this something that could be accomplished with better logging, or are there concrete use cases for having the output in a PCollection?
