[
https://issues.apache.org/jira/browse/BEAM-9494?focusedWorklogId=402440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402440
]
ASF GitHub Bot logged work on BEAM-9494:
----------------------------------------
Author: ASF GitHub Bot
Created on: 12/Mar/20 19:15
Start Date: 12/Mar/20 19:15
Worklog Time Spent: 10m
Work Description: lukecwik commented on pull request #11103: [BEAM-9494]
Reifying outputs from BQ file writing
URL: https://github.com/apache/beam/pull/11103#discussion_r391837238
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
##########
@@ -739,9 +739,12 @@ def _write_files(self, destination_data_kv_pc, file_prefix_pcv):
             file_prefix_pcv,
             *self.schema_side_inputs))
+    # We flatten both PCollection paths, and reify. We do this due to some
+    # trickiness with coder-setting on Flatten-GBK boundaries.
     all_destination_file_pairs_pc = (
         (destination_files_kv_pc, more_destination_files_kv_pc)
-        | "DestinationFilesUnion" >> beam.Flatten())
+        | "DestinationFilesUnion" >> beam.Flatten()
+        | "ReifyInputs" >> beam.Map(lambda x: x))
Review comment:
```suggestion
        | "IdentityWorkaround" >> beam.Map(lambda x: x))
```
Issue Time Tracking
-------------------
Worklog Id: (was: 402440)
Remaining Estimate: 0h
Time Spent: 10m
> Remove workaround for BQ transform for Dataflow
> -----------------------------------------------
>
> Key: BEAM-9494
> URL: https://issues.apache.org/jira/browse/BEAM-9494
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Reporter: Luke Cwik
> Assignee: Pablo Estrada
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Dataflow incorrectly uses the Flatten input PCollection coder, rather than
> the output PCollection coder, when it performs an optimization. This can
> lead to issues if the two coders differ.
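
To make the failure mode in the description concrete, here is a minimal toy sketch in plain Python. It does not use Beam or Dataflow APIs: `JsonCoder`, `PickleCoder`, and `read_after_flatten` are hypothetical names invented for illustration, and the assumption that the reading side is the one that picks the wrong coder is mine, not something the issue states. It only shows that bytes written with one coder generally cannot be decoded with a different one.

```python
import json
import pickle


class JsonCoder:
    """Hypothetical toy coder: stands in for the Flatten *input* coder."""

    def encode(self, value):
        return json.dumps(value).encode("utf-8")

    def decode(self, data):
        return json.loads(data.decode("utf-8"))


class PickleCoder:
    """Hypothetical toy coder: stands in for the Flatten *output* coder."""

    def encode(self, value):
        return pickle.dumps(value)

    def decode(self, data):
        return pickle.loads(data)


def read_after_flatten(element, declared_output_coder, coder_used_by_reader):
    """Encode with the declared output coder, decode with whatever coder
    the reading side actually picked (assumed to be where the mix-up is)."""
    encoded = declared_output_coder.encode(element)
    try:
        return coder_used_by_reader.decode(encoded)
    except ValueError as exc:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return "decode failed: " + type(exc).__name__


element = ["dest_a", "file_1"]

# Both sides agree on the output coder: the element round-trips.
print(read_after_flatten(element, PickleCoder(), PickleCoder()))

# The reader uses the input coder instead: decoding fails.
print(read_after_flatten(element, PickleCoder(), JsonCoder()))
```

When the two coders happen to be identical the mix-up is invisible, which matches the description's note that problems arise only if the coders differ. The sketch does not model how the `beam.Map(lambda x: x)` step in the pull request avoids the mismatch on Dataflow.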