[ 
https://issues.apache.org/jira/browse/BEAM-9494?focusedWorklogId=402441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-402441
 ]

ASF GitHub Bot logged work on BEAM-9494:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Mar/20 19:15
            Start Date: 12/Mar/20 19:15
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on pull request #11103: [BEAM-9494] Reifying outputs from BQ file writing
URL: https://github.com/apache/beam/pull/11103#discussion_r391837106
 
 

 ##########
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
 ##########
 @@ -739,9 +739,12 @@ def _write_files(self, destination_data_kv_pc, file_prefix_pcv):
             file_prefix_pcv,
             *self.schema_side_inputs))
 
+    # We flatten both PCollection paths, and reify. We do this due to some
+    # trickiness with coder-setting on Flatten-GBK boundaries.
     all_destination_file_pairs_pc = (
         (destination_files_kv_pc, more_destination_files_kv_pc)
 
 Review comment:
   ```suggestion
       # TODO(BEAM-9494): Remove the identity transform. We flatten both
       # PCollection paths and use an identity function to work around a
       # flatten optimization issue where the wrong coder is being used.
       all_destination_file_pairs_pc = (
           (destination_files_kv_pc, more_destination_files_kv_pc)
   ```
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 402441)

> Remove workaround for BQ transform for Dataflow
> -----------------------------------------------
>
>                 Key: BEAM-9494
>                 URL: https://issues.apache.org/jira/browse/BEAM-9494
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Luke Cwik
>            Assignee: Pablo Estrada
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When Dataflow performs an optimization on a Flatten, it incorrectly uses the 
> coder of the Flatten input PCollection instead of the coder of the output 
> PCollection, which can lead to issues if these coders differ.
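
To illustrate why a coder mismatch like the one described above matters, here is a toy sketch in plain Python. It does not use Beam's coder API and does not model Dataflow's optimization; `Utf8Coder`, `JsonCoder`, and the sample element are hypothetical stand-ins for "the coder of one Flatten input" and "the coder of the Flatten output". It only shows that bytes written with one coder and read with another can decode without error and still produce the wrong value.

```python
import json


class Utf8Coder:
    """Toy coder (hypothetical): raw UTF-8 bytes for str values."""

    def encode(self, value: str) -> bytes:
        return value.encode("utf-8")

    def decode(self, data: bytes) -> str:
        return data.decode("utf-8")


class JsonCoder:
    """Toy coder (hypothetical): JSON text for str, list, and number values."""

    def encode(self, value) -> bytes:
        return json.dumps(value).encode("utf-8")

    def decode(self, data: bytes):
        return json.loads(data.decode("utf-8"))


# Assumption for illustration: the two coders differ, as in the issue text.
input_coder = Utf8Coder()   # stands in for the coder of one Flatten input
output_coder = JsonCoder()  # stands in for the coder of the Flatten output

element = "table_a/file_0"
encoded = output_coder.encode(element)

# Reading with the same coder that wrote the bytes round-trips the element.
matched = output_coder.decode(encoded)

# Reading with the other coder raises no error, but the JSON quotes become
# part of the string, so the decoded value differs from the original.
mismatched = input_coder.decode(encoded)

print(repr(matched))
print(repr(mismatched))
```

In this toy case the mismatch is silent rather than an exception, which is one reason such a bug can be hard to spot.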



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
