[ https://issues.apache.org/jira/browse/BEAM-11408?focusedWorklogId=580292&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580292 ]
ASF GitHub Bot logged work on BEAM-11408:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 09/Apr/21 20:12
Start Date: 09/Apr/21 20:12
Worklog Time Spent: 10m
Work Description: udim commented on a change in pull request #14499:
URL: https://github.com/apache/beam/pull/14499#discussion_r610879276
##########
File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py
##########
@@ -941,7 +943,8 @@ def _write_files_with_auto_sharding(
destination_files_kv_pc = (
destination_data_kv_pc
| 'ToHashableTableRef' >> beam.Map(
- lambda kv: (bigquery_tools.get_hashable_destination(kv[0]), kv[1]))
+ lambda kv: (bigquery_tools.get_hashable_destination(kv[0]),
kv[1])).
Review comment:
Same as the comment on `bigquery.py` below, but with the additional step of
turning this lambda into a named function (you can't add type annotations to
lambdas).
##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1431,7 +1433,8 @@ def _restore_table_ref(sharded_table_ref_elems_kv):
bigquery_tools.AppendDestinationsFn(self.table_reference),
*self.table_side_inputs)
| 'AddInsertIds' >> beam.ParDo(_StreamToBigQuery.InsertIdPrefixFn())
- | 'ToHashableTableRef' >> beam.Map(_to_hashable_table_ref))
+ | 'ToHashableTableRef' >> beam.Map(_to_hashable_table_ref)
+ ).with_output_types(Tuple[str, Any])
Review comment:
The explicit `Any` discards type information. Could you try annotating
the types in `_to_hashable_table_ref` instead?
```py
from typing import Tuple, TypeVar, Union

V = TypeVar('V')

def _to_hashable_table_ref(
    table_ref_elem_kv: Tuple[Union[str, TABLE_REF_TYPE], V]) -> Tuple[str, V]:
  ...  # existing body unchanged
```
Issue Time Tracking
-------------------
Worklog Id: (was: 580292)
Time Spent: 15h 50m (was: 15h 40m)
> GCP BigQuery sink (streaming inserts) uses runner determined sharding
> ---------------------------------------------------------------------
>
> Key: BEAM-11408
> URL: https://issues.apache.org/jira/browse/BEAM-11408
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Siyuan Chen
> Assignee: Siyuan Chen
> Priority: P1
> Fix For: 2.28.0
>
> Time Spent: 15h 50m
> Remaining Estimate: 0h
>
> Integrate BigQuery sink with shardable `GroupIntoBatches` (BEAM-10475) to
> allow runner determined dynamic sharding.