[
https://issues.apache.org/jira/browse/BEAM-12865?focusedWorklogId=679851&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-679851
]
ASF GitHub Bot logged work on BEAM-12865:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 10/Nov/21 19:18
Start Date: 10/Nov/21 19:18
Worklog Time Spent: 10m
Work Description: quentin-sommer commented on a change in pull request
#15489:
URL: https://github.com/apache/beam/pull/15489#discussion_r746911914
##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -2158,7 +2169,7 @@ def expand(self, pcoll):
schema=self.schema,
create_disposition=self.create_disposition,
write_disposition=self.write_disposition,
- triggering_frequency=self.triggering_frequency,
+ triggering_frequency=int(self.triggering_frequency),
Review comment:
This code is in the `BigQueryBatchFileLoads` class. The default value is
`None`, and the docs advise using at least 2 minutes to avoid reaching the
per-project quota on BigQuery load jobs. The code errors when
`triggering_frequency` is `None` in a streaming pipeline.
I think it should be an integer. `BigQueryBatchFileLoads` uses it like this:
https://github.com/apache/beam/blob/8da177f64d314cf72e89a51e51fb0915f706a784/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L873-L874
The [Beam
reference](https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.transforms.trigger.html?highlight=trigger#apache_beam.transforms.trigger.AfterProcessingTime)
states it is a delay in seconds, and I'm not sure what the implementation
does, so I'd rather stay on the safe side and keep integers.
I added logic to cast to int only when the value is not `None`.
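The None-guarded cast described above can be sketched as a standalone helper
(the helper name is hypothetical, for illustration only; the actual change in
the PR is inline in `bigquery.py`):

```python
def normalized_triggering_frequency(triggering_frequency):
    """Cast triggering_frequency to int only when it is set.

    `None` means no periodic triggering for the batch file-loads path;
    int(None) would raise a TypeError, so the guard skips the cast in
    that case. (Hypothetical helper, not a Beam API.)
    """
    if triggering_frequency is None:
        return None
    return int(triggering_frequency)
```

With this guard, streaming pipelines that never set a triggering frequency
keep passing `None` through unchanged, while float values are truncated to
whole seconds.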
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 679851)
Time Spent: 7.5h (was: 7h 20m)
> Allow customising batch duration when streaming with WriteToBigQuery
> --------------------------------------------------------------------
>
> Key: BEAM-12865
> URL: https://issues.apache.org/jira/browse/BEAM-12865
> Project: Beam
> Issue Type: New Feature
> Components: io-py-gcp
> Affects Versions: Not applicable
> Reporter: Quentin Sommer
> Priority: P2
> Labels: stale-P2
> Fix For: Not applicable
>
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> Hi,
> We allow customising the {{batch_size}} when streaming to BigQuery, but the
> batch buffering duration (used by {{GroupIntoBatches}}) is fixed at
> {{DEFAULT_BATCH_BUFFERING_DURATION_LIMIT_SEC}} (0.2 seconds).
> I'd like to add the option to specify the {{batch_duration}} to allow better
> batching in scenarios with little data throughput.
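The size-or-duration trade-off described above can be sketched in plain
Python (illustrative only: `batch_with_duration` is a hypothetical name, not
a Beam API, and arrival times are passed in explicitly rather than measured;
in Beam, `GroupIntoBatches` applies this logic inside the runner):

```python
def batch_with_duration(timed_elements, batch_size, max_buffering_sec):
    """Emit a batch when it reaches batch_size elements, or when the
    oldest buffered element has waited max_buffering_sec seconds.

    timed_elements: iterable of (arrival_time_sec, value) pairs,
    assumed sorted by arrival time.
    """
    batches = []
    current = []
    first_ts = None  # arrival time of the oldest buffered element
    for ts, value in timed_elements:
        # Duration trigger: flush if the buffer has waited long enough.
        if current and ts - first_ts >= max_buffering_sec:
            batches.append(current)
            current, first_ts = [], None
        if first_ts is None:
            first_ts = ts
        current.append(value)
        # Size trigger: flush a full batch immediately.
        if len(current) >= batch_size:
            batches.append(current)
            current, first_ts = [], None
    if current:
        batches.append(current)
    return batches
```

A longer buffering duration lets low-throughput streams accumulate fuller
batches instead of flushing tiny ones every 0.2 seconds, which is the point
of making it configurable.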
--
This message was sent by Atlassian Jira
(v8.20.1#820001)