GitHub user reuvenlax opened a pull request:
https://github.com/apache/beam/pull/3662
[BEAM-2700] Support load jobs in streaming
Allow users to select BigQuery load jobs as the insertion method even when
writing unbounded PCollections. For unbounded PCollections, the user must
specify a triggering frequency indicating how often load jobs are generated.
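The rule above can be modeled as a small validation check. This is a minimal, hypothetical sketch: the `LoadJobMethodCheck` class, its `Method` enum and `validate` method are illustrative names, not code from this PR. (Released Beam versions expose the choice through `withMethod(...)` and `withTriggeringFrequency(...)` on `BigQueryIO.Write`.) It only encodes what the description states: file load jobs on an unbounded input require a triggering frequency.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical model of the validation rule described in this PR.
// This is NOT Beam's BigQueryIO code; all names are illustrative only.
public class LoadJobMethodCheck {

    // Hypothetical stand-ins for the insertion methods a user can choose from.
    enum Method { FILE_LOADS, STREAMING_INSERTS }

    // Returns an error message when the combination is rejected, else empty.
    static Optional<String> validate(Method method, boolean unboundedInput,
                                     Duration triggeringFrequency) {
        if (method == Method.FILE_LOADS && unboundedInput && triggeringFrequency == null) {
            return Optional.of(
                "FILE_LOADS on an unbounded PCollection requires a triggering frequency");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Unbounded input, load jobs, no frequency: rejected.
        System.out.println(validate(Method.FILE_LOADS, true, null).isPresent()); // true
        // Unbounded input, load jobs every 5 minutes: accepted.
        System.out.println(
            validate(Method.FILE_LOADS, true, Duration.ofMinutes(5)).isPresent()); // false
        // Bounded input needs no frequency.
        System.out.println(validate(Method.FILE_LOADS, false, null).isPresent()); // false
    }
}
```

Streaming inserts are unaffected by the check, since the frequency requirement applies only to load jobs.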
Note: while there are some similarities between the BigQuery transform and
what is done in FileBasedSink, there are enough differences that it does not
appear easy or advisable to attempt to reuse the code.
Note: a design choice is to allow the user to specify only a triggering
frequency, not arbitrary windows. This triggering frequency is merely a tuning
parameter controlling how often BigQuery load jobs are issued, and is usually
set to keep the number of BQ load jobs under quota (ideally it wouldn't even be
needed; however, I don't know how to make this automatic while respecting user
quotas). There is no need for semantic windowing to control how often these
writes happen.
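Since the frequency exists mainly to stay under quota, picking a value is simple arithmetic. The sketch below is a back-of-the-envelope aid, not part of this PR: it assumes each trigger firing issues at most one load job per destination table, and the daily quota of 1,500 used in the example is purely illustrative (check the current BigQuery quota documentation). `LoadJobQuotaMath` and its methods are hypothetical names.

```java
import java.time.Duration;

// Back-of-the-envelope math for choosing a triggering frequency.
// Assumption: each trigger firing issues at most one load job per destination table.
public class LoadJobQuotaMath {

    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Upper bound on load jobs per table per day at the given frequency
    // (ceiling division, so a partial final interval still counts as a firing).
    static long loadJobsPerDay(Duration triggeringFrequency) {
        long seconds = triggeringFrequency.getSeconds();
        return (SECONDS_PER_DAY + seconds - 1) / seconds;
    }

    // Smallest whole-second frequency whose job count stays within dailyQuota.
    static Duration minimumFrequency(long dailyQuota) {
        return Duration.ofSeconds((SECONDS_PER_DAY + dailyQuota - 1) / dailyQuota);
    }

    public static void main(String[] args) {
        System.out.println(loadJobsPerDay(Duration.ofMinutes(5)));  // 288
        System.out.println(minimumFrequency(1_500).getSeconds());    // 58 (illustrative quota)
    }
}
```

With the illustrative quota, firing every 58 seconds yields at most 1,490 jobs per table per day, while 57 seconds would yield 1,516. In practice one would pick a comfortably larger interval to leave headroom for retries and other jobs.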
R:@jkff
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/reuvenlax/incubator-beam bq_load_jobs_in_streaming
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3662.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3662
----
commit 83fccf0cecb2b5eff1d4b814597c85256f2773f0
Author: Reuven Lax
Date: 2017-07-30T18:17:39Z
Allow users to choose the BigQuery insertion method. If choosing file load
jobs on an unbounded PCollection, a triggering frequency must be specified to
control how often load jobs are generated.
commit 128984b00bb42782767ee34c74f3c6b234b83d93
Author: Reuven Lax
Date: 2017-07-30T18:36:12Z
Cleanup
----