Hey all,

Almost two weeks ago, I created a PR to support BigQuery clustering [1].
Can someone please have a look?

Thanks,
Wout

1: https://github.com/apache/beam/pull/7061


From: Lukasz Cwik <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Wednesday, 29 August 2018 at 18:32
To: dev <[email protected]>, "[email protected]" <[email protected]>
Cc: Bob De Schutter <[email protected]>
Subject: Re: BigqueryIO field clustering

Wout, I assigned this task to you since it seems like you're interested in
contributing.
The Apache Beam contribution guide [1] is a good place to start for answers to
questions about how to contribute.

If you need help getting things reviewed or have questions, feel free to
reach out on [email protected] or on Slack.

1: https://beam.apache.org/contribute/


On Wed, Aug 29, 2018 at 1:28 AM Wout Scheepers <[email protected]> wrote:
Hey all,

I’m trying to use the field clustering beta feature in BigQuery [1].
However, the BigQuery API service dependency used by the current Beam/Dataflow
worker is ‘com.google.apis:google-api-services-bigquery:v2-rev374-1.23.0’,
which does not include the clustering option in the TimePartitioning class.
As a result, I can’t specify the clustering field when loading or streaming
into BigQuery. See [2] for the BigQuery API error details.
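To make the mismatch in [2] concrete, here is a minimal, stdlib-only Java sketch of the two settings a load request would need to carry for that table. It assumes the BigQuery REST load-job configuration, where (as I understand it) timePartitioning and a separate clustering object with a fields list sit side by side. The class and method names are hypothetical; this is not Beam's BigQueryIO API, just an illustration of the request shape.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: models load-job fields as plain maps (not Beam's API).
class ClusteredLoadConfig {

    // Builds the "load" part of a load-job configuration with daily time
    // partitioning on partitionField and, if any are given, clustering fields.
    static Map<String, Object> loadConfig(String partitionField, List<String> clusteringFields) {
        Map<String, Object> timePartitioning = new LinkedHashMap<>();
        timePartitioning.put("type", "DAY");
        timePartitioning.put("field", partitionField);

        Map<String, Object> load = new LinkedHashMap<>();
        load.put("timePartitioning", timePartitioning);
        // Leaving this out mirrors the request in [2]: the table expects
        // clustering(clustering_id), but the input spec has no clustering.
        if (!clusteringFields.isEmpty()) {
            Map<String, Object> clustering = new LinkedHashMap<>();
            clustering.put("fields", clusteringFields);
            load.put("clustering", clustering);
        }
        return load;
    }

    // Tiny JSON renderer for strings, lists and maps; strings are not escaped.
    static String toJson(Object value) {
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        if (value instanceof List) {
            List<?> items = (List<?>) value;
            StringBuilder sb = new StringBuilder("[");
            for (int i = 0; i < items.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(toJson(items.get(i)));
            }
            return sb.append(']').toString();
        }
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            boolean first = true;
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append(toJson(String.valueOf(entry.getKey())))
                  .append(':')
                  .append(toJson(entry.getValue()));
            }
            return sb.append('}').toString();
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(loadConfig("publish_time", List.of("clustering_id"))));
    }
}
```

Running main prints {"timePartitioning":{"type":"DAY","field":"publish_time"},"clustering":{"fields":["clustering_id"]}}. Calling loadConfig with an empty list produces only the timePartitioning part, which corresponds to the "input partitioning specification" the API rejected.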

Does anyone know a workaround for this?

I guess that, in the worst case, I’ll have to wait until Beam supports a newer
version of the BigQuery API service.
After checking the Beam Jira, I found
BEAM-5191 <https://jira.apache.org/jira/browse/BEAM-5191>. Is there any way I
can help push this forward and make this feature possible in the near future?

Thanks in advance,
Wout

[1] https://cloud.google.com/bigquery/docs/clustered-tables
[2] "errorResult" : {
      "message" : "Incompatible table partitioning specification. Expects partitioning specification interval(type:day,field:publish_time) clustering(clustering_id), but input partitioning specification is interval(type:day,field:publish_time)",
      "reason" : "invalid"
    }
