[
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=270433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-270433
]
ASF GitHub Bot logged work on BEAM-5191:
----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/19 19:30
Start Date: 01/Jul/19 19:30
Worklog Time Spent: 10m
Work Description: jklukas commented on pull request #8945: [BEAM-5191] Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/8945#discussion_r299185667
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/DynamicDestinationsHelpers.java
##########
@@ -167,7 +175,12 @@ public TableDestination getTable(DestinationT destination) {
     @Override
     Coder<DestinationT> getDestinationCoderWithDefault(CoderRegistry registry)
         throws CannotProvideCoderException {
-      return inner.getDestinationCoderWithDefault(registry);
+      Coder<DestinationT> destinationCoder = getDestinationCoder();
Review comment:
`DynamicDestinations#getDestinationCoderWithDefault` is commented as:
```
// Gets the destination coder. If the user does not provide one, try to find one in the coder
// registry. If no coder can be found, throws CannotProvideCoderException.
```
This code allows for multiple layers of delegation, and I think the correct
behavior here is to return the result of the first non-delegating
implementation of `getDestinationCoder()` that we encounter as we move down
the delegation chain.
I would argue that the existing behavior is incorrect. Currently, if an
implementing class overrides `getDestinationCoder` to return a custom coder,
that coder is ignored when you call `getDestinationCoderWithDefault`. My
expectation is that `getDestinationCoderWithDefault` always returns the same
value as `getDestinationCoder`, except when `getDestinationCoder` returns
null, in which case `getDestinationCoderWithDefault` attempts to look up a
coder in the registry.
So the change here is intended to fix broken behavior.
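The expectation described above can be illustrated with a minimal, self-contained sketch. It uses simplified stand-in classes rather than the real Beam types: `Registry`, `Destinations`, `Delegating`, `CustomCoderWrapper`, and `oldWithDefault` are hypothetical names, and coders are represented by plain strings. The body of the fixed method is an assumption about the intent of the change, since the quoted diff shows only its first line.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; none of these are Beam classes.
class DestinationCoderSketch {

  /** Stand-in for Beam's checked CannotProvideCoderException. */
  static class CannotProvideCoderException extends Exception {
    CannotProvideCoderException(String message) {
      super(message);
    }
  }

  /** Stand-in for CoderRegistry; a coder is represented by its name. */
  static class Registry {
    private final Map<Class<?>, String> coders = new HashMap<>();

    void register(Class<?> type, String coderName) {
      coders.put(type, coderName);
    }

    String getCoder(Class<?> type) throws CannotProvideCoderException {
      String coder = coders.get(type);
      if (coder == null) {
        throw new CannotProvideCoderException("No coder registered for " + type.getName());
      }
      return coder;
    }
  }

  /** Stand-in for DynamicDestinations with String destinations. */
  static class Destinations {
    /** Returns the user-provided coder, or null if there is none. */
    String getDestinationCoder() {
      return null;
    }

    /** Uses the user-provided coder if present, else falls back to the registry. */
    String getDestinationCoderWithDefault(Registry registry) throws CannotProvideCoderException {
      String coder = getDestinationCoder();
      return coder != null ? coder : registry.getCoder(String.class);
    }
  }

  /** Stand-in for DelegatingDynamicDestinations. */
  static class Delegating extends Destinations {
    final Destinations inner;

    Delegating(Destinations inner) {
      this.inner = inner;
    }

    @Override
    String getDestinationCoder() {
      return inner.getDestinationCoder();
    }

    /** Old behavior: always defers to the wrapped instance, skipping any override. */
    String oldWithDefault(Registry registry) throws CannotProvideCoderException {
      return inner.getDestinationCoderWithDefault(registry);
    }

    /** Assumed shape of the fix: consult this layer's getDestinationCoder() first. */
    @Override
    String getDestinationCoderWithDefault(Registry registry) throws CannotProvideCoderException {
      String coder = getDestinationCoder();
      return coder != null ? coder : inner.getDestinationCoderWithDefault(registry);
    }
  }

  /** Hypothetical user subclass that supplies its own coder. */
  static class CustomCoderWrapper extends Delegating {
    CustomCoderWrapper(Destinations inner) {
      super(inner);
    }

    @Override
    String getDestinationCoder() {
      return "CustomCoder";
    }
  }

  public static void main(String[] args) throws CannotProvideCoderException {
    Registry registry = new Registry();
    registry.register(String.class, "RegistryCoder");
    Delegating wrapper = new CustomCoderWrapper(new Destinations());

    // Old behavior ignores the override and returns the registry coder.
    System.out.println(wrapper.oldWithDefault(registry)); // RegistryCoder
    // Fixed behavior honors the override.
    System.out.println(wrapper.getDestinationCoderWithDefault(registry)); // CustomCoder
  }
}
```

In this sketch, a plain `Delegating` wrapper with no override still falls through to the registry, so the two code paths only differ when some layer in the chain actually supplies a coder.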
It's possible that a user has written a custom class that extends
`DelegatingDynamicDestinations` and relies on the incorrect behavior, but it
feels unlikely to me.
For the scope of the coders provided here, I don't believe this change
affects behavior (the method was already returning `TableDestinationCoderV2`
in all cases).
Issue Time Tracking
-------------------
Worklog Id: (was: 270433)
Time Spent: 9h (was: 8h 50m)
> Add support for writing to BigQuery clustered tables
> ----------------------------------------------------
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.6.0
> Reporter: Robert Sahlin
> Assignee: Wout Scheepers
> Priority: Minor
> Labels: features, newbie
> Time Spent: 9h
> Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be
> useful to set clustering columns the same way as for partitioning. It should
> support multiple clustering fields (up to 4).
> For example:
> [BigQueryIO.Write|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]<[T|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]>
> .withClustering(new Clustering().setField("productId").setType("STRING"))