[
https://issues.apache.org/jira/browse/BEAM-10652?focusedWorklogId=733349&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-733349
]
ASF GitHub Bot logged work on BEAM-10652:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Feb/22 21:21
Start Date: 25/Feb/22 21:21
Worklog Time Spent: 10m
Work Description: brucearctor edited a comment on pull request #16578:
URL: https://github.com/apache/beam/pull/16578#issuecomment-1050561219
@chamikaramj -- how's this code look? This is an improvement, that doesn't
break any existing tests -- so, I think is OK to merge.
Tables are not created properly [ with clustering ] for
STORAGE_API_AT_LEAST_ONCE or STORAGE_WRITE_API -- so those will be new tickets
that I add. Again, since this is an improvement over existing functionality (
and does not break things ), it seems OK to merge. I might revisit to get
those working later, but for now the tests are for 'DEFAULT', and 'FILE_LOADS'.
I will write some follow up tickets for:
a) document this functionality. at least in
https://beam.apache.org/documentation/io/built-in/google-bigquery/ , but
perhaps anywhere else suggested
b) get clustering working with STORAGE_API_AT_LEAST_ONCE
c) get clustering working with STORAGE_WRITE_API
...? Anything else top of mind ( what else would we want to see, but
needn't be captured by this current improvement?
Also, would be curious to learn the implementation differences between
STORAGE_API_AT_LEAST_ONCE and STORAGE_WRITE_API ... I wonder if solved for one,
if that fixes both.
I see some related issues that I'd like to see addressed ahead of
implementing any 'fix' or additional integration tests just for clustering,
when using the storage write api --> ex:
https://issues.apache.org/jira/browse/BEAM-13747
and
https://issues.apache.org/jira/browse/BEAM-13753 /
https://issues.apache.org/jira/browse/BEAM-13990
If you agree is fine, do squash/merge, or give me the LGTM. Thanks!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 733349)
Time Spent: 46h (was: 45h 50m)
> Allow Clustering without TimePartitioning in BigQueryIO
> -------------------------------------------------------
>
> Key: BEAM-10652
> URL: https://issues.apache.org/jira/browse/BEAM-10652
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.23.0
> Reporter: Brian Hulette
> Assignee: Bruce Arctor
> Priority: P3
> Time Spent: 46h
> Remaining Estimate: 0h
>
> [Clustering|https://cloud.google.com/bigquery/docs/clustered-tables] without
> time partitioning is allowed in BigQuery, but we specifically reject it:
> https://github.com/apache/beam/blob/5e0e798ddd827fd212ac89b8c6f6f2cf9e4b29a5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2511-L2515
> We should remove this check and add tests for clustering without time
> partitioning.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)