[jira] [Work logged] (BEAM-10652) Allow Clustering without TimePartitioning in BigQueryIO

ASF GitHub Bot (Jira) Fri, 11 Feb 2022 14:13:04 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-10652?focusedWorklogId=725469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-725469
 ]


ASF GitHub Bot logged work on BEAM-10652:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Feb/22 22:12
            Start Date: 11/Feb/22 22:12
    Worklog Time Spent: 10m 
      Work Description: brucearctor commented on pull request #16578:
URL: https://github.com/apache/beam/pull/16578#issuecomment-1036676433


   @chamikaramj, on methods: I had assumed that getting this working for ANY 
method would be enough for the existing ticket, since it is an improvement over 
what was there. I would then write up a blog post and notes describing what is 
known to work.
   
   Do we have ITs that spin up a Pub/Sub topic (or a Kafka cluster and topics), 
publish messages to it, run the pipeline to consume them, verify that the sink 
(e.g., a table) ends up as expected, and then perhaps tear everything down? I have 
built this sort of automation in the past for testing real data pipelines, but 
that seems out of scope for this particular issue and could easily be a separate 
ticket.
   
   Put another way, and to make it concrete: how are we running ongoing tests for 
unbounded data? Testing and checking this manually myself is one thing; having 
real tests that run regularly is another, and is really the only way to get 
verification that is meaningful.
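As a rough illustration of the setup, publish, run, verify, and tear-down flow described above, here is a minimal in-memory sketch in plain Java. It assumes a `BlockingQueue` as a stand-in for the Pub/Sub or Kafka topic and a `List` as a stand-in for the BigQuery table; the class and method names (`StreamingItSketch`, `runPipeline`, `runIntegrationTest`) are hypothetical, and a real IT would instead provision and clean up actual GCP or Kafka resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory sketch of the integration-test shape discussed above.
// A BlockingQueue stands in for the Pub/Sub or Kafka topic and a List stands in
// for the BigQuery table; no real GCP, Kafka, or Beam API is used.
final class StreamingItSketch {

  private StreamingItSketch() {}

  // Stand-in "pipeline": reads up to maxMessages from the topic, applies a
  // trivial transform (upper-casing), and appends each result to the sink.
  static void runPipeline(BlockingQueue<String> topic, List<String> sink, int maxMessages) {
    try {
      for (int i = 0; i < maxMessages; i++) {
        String msg = topic.poll(1, TimeUnit.SECONDS);
        if (msg == null) {
          return; // no input arrived within the timeout
        }
        sink.add(msg.toUpperCase(Locale.ROOT));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
  }

  // Setup, publish, run, verify, tear down.
  static boolean runIntegrationTest(List<String> messages) {
    BlockingQueue<String> topic = new LinkedBlockingQueue<>(); // 1. create the "topic"
    List<String> table = new ArrayList<>();                    //    and the "table"
    try {
      topic.addAll(messages);                                  // 2. publish messages
      runPipeline(topic, table, messages.size());              // 3. run the pipeline
      List<String> expected = new ArrayList<>();               // 4. verify the sink
      for (String m : messages) {
        expected.add(m.toUpperCase(Locale.ROOT));
      }
      return table.equals(expected);
    } finally {
      topic.clear();                                           // 5. tear everything down
      table.clear();
    }
  }

  public static void main(String[] args) {
    System.out.println(runIntegrationTest(Arrays.asList("click", "view"))); // true
  }
}
```

The return value is computed before the `finally` block clears the stand-in resources, so teardown always runs without affecting the verification result.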
   
   


-- 
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 725469)
    Time Spent: 24h 20m  (was: 24h 10m)

> Allow Clustering without TimePartitioning in BigQueryIO
> -------------------------------------------------------
>
>                 Key: BEAM-10652
>                 URL: https://issues.apache.org/jira/browse/BEAM-10652
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>    Affects Versions: 2.23.0
>            Reporter: Brian Hulette
>            Assignee: Bruce Arctor
>            Priority: P3
>          Time Spent: 24h 20m
>  Remaining Estimate: 0h
>
> [Clustering|https://cloud.google.com/bigquery/docs/clustered-tables] without 
> time partitioning is allowed in BigQuery, but we specifically reject it:
> https://github.com/apache/beam/blob/5e0e798ddd827fd212ac89b8c6f6f2cf9e4b29a5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2511-L2515
> We should remove this check and add tests for clustering without time 
> partitioning.
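To make the requested change concrete, here is a minimal sketch of the kind of check being removed, written as plain Java with no Beam dependency. It assumes a simplified configuration made of a clustering-field list plus an optional JSON time-partitioning spec; `ClusteringValidation`, `validate`, `accepts`, and the error message are hypothetical and do not reproduce the code at the linked BigQueryIO lines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the BigQueryIO write validation described
// in BEAM-10652. Class, method, and message names are illustrative only and
// are not Beam's actual code.
final class ClusteringValidation {

  private ClusteringValidation() {}

  /**
   * Validates a (hypothetical) write configuration.
   *
   * @param clusteringFields columns to cluster on, or null if clustering is not requested
   * @param timePartitioningJson JSON-encoded time partitioning spec, or null if absent
   * @param legacyRule true to apply the pre-fix rule that rejects clustering
   *     without time partitioning; false to apply the relaxed rule
   */
  static void validate(
      List<String> clusteringFields, String timePartitioningJson, boolean legacyRule) {
    boolean hasClustering = clusteringFields != null && !clusteringFields.isEmpty();
    if (legacyRule && hasClustering && timePartitioningJson == null) {
      // This is the kind of check the ticket proposes removing.
      throw new IllegalArgumentException(
          "clustering requested without time partitioning (hypothetical message)");
    }
    // Under the relaxed rule, clustering is accepted with or without
    // time partitioning, matching what BigQuery itself allows.
  }

  /** Returns true if validate(...) accepts the configuration. */
  static boolean accepts(
      List<String> clusteringFields, String timePartitioningJson, boolean legacyRule) {
    try {
      validate(clusteringFields, timePartitioningJson, legacyRule);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("customer_id");
    System.out.println(accepts(fields, null, true));  // false: legacy rule rejects clustering alone
    System.out.println(accepts(fields, null, false)); // true: relaxed rule accepts it
  }
}
```

Keeping both rules behind one flag makes it straightforward to write tests covering clustering with and without time partitioning, which is what the ticket asks for.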



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
