[jira] [Work logged] (BEAM-8456) BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense

ASF GitHub Bot (Jira) Fri, 25 Oct 2019 11:38:55 -0700


     [ https://issues.apache.org/jira/browse/BEAM-8456?focusedWorklogId=334291&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334291 ]


ASF GitHub Bot logged work on BEAM-8456:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Oct/19 18:37
            Start Date: 25/Oct/19 18:37
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on pull request #9849: [BEAM-8456] Add pipeline option to have Data Catalog truncate sub-millisecond precision
URL: https://github.com/apache/beam/pull/9849#discussion_r339184632
 
 

 ##########
 File path: sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/DataCatalogTableProvider.java
 ##########
 @@ -138,8 +143,41 @@ private Table loadTableFromDC(String tableName) {
     }
   }
 
-  @Override
-  public BeamSqlTable buildBeamSqlTable(Table table) {
-    return delegateProviders.get(table.getType()).buildBeamSqlTable(table);
+  private static DataCatalogBlockingStub createDataCatalogClient(
+      DataCatalogPipelineOptions options) {
+    return DataCatalogGrpc.newBlockingStub(
+            ManagedChannelBuilder.forTarget(options.getDataCatalogEndpoint()).build())
+        .withCallCredentials(
+            MoreCallCredentials.from(options.as(GcpOptions.class).getGcpCredential()));
+  }
+
+  private static Map<String, TableProvider> getSupportedProviders() {
+    return Stream.of(
+            new PubsubJsonTableProvider(), new BigQueryTableProvider(), new TextTableProvider())
+        .collect(toMap(TableProvider::getTableType, p -> p));
+  }
+
+  private Table toCalciteTable(String tableName, Entry entry) {
+    if (entry.getSchema().getColumnsCount() == 0) {
+      throw new UnsupportedOperationException(
+          "Entry doesn't have a schema. Please attach a schema to '"
+              + tableName
+              + "' in Data Catalog: "
+              + entry.toString());
+    }
+    Schema schema = SchemaUtils.fromDataCatalog(entry.getSchema());
+
+    Optional<Table.Builder> tableBuilder = tableFactory.tableBuilder(entry);
+    if (tableBuilder.isPresent()) {
+      return tableBuilder.get().schema(schema).name(tableName).build();
 
 Review comment:
   FWIW, this is the only approved use of `Optional`. Since it isn't 
serializable, it isn't "data" per se, and for optional arguments the guidance 
is to use overloads or differently named methods instead. It is meant 
specifically for methods that may or may not return a result.
   
   But I take your point about chaining things that each may or may not handle 
the input. I don't love it. I wish they could just be keyed on some enum in the 
proto, but as I understand it that is not how the Data Catalog API currently 
works.
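
   A minimal sketch of the chaining pattern discussed in this comment: each 
factory returns an `Optional` to say whether it handles the input, and the 
caller takes the first one that does. All names here (`ChainedFactories`, 
`TableFactory`, `firstMatch`, `resolve`, and the string results) are 
hypothetical and chosen for illustration; this is not the code in the pull 
request.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class ChainedFactories {
  /** Hypothetical factory: may or may not know how to handle an entry type. */
  interface TableFactory {
    Optional<String> tableBuilder(String entryType);
  }

  // Illustrative factories; the returned strings stand in for real table builders.
  static final TableFactory BIG_QUERY =
      type -> "bigquery".equals(type) ? Optional.of("BigQueryTable") : Optional.<String>empty();
  static final TableFactory PUBSUB =
      type -> "pubsub".equals(type) ? Optional.of("PubsubTable") : Optional.<String>empty();

  /** Returns the result of the first factory that handles the entry type, if any. */
  static Optional<String> firstMatch(List<TableFactory> factories, String entryType) {
    for (TableFactory factory : factories) {
      Optional<String> result = factory.tableBuilder(entryType);
      if (result.isPresent()) {
        return result;
      }
    }
    return Optional.empty();
  }

  /** Tries the illustrative factories in a fixed order. */
  static Optional<String> resolve(String entryType) {
    return firstMatch(Arrays.asList(BIG_QUERY, PUBSUB), entryType);
  }
}
```

   Here `Optional` appears only as a return type, which matches the use 
described above. If the entries carried an enum, a plain map lookup keyed on 
that enum could replace the loop.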
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 334291)
    Time Spent: 2h 40m  (was: 2.5h)

> BigQuery to Beam SQL timestamp has the wrong default: truncation makes the 
> most sense
> -------------------------------------------------------------------------------------
>
>                 Key: BEAM-8456
>                 URL: https://issues.apache.org/jira/browse/BEAM-8456
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Major
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading timestamps with higher-than-millisecond 
> precision from BigQuery may not even realize that the data source created 
> them at that precision. They are probably timestamps on log entries 
> generated by a system with higher precision.
> If the user is reading them with Beam SQL, which only supports millisecond 
> precision, it makes sense to "just work" by default by truncating.
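
To make the proposed default concrete, here is a minimal sketch of the two behaviors using only `java.time`. The class and method names (`TimestampTruncation`, `truncateToMillis`, `requireMillis`) are hypothetical and this is not Beam's implementation; it only illustrates the difference between truncating sub-millisecond precision and rejecting it.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

final class TimestampTruncation {
  /** Drops any sub-millisecond part, so the value "just works" at millisecond precision. */
  static Instant truncateToMillis(Instant timestamp) {
    return timestamp.truncatedTo(ChronoUnit.MILLIS);
  }

  /** Strict alternative: rejects values that carry sub-millisecond precision. */
  static Instant requireMillis(Instant timestamp) {
    if (timestamp.getNano() % 1_000_000 != 0) {
      throw new IllegalArgumentException(
          "Timestamp has sub-millisecond precision: " + timestamp);
    }
    return timestamp;
  }
}
```

For example, `truncateToMillis(Instant.parse("2019-10-25T18:37:00.123456Z"))` yields `2019-10-25T18:37:00.123Z`, while `requireMillis` throws on the same input.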



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
