[
https://issues.apache.org/jira/browse/BEAM-8456?focusedWorklogId=332960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332960
]
ASF GitHub Bot logged work on BEAM-8456:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Oct/19 23:21
Start Date: 23/Oct/19 23:21
Worklog Time Spent: 10m
Work Description: apilloud commented on pull request #9849: [BEAM-8456]
Add pipeline option to have Data Catalog truncate sub-millisecond precision
URL: https://github.com/apache/beam/pull/9849#discussion_r338307365
##########
File path: sdks/java/extensions/sql/datacatalog/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/datacatalog/BigQueryTableFactory.java
##########
@@ -20,23 +20,39 @@
import com.alibaba.fastjson.JSONObject;
import com.google.cloud.datacatalog.Entry;
import java.net.URI;
+import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.extensions.sql.meta.Table.Builder;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableMap;
-/** Utils to extract BQ-specific entry information. */
-class BigQueryUtils {
+/** {@link TableFactory} that understands Data Catalog BigQuery entries. */
+class BigQueryTableFactory implements TableFactory {
+ private static final String BIGQUERY_API = "bigquery.googleapis.com";
private static final Pattern BQ_PATH_PATTERN =
Pattern.compile(
"/projects/(?<PROJECT>[^/]+)/datasets/(?<DATASET>[^/]+)/tables/(?<TABLE>[^/]+)");
- static Table.Builder tableBuilder(Entry entry) {
- return Table.builder()
- .location(getLocation(entry))
- .properties(new JSONObject())
- .type("bigquery")
- .comment("");
+ private final boolean truncateTimestamps;
+
+ public BigQueryTableFactory(boolean truncateTimestamps) {
+ this.truncateTimestamps = truncateTimestamps;
+ }
+
+ @Override
+ public Optional<Builder> tableBuilder(Entry entry) {
+ if (URI.create(entry.getLinkedResource()).getAuthority().toLowerCase().equals(BIGQUERY_API)) {
+ return Optional.of(
+ Table.builder()
+ .location(getLocation(entry))
+ .properties(new JSONObject(ImmutableMap.of("truncateTimestamps", truncateTimestamps)))
+ .type("bigquery")
+ .comment(""));
+ } else {
Review comment:
nit: Unneeded else. This might be clearer if the `if` bailed out.
For example:
```
if (!URI.equals()) {
return Optional.empty();
}
// Actual work
```
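A fuller sketch of the guard-clause shape suggested above, for illustration only. The class `GuardClauseSketch` and the method `locationFor` are hypothetical stand-ins, not the code in the pull request: the real method builds a Beam `Table.Builder`, which is replaced here by a plain string so the example runs with the JDK alone.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical stand-in for the factory in the diff; only JDK types are used.
class GuardClauseSketch {
  private static final String BIGQUERY_API = "bigquery.googleapis.com";

  // Returns the resource path for BigQuery entries, or empty for anything else.
  static Optional<String> locationFor(String linkedResource) {
    URI uri = URI.create(linkedResource);
    String authority = uri.getAuthority();
    // Guard clause: bail out early so the main path needs no else branch.
    if (authority == null || !authority.equalsIgnoreCase(BIGQUERY_API)) {
      return Optional.empty();
    }
    // Actual work (assumption: the path is all this sketch needs to return).
    return Optional.of(uri.getPath());
  }
}
```

The null check is an addition of this sketch: `URI.getAuthority()` returns null when the URI has no authority component, so a chained `toLowerCase()` call would throw in that case.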
Issue Time Tracking
-------------------
Worklog Id: (was: 332960)
Time Spent: 1.5h (was: 1h 20m)
> BigQuery to Beam SQL timestamp has the wrong default: truncation makes the
> most sense
> -------------------------------------------------------------------------------------
>
> Key: BEAM-8456
> URL: https://issues.apache.org/jira/browse/BEAM-8456
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Most of the time, a user reading timestamps from BigQuery with
> higher-than-millisecond precision may not even realize that the data
> source created these high-precision timestamps. They are probably
> timestamps on log entries generated by a system with higher precision.
> If the user is reading them with Beam SQL, which only supports millisecond
> precision, it makes sense to "just work" by default.
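To make the truncation concrete, here is a minimal sketch, assuming the source timestamp arrives as microseconds since the epoch (BigQuery TIMESTAMP values have microsecond precision). The class and method names are hypothetical, and this is not the code from the pull request.

```java
// Hypothetical helper, for illustration only.
class TimestampTruncationSketch {
  // Drops the sub-millisecond part of a microsecond timestamp.
  // Math.floorDiv rounds toward negative infinity, so pre-epoch values
  // move to the earlier millisecond rather than toward zero.
  static long microsToMillis(long epochMicros) {
    return Math.floorDiv(epochMicros, 1000L);
  }
}
```

Whether pre-epoch values should floor or round toward zero is a design choice this sketch makes on its own; the issue text only asks that sub-millisecond precision be dropped by default.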