Matthias Baetens created BEAM-2122:
--------------------------------------
Summary: Writing to partitioned BigQuery tables from Dataflow is
causing errors
Key: BEAM-2122
URL: https://issues.apache.org/jira/browse/BEAM-2122
Project: Beam
Issue Type: Bug
Components: sdk-java-gcp
Environment: Running with Beam 0.7.0-SNAPSHOT version 48 for
beam-sdks-java-io-google-cloud-platform, 49 for beam-sdks-java-core and
beam-runners-google-cloud-dataflow-java in Eclipse using Dataflow service.
Reporter: Matthias Baetens
Assignee: Daniel Halperin
I am using the latest Beam SNAPSHOT, which has a new BigQuery connector, and
trying to write to partitioned tables as described in the docs (and in this
Stack Overflow answer:
http://stackoverflow.com/questions/43505534/writing-different-values-to-different-bigquery-tables-in-apache-beam/43655461#43655461).
The following function:
    static class PartitionedTableGeneration
            implements SerializableFunction<ValueInSingleWindow<TableRow>, TableDestination> {
        @Override
        public TableDestination apply(ValueInSingleWindow<TableRow> value) {
            // Variant with underscores:
            // String dayString = DateTimeFormat.forPattern("yyyy_MM_dd").withZone(DateTimeZone.UTC)
            String dayString = DateTimeFormat.forPattern("yyyyMMdd")
                    .withZone(DateTimeZone.UTC)
                    .print(((IntervalWindow) value.getWindow()).start());
            // Append the partition decorator ('$' + day) to the table spec.
            TableDestination td = new TableDestination(
                    "project:dataset.table" + '$' + dayString, "");
            return td;
        }
    }
causes the following errors at run time, depending on the pattern used for
dayString (the first with "yyyyMMdd", the second with "yyyy_MM_dd"):
1. "Invalid table ID \"partitioned_sample$20150905\". Table IDs must be
alphanumeric (plus underscores) and must be at most 1024 characters long. Also,
Table decorators cannot be used.",
2. java.lang.RuntimeException: org.apache.beam.sdk.util.UserCodeException:
java.lang.RuntimeException: Failed to create load job with id prefix
...
"errorResult" : {
"message" : "Invalid date partitioned table suffix: 2015_11_26",
"reason" : "invalid"
}
Writing to sharded tables (without the '$' sign) works fine.
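
For reference, the second error appears to be about the suffix format only: the
message rejects "2015_11_26", and BigQuery partition decorators take the form
table$YYYYMMDD. Below is a minimal sketch of building and checking such a
suffix. It uses java.time instead of Joda-Time so that it runs without Beam on
the classpath, and the class PartitionDecorator and its methods are
hypothetical helpers, not part of Beam or the BigQuery client. It does not
address the first error, where the decorator itself is rejected.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Beam or BigQuery.
final class PartitionDecorator {
    // Day-partition suffix in UTC, without separators (e.g. 20150905).
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // A suffix is accepted here only if it is exactly eight digits.
    private static final Pattern SUFFIX = Pattern.compile("\\d{8}");

    // Formats the start of a window as a partition suffix.
    static String suffixFor(Instant windowStart) {
        return DAY.format(windowStart);
    }

    // Returns true if the string has the eight-digit shape checked above.
    static boolean isValidSuffix(String suffix) {
        return SUFFIX.matcher(suffix).matches();
    }

    // Appends '$' and the suffix to a base table spec such as
    // "project:dataset.table" (a placeholder, as in the report).
    static String tableSpec(String baseTableSpec, Instant windowStart) {
        return baseTableSpec + "$" + suffixFor(windowStart);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2015-09-05T00:00:00Z");
        // prints project:dataset.partitioned_sample$20150905
        System.out.println(tableSpec("project:dataset.partitioned_sample", start));
        // prints false: underscores are not part of the decorator format
        System.out.println(isValidSuffix("2015_11_26"));
    }
}
```

The digit check is only a shape test; it does not verify that the eight digits
form a real calendar date.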
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)