Valentyn Tymofieiev created BEAM-6069:
-----------------------------------------
Summary: BigQuery Tornadoes example fails to run when we pass a custom temp location.
Key: BEAM-6069
URL: https://issues.apache.org/jira/browse/BEAM-6069
Project: Beam
Issue Type: Bug
Components: examples-java, io-java-gcp
Reporter: Valentyn Tymofieiev
Assignee: Reuven Lax
Steps to reproduce:
{noformat}
PROJECT=$(gcloud config get-value project)
BUCKET=${USER}_gcs_bucket
BQ_DATASET=${USER}_bq_dataset
TABLE_NAME=out
bq mk --project=$PROJECT $BQ_DATASET
gsutil mb gs://$BUCKET
PATH_TO_REPO_CLONE=/path/to/beam
mvn archetype:generate -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=2.8.0 -DgroupId=org.example -DartifactId=word-count-beam \
  -Dversion="0.1" -Dpackage=org.apache.beam.examples -DinteractiveMode=false
cd word-count-beam/
mkdir src/main/java/org/apache/beam/examples/cookbook
cp $PATH_TO_REPO_CLONE/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java \
  ./src/main/java/org/apache/beam/examples/cookbook
mvn compile exec:java \
  -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
  -Dexec.args="--runner=DataflowRunner --project=$PROJECT \
    --input=clouddataflow-readonly:samples.weather_stations \
    --gcpTempLocation=gs://$BUCKET/tmp --output=$BQ_DATASET.$TABLE_NAME" \
  -Pdataflow-runner
{noformat}
This fails with:
{noformat}
java.lang.IllegalArgumentException: BigQueryIO.Read needs a GCS temp location to store temp files.
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:122)
    at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$TypedRead.validate(BigQueryIO.java:662)
    at org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:641)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:645)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:649)
    at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:311)
    at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:245)
    at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
    at org.apache.beam.sdk.Pipeline.validate(Pipeline.java:577)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:312)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.runBigQueryTornadoes(BigQueryTornadoes.java:166)
    at org.apache.beam.examples.cookbook.BigQueryTornadoes.main(BigQueryTornadoes.java:172)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
Ironically, the example works if we remove --gcpTempLocation. The logs show
that in that case the pipeline uses a bucket that looks like:
gs://dataflow-staging-us-central1-927334603519
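A possible workaround, not confirmed in this report: the error is raised by the BigQueryIO.Read validation, so it may be that the check reads the generic --tempLocation pipeline option rather than --gcpTempLocation. Under that assumption, passing both options with the same GCS path might let the example run. The sketch below only assembles and prints the argument string. The fallback values and the EXEC_ARGS and TEMP_LOCATION names are illustrative, and the mvn invocation is left as a comment.

```shell
#!/bin/sh
# Sketch of a possible workaround (an assumption, not a confirmed fix):
# set --tempLocation in addition to --gcpTempLocation.

# Same variables as in the reproduction steps; the fallbacks are illustrative.
PROJECT="${PROJECT:-my-project}"
BUCKET="${BUCKET:-${USER:-user}_gcs_bucket}"
BQ_DATASET="${BQ_DATASET:-${USER:-user}_bq_dataset}"
TABLE_NAME="${TABLE_NAME:-out}"

TEMP_LOCATION="gs://${BUCKET}/tmp"

# Both temp-location options point at the same GCS path.
EXEC_ARGS="--runner=DataflowRunner --project=${PROJECT} \
--input=clouddataflow-readonly:samples.weather_stations \
--tempLocation=${TEMP_LOCATION} --gcpTempLocation=${TEMP_LOCATION} \
--output=${BQ_DATASET}.${TABLE_NAME}"

echo "$EXEC_ARGS"

# To try it, run from the word-count-beam directory:
# mvn compile exec:java \
#   -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
#   -Dexec.args="$EXEC_ARGS" -Pdataflow-runner
```

If the failure persists with both options set, the assumption above is wrong and the cause lies elsewhere.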