[jira] [Commented] (BEAM-2761) Write to empty BigQuery partition fails with "No schema specified on job or table." despite having provided schema

Reuven Lax (JIRA) Tue, 12 Sep 2017 12:17:51 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163509#comment-16163509
 ]


Reuven Lax commented on BEAM-2761:
----------------------------------

Hi, I ran the precise job listed in the bug using the latest Beam snapshot, and 
could not reproduce the failure. BigQuery successfully created an empty table 
called pets and did not fail with the error. I am going to resolve this issue 
for now. Please reopen if you can reproduce this on the Beam 2.2.0 snapshots.

> Write to empty BigQuery partition fails with "No schema specified on job or 
> table." despite having provided schema
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-2761
>                 URL: https://issues.apache.org/jira/browse/BEAM-2761
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Fallon
>            Assignee: Reuven Lax
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: beam-2761-stacktrace.txt
>
>
> In 2.1.0-SNAPSHOT and 2.2.0-SNAPSHOT, jobs writing an empty PCollection to a 
> BigQuery partition fail with "java.lang.RuntimeException: Failed to create 
> load job with id prefix". This is associated with a message "No schema 
> specified on job or table" even though a schema is provided. See attached 
> stack trace for more detail on the error.
> Command to run job:
> {code}
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.EmptyPCollection \
>      -Dexec.args="--runner=DataflowRunner --project=<GCP project> \
>                   --gcpTempLocation=<tmp location>" \
>      -Pdataflow-runner
> {code}
> Code to reproduce the problem:
> {code:title=EmptyPCollection.java|borderStyle=solid}
> package org.apache.beam.examples;
>
> import com.google.api.services.bigquery.model.TableRow;
> import java.util.Arrays;
> import java.util.List;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.coders.StringUtf8Coder;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.transforms.ParDo;
> import org.apache.beam.sdk.values.PCollection;
>
> public class EmptyPCollection {
>   public static void main(String[] args) {
>     PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
>     options.setTempLocation("<your tmp location>");
>     Pipeline pipeline = Pipeline.create(options);
>     String schema = "{\"fields\": [{\"name\": \"pet\", \"type\": \"string\", \"mode\": \"required\"}]}";
>     String table = "mydataset.pets";
>     List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");
>     PCollection<String> inputText = pipeline.apply(Create.of(pets)).setCoder(StringUtf8Coder.of());
>     PCollection<TableRow> rows = inputText.apply(ParDo.of(new DoFn<String, TableRow>() {
>       @ProcessElement
>       public void processElement(ProcessContext c) {
>         String text = c.element();
>         if (text.startsWith("X")) {  // change to (D)og and works fine
>           TableRow row = new TableRow();
>           row.set("pet", text);
>           c.output(row);
>         }
>       }
>     }));
>     rows.apply(BigQueryIO.writeTableRows().to(table).withJsonSchema(schema)
>         .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>         .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
>     pipeline.run().waitUntilFinish();
>   }
> }
> {code}
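
For readers following the reproducer: the filter in processElement keeps only
names starting with "X", so none of the three inputs produces a TableRow and
the BigQuery write receives an empty PCollection. The plain-Java sketch below
mirrors only that filtering step, with no Beam or BigQuery dependency. The
class name EmptyFilterCheck and the helper keepWithPrefix are hypothetical,
introduced here for illustration; they are not part of the reporter's code or
of the Beam API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Beam or of the reproducer.
public class EmptyFilterCheck {

  // Mirrors the startsWith check in the reproducer's processElement:
  // only elements beginning with the given prefix are emitted.
  static List<String> keepWithPrefix(List<String> pets, String prefix) {
    List<String> kept = new ArrayList<>();
    for (String pet : pets) {
      if (pet.startsWith(prefix)) {
        kept.add(pet);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> pets = Arrays.asList("Dog", "Cat", "Goldfish");

    // Prefix "X" matches nothing, so the downstream write would see no rows.
    System.out.println(keepWithPrefix(pets, "X"));

    // Prefix "D" matches "Dog", the case the reporter says works fine.
    System.out.println(keepWithPrefix(pets, "D"));
  }
}
```

With the prefix "X" the result is an empty list, which corresponds to the
empty-PCollection case described in the issue; with "D" one element survives,
matching the reporter's note that the job then works.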



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
