[
https://issues.apache.org/jira/browse/BEAM-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amarendra Kumar updated BEAM-5434:
----------------------------------
Description:
I am trying to build a Google Dataflow template that is run from a Cloud Function.
The issue is with BigQueryIO trying to execute a SQL query.
The opening step of my Dataflow template is:
{code:java}
BigQueryIO.readTableRows()
    .withQueryLocation("US")
    .withoutValidation()
    .fromQuery(options.getSql())
    .usingStandardSql()
{code}
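For context, here is a minimal sketch of how this read sits in the template pipeline. The class and options names below (NotificationTemplate, SqlOptions) are illustrative stand-ins rather than my exact code; only the BigQueryIO chain matches what I run.
{code:java}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider;

public class NotificationTemplate {

  // Illustrative options interface; the real one exposes the SQL the same way,
  // as a ValueProvider so the template can receive it at launch time.
  public interface SqlOptions extends PipelineOptions {
    @Description("Standard SQL query to run against BigQuery")
    ValueProvider<String> getSql();

    void setSql(ValueProvider<String> value);
  }

  public static void main(String[] args) {
    SqlOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(SqlOptions.class);
    Pipeline pipeline = Pipeline.create(options);

    // The opening step from the description, unchanged.
    pipeline.apply(
        "ReadFromBigQuery",
        BigQueryIO.readTableRows()
            .withQueryLocation("US")
            .withoutValidation()
            .fromQuery(options.getSql())
            .usingStandardSql());

    pipeline.run();
  }
}
{code}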
When the template is triggered for the first time, it runs fine.
But when it is triggered a second time, it fails with the following error:
{code}
java.io.FileNotFoundException: No files matched spec: gs://test-notification/temp/Notification/BigQueryExtractTemp/34d42a122600416c9ea748a6e325f87a/000000000000.avro
    at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
    at org.apache.beam.sdk.io.FileBasedSource.createReader(FileBasedSource.java:329)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$1.iterator(WorkerCustomSources.java:360)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:177)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
Why is the second run expecting a file at that GCS location?
The file does get created while the first job is running, but it is also deleted after that job completes.
How are the two jobs related?
Could you please let me know if I am missing something, or whether this is a bug?
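One more data point: the BigQueryIO Javadoc documents TypedRead#withTemplateCompatibility(), described as a source implementation compatible with repeated template invocations. I have not verified whether it avoids this error, but the read would then look like:
{code:java}
// Untested variant: withTemplateCompatibility() is documented for reads that
// run from a template more than once; whether it fixes this failure is unverified.
BigQueryIO.readTableRows()
    .withTemplateCompatibility()
    .withQueryLocation("US")
    .withoutValidation()
    .fromQuery(options.getSql())
    .usingStandardSql()
{code}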
was:
I am trying to build a Google Dataflow template that is run from a Cloud Function.
The issue is with BigQueryIO trying to execute a SQL query.
The opening step of my Dataflow template is:
{code:java}
BigQueryIO.readTableRows()
    .withQueryLocation("US")
    .withoutValidation()
    .fromQuery(options.getSql())
    .usingStandardSql()
{code}
When the template is triggered for the first time, it runs fine.
But when it is triggered a second time, it fails with the following error:
{code}
java.io.FileNotFoundException: No files matched spec: gs://temp-test-notification/temp/MoEngageNotification/BigQueryExtractTemp/34d42a122600416c9ea748a6e325f87a/000000000000.avro
    at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
    at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
    at org.apache.beam.sdk.io.FileBasedSource.createReader(FileBasedSource.java:329)
    at com.google.cloud.dataflow.worker.WorkerCustomSources$1.iterator(WorkerCustomSources.java:360)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:177)
    at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
    at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
    at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
    at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
Could you please let me know if I am missing something, or whether this is a bug?
> Issue with BigQueryIO in Template
> ---------------------------------
>
> Key: BEAM-5434
> URL: https://issues.apache.org/jira/browse/BEAM-5434
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.5.0
> Reporter: Amarendra Kumar
> Assignee: Kenneth Knowles
> Priority: Major
>
> I am trying to build a Google Dataflow template that is run from a Cloud Function.
> The issue is with BigQueryIO trying to execute a SQL query.
> The opening step of my Dataflow template is:
> {code:java}
> BigQueryIO.readTableRows()
>     .withQueryLocation("US")
>     .withoutValidation()
>     .fromQuery(options.getSql())
>     .usingStandardSql()
> {code}
> When the template is triggered for the first time, it runs fine.
> But when it is triggered a second time, it fails with the following error:
> {code}
> java.io.FileNotFoundException: No files matched spec: gs://test-notification/temp/Notification/BigQueryExtractTemp/34d42a122600416c9ea748a6e325f87a/000000000000.avro
>     at org.apache.beam.sdk.io.FileSystems.maybeAdjustEmptyMatchResult(FileSystems.java:172)
>     at org.apache.beam.sdk.io.FileSystems.match(FileSystems.java:158)
>     at org.apache.beam.sdk.io.FileBasedSource.createReader(FileBasedSource.java:329)
>     at com.google.cloud.dataflow.worker.WorkerCustomSources$1.iterator(WorkerCustomSources.java:360)
>     at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:177)
>     at com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:158)
>     at com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:75)
>     at com.google.cloud.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:391)
>     at com.google.cloud.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:360)
>     at com.google.cloud.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:288)
>     at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:134)
>     at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:114)
>     at com.google.cloud.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:101)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {code}
> Why is the second run expecting a file at that GCS location?
> The file does get created while the first job is running, but it is also deleted after that job completes.
> How are the two jobs related?
> Could you please let me know if I am missing something, or whether this is a bug?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)