[ https://issues.apache.org/jira/browse/BEAM-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650459#comment-16650459 ]

Scott Wegner commented on BEAM-5727:
------------------------------------

I don't see any errors in the Dataflow job logs or worker logs. Nothing looks 
fishy on the job at all. !L5ebngScUXW.png! 

I suspect the error / flakiness is in the Dataflow runner code that polls the 
job state, not in the job itself. In 
[getStateWithRetries()|https://github.com/apache/beam/blob/42a03a6bd2e6cfdffab02752a31f3139a08a8d94/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L480], 
if we encounter an {{IOException}}, the final state is set to {{UNKNOWN}}, 
which seems to be the case here. However, the 
[getJobWithRetries()|https://github.com/apache/beam/blob/42a03a6bd2e6cfdffab02752a31f3139a08a8d94/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L510]
 method logs a warning before re-throwing the {{IOException}}, and I don't see 
that warning in the Gradle console logs. So perhaps we are successfully getting 
a job response that simply doesn't include a status.
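To make the two paths concrete, here is a heavily simplified, hypothetical Java sketch. It is not the actual {{DataflowPipelineJob}} code; {{StatePollingSketch}}, {{JobClient}} and {{JobResponse}} are made-up names. A fetch that fails with {{IOException}} on every attempt logs a warning before mapping to {{UNKNOWN}}, while a successful response with no state maps to {{UNKNOWN}} silently.

```java
import java.io.IOException;
import java.util.logging.Logger;

/**
 * Hypothetical, simplified model of polling a job's state with retries.
 * This is NOT the real DataflowPipelineJob code; it only illustrates two
 * different paths that both end in State.UNKNOWN.
 */
public class StatePollingSketch {
  private static final Logger LOG = Logger.getLogger(StatePollingSketch.class.getName());

  /** Hypothetical subset of job states. */
  enum State { UNKNOWN, RUNNING, DONE, FAILED, CANCELLED }

  /** Hypothetical service response; currentState may be null. */
  static final class JobResponse {
    final State currentState;

    JobResponse(State currentState) {
      this.currentState = currentState;
    }
  }

  /** Hypothetical client that fetches the job from the service. */
  interface JobClient {
    JobResponse getJob() throws IOException;
  }

  /** Retries the fetch; logs a warning and re-throws if every attempt fails. */
  static JobResponse getJobWithRetries(JobClient client, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return client.getJob();
      } catch (IOException e) {
        last = e;
      }
    }
    // Path 1: this warning should be visible in the console output.
    LOG.warning("Failed to get job after " + maxAttempts + " attempts: " + last);
    throw last;
  }

  /** Maps any fetch failure, or a response without a state, to UNKNOWN. */
  static State getStateWithRetries(JobClient client, int maxAttempts) {
    try {
      JobResponse job = getJobWithRetries(client, maxAttempts);
      // Path 2: the call succeeded but carried no state; nothing is logged.
      return job.currentState == null ? State.UNKNOWN : job.currentState;
    } catch (IOException e) {
      return State.UNKNOWN;
    }
  }

  public static void main(String[] args) {
    JobClient noStatus = () -> new JobResponse(null);
    JobClient alwaysFails = () -> {
      throw new IOException("simulated outage");
    };
    System.out.println(getStateWithRetries(noStatus, 3));    // UNKNOWN, with no warning
    System.out.println(getStateWithRetries(alwaysFails, 3)); // UNKNOWN, after a WARNING
  }
}
```

Under this model, an {{UNKNOWN}} state with no preceding retry warning in the console log points at the second path.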

I recommend adding a log message when the returned response doesn't uphold the 
assumptions that the consuming methods expect. In this case, we should log a 
warning or throw if we don't receive a valid status.
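One way to apply this recommendation is a small validation step between fetching the job and consuming its state. The sketch below is hypothetical: {{StatusValidationSketch}}, {{requireState}} and {{InvalidJobResponseException}} are made-up names, not Beam APIs, and it assumes the caller chooses between logging and throwing via a flag.

```java
import java.util.logging.Logger;

/** Hypothetical validation step for a job-status response (not a Beam API). */
public class StatusValidationSketch {
  private static final Logger LOG = Logger.getLogger(StatusValidationSketch.class.getName());

  /** Hypothetical subset of job states. */
  enum State { UNKNOWN, RUNNING, DONE, FAILED, CANCELLED }

  /** Thrown when the service response violates the caller's assumptions. */
  static final class InvalidJobResponseException extends RuntimeException {
    InvalidJobResponseException(String message) {
      super(message);
    }
  }

  /**
   * Returns the reported state, or reacts loudly when it is missing.
   *
   * @param failOnMissing if true, throw; otherwise log a warning and return UNKNOWN
   */
  static State requireState(String jobId, State reported, boolean failOnMissing) {
    if (reported != null) {
      return reported;
    }
    String msg = "Job " + jobId + " response did not include a current state";
    if (failOnMissing) {
      throw new InvalidJobResponseException(msg);
    }
    LOG.warning(msg + "; treating state as UNKNOWN");
    return State.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(requireState("job-1", State.DONE, true)); // DONE
    System.out.println(requireState("job-2", null, false));     // UNKNOWN, after a WARNING
    try {
      requireState("job-3", null, true);
    } catch (InvalidJobResponseException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Throwing would surface the problem as an explicit test failure, while logging keeps the current {{UNKNOWN}} behavior but leaves a trace in the console output.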

> [beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle] [testKvSwap] "No terminal state was returned"
> ----------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5727
>                 URL: https://issues.apache.org/jira/browse/BEAM-5727
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow, test-failures
>            Reporter: Scott Wegner
>            Assignee: Scott Wegner
>            Priority: Major
>              Labels: currently-failing
>         Attachments: L5ebngScUXW.png
>
>
> _Use this form to file an issue for test failure:_
>  * [Jenkins 
> Job|https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/1251/]
>  * [Gradle Build 
> Scan|https://scans.gradle.com/s/wzrkrntbnvshy/tests/ubfk4psvvdijy-nlpo2dsb25h2w]
>  * [Test source 
> code|https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L352]
>  * [Dataflow 
> job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-11_15_19_52-2781743142799123679?project=apache-beam-testing]
> Initial investigation:
> The Dataflow job succeeded, but it seems that the service response was 
> missing completion state / metrics:
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.DataflowPipelineJob waitUntilFinish
> WARNING: No terminal state was returned. State value UNKNOWN
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.TestDataflowRunner checkForPAssertSuccess
> WARNING: Metrics not present for Dataflow job 2018-10-11_15_19_52-2781743142799123679.
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.TestDataflowRunner run
> WARNING: Dataflow job 2018-10-11_15_19_52-2781743142799123679 did not output a success or failure metric.
> ----
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
