[
https://issues.apache.org/jira/browse/BEAM-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650459#comment-16650459
]
Scott Wegner commented on BEAM-5727:
------------------------------------
I don't see any errors in the Dataflow job logs or worker logs. Nothing looks
fishy on the job at all. !L5ebngScUXW.png!
I suspect the error / flakiness is in the Dataflow runner harness, which
polls the job state. I see in
[getStateWithRetries()|https://github.com/apache/beam/blob/42a03a6bd2e6cfdffab02752a31f3139a08a8d94/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L480]
that if we encounter an {{IOException}} the final state will be set to
{{UNKNOWN}}, which seems to be the case here. However, the
[getJobWithRetries()|https://github.com/apache/beam/blob/42a03a6bd2e6cfdffab02752a31f3139a08a8d94/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L510]
method logs a warning before re-throwing the {{IOException}}, and I don't see
that warning in the Gradle console logs. So perhaps the request itself
succeeds and we get back a job response which doesn't include a Status.
I recommend adding a log message if the returned response doesn't uphold the
assumptions that the consuming code relies on. In this case, we should log /
throw if we don't receive a valid status.
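As a rough illustration of that suggestion, here is a minimal, self-contained sketch. It is not the actual {{DataflowPipelineJob}} code: {{JobStateCheck}}, {{JobResponse}}, {{JobFetcher}} and the {{warnings}} list (a stand-in for a real logger) are all hypothetical names. The point is only that each path ending in {{UNKNOWN}} leaves a distinct message behind, so a failed request can be told apart from a response with a missing or unrecognized state.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: simplified stand-ins, not the real DataflowPipelineJob code. */
class JobStateCheck {
  enum State { UNKNOWN, RUNNING, DONE, FAILED }

  /** Hypothetical stand-in for the service's job response; currentState may be null. */
  static final class JobResponse {
    final String currentState;

    JobResponse(String currentState) {
      this.currentState = currentState;
    }
  }

  /** Hypothetical stand-in for the call that fetches the job from the service. */
  interface JobFetcher {
    JobResponse fetch() throws IOException;
  }

  /** Collected warnings, standing in for a real logger so the sketch is easy to test. */
  static final List<String> warnings = new ArrayList<>();

  static State getState(JobFetcher fetcher) {
    JobResponse response;
    try {
      response = fetcher.fetch();
    } catch (IOException e) {
      // Path 1: the request itself failed.
      warnings.add("Failed to fetch job: " + e.getMessage());
      return State.UNKNOWN;
    }
    if (response == null || response.currentState == null) {
      // Path 2: the request succeeded, but the response carries no state.
      warnings.add("Job response did not include a state");
      return State.UNKNOWN;
    }
    try {
      return State.valueOf(response.currentState);
    } catch (IllegalArgumentException e) {
      // Path 3: the response carries a state this client does not recognize.
      warnings.add("Job response had unrecognized state: " + response.currentState);
      return State.UNKNOWN;
    }
  }
}
```

With something like this, a call such as {{JobStateCheck.getState(() -> new JobStateCheck.JobResponse(null))}} still returns {{UNKNOWN}}, but the recorded warning says the state was missing rather than that the request failed. Whether to log or throw in the missing-state case is a separate decision; the sketch only logs.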
> [beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle] [testKvSwap] "No
> terminal state was returned"
> ----------------------------------------------------------------------------------------------------
>
> Key: BEAM-5727
> URL: https://issues.apache.org/jira/browse/BEAM-5727
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, test-failures
> Reporter: Scott Wegner
> Assignee: Scott Wegner
> Priority: Major
> Labels: currently-failing
> Attachments: L5ebngScUXW.png
>
>
> _Use this form to file an issue for test failure:_
> * [Jenkins
> Job|https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/1251/]
> * [Gradle Build
> Scan|https://scans.gradle.com/s/wzrkrntbnvshy/tests/ubfk4psvvdijy-nlpo2dsb25h2w]
> * [Test source
> code|https://github.com/apache/beam/blob/279a05604b83a54e8e5a79e13d8761f94841f326/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineJob.java#L352]
> * [Dataflow
> job|https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-10-11_15_19_52-2781743142799123679?project=apache-beam-testing]
> Initial investigation:
> The Dataflow job succeeded, but it seems that the service response was
> missing completion state / metrics:
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.DataflowPipelineJob
> waitUntilFinish
> WARNING: No terminal state was returned. State value UNKNOWN
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.TestDataflowRunner
> checkForPAssertSuccess
> WARNING: Metrics not present for Dataflow job
> 2018-10-11_15_19_52-2781743142799123679.
> Oct 11, 2018 10:25:54 PM org.apache.beam.runners.dataflow.TestDataflowRunner
> run
> WARNING: Dataflow job 2018-10-11_15_19_52-2781743142799123679 did not output
> a success or failure metric.