[GitHub] [flink] XComp commented on pull request #16989: [FLINK-23611][yarn-tests] Refactors yarn tests and enable hadoop logs

GitBox Tue, 31 Aug 2021 08:30:47 -0700


XComp commented on pull request #16989:
URL: https://github.com/apache/flink/pull/16989#issuecomment-909320524

I rebased the branch and added another commit that should fix the
`YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster` flakiness
we're experiencing locally. See
[2f95f6d](https://github.com/apache/flink/pull/16989/commits/2f95f6deb0b880f2bc6da9463d3495c38ee433f0)'s
commit message for further details on the change and the issue:
```
We observed YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster
being flaky on our local machines. The AssertionError was caused by a
certain log message ("Starting TaskManagers") not being available in the
job manager logs. The reason for that was that the JobManager startup
script seems to be triggered more than once in some cases. If that
happens, two (or more) jobmanager.log files are created with the older
having ".N" added as a suffix to the name. Due to the previously used
contains method, we ended up picking the older JobManager log file.
These logs wouldn't contain the TaskManager startup log message which
is required by the assertion.

I wasn't able to figure out why we sometimes experience multiple
JobManager startups. I checked the Hadoop code for the
DefaultContainerExecutor and the DEBUG logs for YARN. I couldn't find
any indication of a restart.

But Flink renames older log files and keeps the most-recent one as
jobmanager.log. That's the one we're interested in, anyway. Hence,
selecting "jobmanager.log" through equals solves the unstable test.
```
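
To illustrate the difference between the two matching strategies, here is a minimal sketch. It is not the actual test code: the class `LogFileSelection`, both helper methods, and the file listing are hypothetical and only mimic the situation described in the commit message, assuming the rotated file happens to be listed first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class LogFileSelection {

    // Hypothetical helper: returns the first file whose name merely
    // contains the fragment, so "jobmanager.log.1" also qualifies.
    public static Optional<String> pickByContains(List<String> fileNames, String fragment) {
        return fileNames.stream().filter(name -> name.contains(fragment)).findFirst();
    }

    // Hypothetical helper: returns only a file whose name matches exactly,
    // which excludes rotated files carrying a ".N" suffix.
    public static Optional<String> pickByEquals(List<String> fileNames, String exactName) {
        return fileNames.stream().filter(name -> name.equals(exactName)).findFirst();
    }

    public static void main(String[] args) {
        // Assumed listing order: the older, rotated log comes first.
        List<String> files =
                Arrays.asList("jobmanager.log.1", "jobmanager.log", "jobmanager.out");

        // prints "jobmanager.log.1" (the older log, missing the expected message)
        System.out.println(pickByContains(files, "jobmanager.log").orElse("<none>"));

        // prints "jobmanager.log" (the most recent log)
        System.out.println(pickByEquals(files, "jobmanager.log").orElse("<none>"));
    }
}
```

With substring matching, the result depends on the order in which the files are listed, which would explain why the test only failed some of the time. Exact matching does not depend on that order.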

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
