[
https://issues.apache.org/jira/browse/FLINK-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505006#comment-17505006
]
Niklas Semmler commented on FLINK-24960:
----------------------------------------
An early analysis.
# We start the JobManager via YARN - Works
# We identify the address of the REST server - We get the right address
# We start a RestClient to submit a job via YARN - Works
# RestClusterClient tries to submit the job - Fails
In the last step, a default address "localhost:8081" is used instead of the
correct external address/port combination. This leads to a "connection refused"
error. I am not sure yet why this happens. As far as I understand
{{RestClusterClient#sendRetriableRequest}}, it obtains the address through
web monitor leader retrieval. It doesn't make sense to me why this would
return localhost:8081. I also don't see any place where it could fall back to
the localhost:8081 pair.
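To confirm from inside the test process which of the two endpoints actually accepts connections, a plain socket probe might help. The following is only a sketch and not Flink code: {{EndpointProbe}}, {{parse}} and {{isReachable}} are hypothetical helper names, and it uses nothing but {{java.net}}. It assumes the address is available in the "hostname:port" form that the test logs.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic helper; not part of Flink. */
public final class EndpointProbe {

    private EndpointProbe() {}

    /** Splits "host:port" at the last colon without resolving the host name. */
    public static InetSocketAddress parse(String hostPort) {
        int idx = hostPort.lastIndexOf(':');
        if (idx <= 0 || idx == hostPort.length() - 1) {
            throw new IllegalArgumentException("Expected host:port but got: " + hostPort);
        }
        String host = hostPort.substring(0, idx);
        int port = Integer.parseInt(hostPort.substring(idx + 1));
        return InetSocketAddress.createUnresolved(host, port);
    }

    /**
     * Returns true if a TCP connection can be opened within the timeout.
     * A refused connection, an unresolvable host and a timeout all yield false.
     */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // The InetSocketAddress constructor resolves the host name here.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Calling {{isReachable}} once with the host and port extracted by the test and once with "localhost" and 8081 would show whether the REST server is listening where expected, independently of what the leader retrieval returns.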
> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
> hangs on azure
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-24960
> URL: https://issues.apache.org/jira/browse/FLINK-24960
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Affects Versions: 1.15.0, 1.14.3
> Reporter: Yun Gao
> Assignee: Niklas Semmler
> Priority: Critical
> Labels: test-stability
> Fix For: 1.15.0
>
>
> {code:java}
> Nov 18 22:37:08
> ================================================================================
> Nov 18 22:37:08 Test
> testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)
> is running.
> Nov 18 22:37:08
> --------------------------------------------------------------------------------
> Nov 18 22:37:25 22:37:25,470 [ main] INFO
> org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase [] - Extracted
> hostname:port: 5718b812c7ab:38622
> Nov 18 22:52:36
> ==============================================================================
> Nov 18 22:52:36 Process produced no output for 900 seconds.
> Nov 18 22:52:36
> ==============================================================================
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=26722&view=logs&j=f450c1a5-64b1-5955-e215-49cb1ad5ec88&t=cc452273-9efa-565d-9db8-ef62a38a0c10&l=36395