[ 
https://issues.apache.org/jira/browse/FLINK-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505006#comment-17505006
 ] 

Niklas Semmler commented on FLINK-24960:
----------------------------------------

An early analysis.
 # We start the JobManager via YARN - Works
 # We identify the address of the REST server - We get the right address
 # We start a RestClient to submit a job via YARN - Works
 # RestClusterClient tries to submit the job - Fails (connection refused)

In the last step, the default address "localhost:8081" is used instead of the 
correct external address and port. This leads to a "connection refused" 
error, but I am not sure yet why this happens. As far as I understand 
{{RestClusterClient#sendRetriableRequest}}, it obtains the address via the 
web monitor leader retrieval. I don't see why that would return 
localhost:8081, nor do I see any place where the address could fall back to 
localhost:8081.
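
To make the suspicion concrete, here is a minimal sketch of the kind of 
fallback I would look for. It is not Flink code: the class and method names 
are hypothetical, the configuration is a plain map, and the "localhost" 
default is an assumption made only to reproduce the observed pair. The key 
names are borrowed from the {{rest.address}} and {{rest.port}} options, and 
the non-default values are the hostname:port pair from the quoted test log.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names are Flink classes. */
public class RestAddressFallbackSketch {

    // Assumed defaults that would produce the pair seen in the failing request.
    static final String DEFAULT_HOST = "localhost";
    static final int DEFAULT_PORT = 8081;

    /**
     * Resolves "host:port" from a client-side configuration and silently
     * falls back to the defaults for every key that is missing.
     */
    static String resolveRestAddress(Map<String, String> clientConfig) {
        String host = clientConfig.getOrDefault("rest.address", DEFAULT_HOST);
        int port = Integer.parseInt(
                clientConfig.getOrDefault("rest.port", String.valueOf(DEFAULT_PORT)));
        return host + ":" + port;
    }

    public static void main(String[] args) {
        // A configuration that was never updated with the address found via YARN.
        Map<String, String> staleConfig = new HashMap<>();
        System.out.println(resolveRestAddress(staleConfig));

        // The same lookup after the discovered address has been written back.
        Map<String, String> updatedConfig = new HashMap<>();
        updatedConfig.put("rest.address", "5718b812c7ab");
        updatedConfig.put("rest.port", "38622");
        System.out.println(resolveRestAddress(updatedConfig));
    }
}
```

If the real code path contains a lookup of this shape, the thing to check 
would be whether the configuration handed to RestClusterClient is the one 
that carries the address extracted in step 2.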

> YARNSessionCapacitySchedulerITCase.testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots
>  hangs on azure
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-24960
>                 URL: https://issues.apache.org/jira/browse/FLINK-24960
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>    Affects Versions: 1.15.0, 1.14.3
>            Reporter: Yun Gao
>            Assignee: Niklas Semmler
>            Priority: Critical
>              Labels: test-stability
>             Fix For: 1.15.0
>
>
> {code:java}
> Nov 18 22:37:08 
> ================================================================================
> Nov 18 22:37:08 Test 
> testVCoresAreSetCorrectlyAndJobManagerHostnameAreShownInWebInterfaceAndDynamicPropertiesAndYarnApplicationNameAndTaskManagerSlots(org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase)
>  is running.
> Nov 18 22:37:08 
> --------------------------------------------------------------------------------
> Nov 18 22:37:25 22:37:25,470 [                main] INFO  
> org.apache.flink.yarn.YARNSessionCapacitySchedulerITCase     [] - Extracted 
> hostname:port: 5718b812c7ab:38622
> Nov 18 22:52:36 
> ==============================================================================
> Nov 18 22:52:36 Process produced no output for 900 seconds.
> Nov 18 22:52:36 
> ==============================================================================
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=26722&view=logs&j=f450c1a5-64b1-5955-e215-49cb1ad5ec88&t=cc452273-9efa-565d-9db8-ef62a38a0c10&l=36395



--
This message was sent by Atlassian Jira
(v8.20.1#820001)