[ 
https://issues.apache.org/jira/browse/FLINK-24213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412406#comment-17412406
 ] 

Chesnay Schepler edited comment on FLINK-24213 at 9/9/21, 7:32 AM:
-------------------------------------------------------------------

[~xmarker] The concurrency analysis is correct.

I see two options here: either we enforce a strict order such that close() 
calls always start from the ServerConnection, or we merge the locks.
Due to how the connections are constructed, the first option cannot be 
implemented without a larger refactoring, so I'm inclined to go with the latter. 
A single lock should also make the code easier to reason about in general, and 
as far as I can tell the separate locks don't provide any real benefit.
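A minimal Java sketch of the "merge the locks" option follows. It assumes an outer connection object that owns an inner, channel-level connection, and that a close() can start from either side (from the outside, or from a failure callback on an event loop thread). The class and method names (`MergedLockSketch`, `OuterConnection`, `InnerConnection`, `onFailure`, `runConcurrentClose`) are hypothetical and do not mirror Flink's actual queryable-state classes. Because the inner object reuses the outer object's monitor, every close path acquires only one lock, and Java's reentrant monitors let the nested close() calls re-enter it.

```java
// Hypothetical sketch of the "merge the locks" option; the names are
// illustrative and do not mirror Flink's actual queryable-state classes.
class MergedLockSketch {

    /** Outer connection; owns the single lock that guards both objects. */
    static final class OuterConnection {
        final Object lock = new Object();
        // Initialized after 'lock' (declaration order), so the inner object gets the shared monitor.
        final InnerConnection inner = new InnerConnection(this, lock);
        private boolean closed;

        void close() {
            synchronized (lock) {
                if (closed) {
                    return;
                }
                closed = true;
                inner.close(); // re-enters the same (reentrant) monitor
            }
        }

        boolean isClosed() {
            synchronized (lock) {
                return closed;
            }
        }
    }

    /** Inner connection; reuses the outer lock instead of owning a second one. */
    static final class InnerConnection {
        private final OuterConnection owner;
        private final Object lock;
        private boolean closed;

        InnerConnection(OuterConnection owner, Object sharedLock) {
            this.owner = owner;
            this.lock = sharedLock;
        }

        void close() {
            synchronized (lock) {
                closed = true;
            }
        }

        /** Hypothetical failure callback, e.g. invoked from an event loop thread. */
        void onFailure() {
            synchronized (lock) {
                closed = true;
                owner.close(); // same monitor, so no lock-order inversion is possible
            }
        }

        boolean isClosed() {
            synchronized (lock) {
                return closed;
            }
        }
    }

    /** Closes from both sides concurrently; returns true if both threads finished in time. */
    static boolean runConcurrentClose(long timeoutMillis) {
        OuterConnection outer = new OuterConnection();
        Thread closer = new Thread(outer::close, "closer");
        Thread eventLoop = new Thread(outer.inner::onFailure, "event-loop");
        closer.setDaemon(true);
        eventLoop.setDaemon(true);
        closer.start();
        eventLoop.start();
        try {
            closer.join(timeoutMillis);
            eventLoop.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !closer.isAlive() && !eventLoop.isAlive()
                && outer.isClosed() && outer.inner.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrentClose(5_000));
    }
}
```

The alternative (a strict close order starting from the outer object) would keep two locks but require every caller to respect that order; the single shared monitor removes the ordering requirement entirely, at the cost of coarser locking.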


was (Author: zentol):
[~xmarker] The concurrency analysis is correct, but I wouldn't say that this is 
really the issue.

I see two options here: either we enforce a strict order such that close() 
calls always start from the ServerConnection, or we merge the locks.
Due to how the connections are constructed, the first option cannot be 
implemented without a larger refactoring, so I'm inclined to go with the latter. 
A single lock should also make the code easier to reason about in general, and 
as far as I can tell the separate locks don't provide any real benefit.

> Java deadlock in QueryableState ClientTest
> ------------------------------------------
>
>                 Key: FLINK-24213
>                 URL: https://issues.apache.org/jira/browse/FLINK-24213
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Queryable State
>    Affects Versions: 1.15.0
>            Reporter: Dawid Wysakowicz
>            Assignee: Chesnay Schepler
>            Priority: Major
>              Labels: test-stability
>
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=23750&view=logs&j=d44f43ce-542c-597d-bf94-b0718c71e5e8&t=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d&l=15476
> {code}
>  Found one Java-level deadlock:
> Sep 08 11:12:50 =============================
> Sep 08 11:12:50 "Flink Test Client Event Loop Thread 0":
> Sep 08 11:12:50   waiting to lock monitor 0x00007f4e380309c8 (object 0x0000000086b2cd50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "main"
> Sep 08 11:12:50 "main":
> Sep 08 11:12:50   waiting to lock monitor 0x00007f4ea4004068 (object 0x0000000086b2cf50, a java.lang.Object),
> Sep 08 11:12:50   which is held by "Flink Test Client Event Loop Thread 0"
> {code}
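The trace shows a classic lock-order inversion: each thread holds one `java.lang.Object` monitor and waits for the one the other thread holds. Below is a minimal, self-contained reproduction of that pattern, assuming two plain monitors acquired in opposite order; it is not the Flink test itself, and the names `LockOrderInversionDemo`, `reproduce` and `lockInOrder` are hypothetical. It uses the JDK's `ThreadMXBean.findMonitorDeadlockedThreads()`, which reports the same kind of monitor cycle that the thread dump lists under "Found one Java-level deadlock".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the lock-order inversion in the trace;
// not the Flink test itself.
class LockOrderInversionDemo {

    /** Starts two threads that lock two monitors in opposite order and returns
     *  the number of monitor-deadlocked threads the JVM reports (0 if none in time). */
    static int reproduce(long timeoutMillis) {
        final Object lockA = new Object();
        final Object lockB = new Object();
        // Ensures both threads hold their first monitor before either tries the second.
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread closer = new Thread(() -> lockInOrder(lockA, lockB, bothHoldFirstLock), "closer");
        Thread eventLoop = new Thread(() -> lockInOrder(lockB, lockA, bothHoldFirstLock), "event-loop");
        // Daemon threads, so the deadlocked pair does not keep the JVM alive.
        closer.setDaemon(true);
        eventLoop.setDaemon(true);
        closer.start();
        eventLoop.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            long[] ids = threads.findMonitorDeadlockedThreads(); // null if no cycle yet
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                                + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and waits for 'first'.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce(5_000));
    }
}
```

The two threads play the roles of "main" and "Flink Test Client Event Loop Thread 0" in the trace; making both acquire `lockA` before `lockB` (or sharing one lock) removes the cycle.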



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
