Haibo Chen created MAPREDUCE-6684:
-------------------------------------
Summary: High contention on scanning of user directory under
immediate_done in Job History Server
Key: MAPREDUCE-6684
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6684
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 2.7.0
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Critical
HistoryFileManager.scanIntermediateDirectory() in the Job History Server (JHS)
acquires a lock on each user directory it scans (moving or deleting files under
the user directory as necessary). The method is called by a thread in JobHistory
that periodically scans the intermediate directory, and it can also be called by
web server threads for each Web API call made by a JHS client. When there are
many concurrent Web API calls/connections to JHS, all but one thread block on
the lock on the user directory. Eventually the client connections time out, but
the blocked threads in JHS are not killed, which leaves many TCP connections in
the CLOSE_WAIT state, as the netstat output below shows.
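
To make the contention pattern concrete, here is a minimal, self-contained Java
sketch. It is not Hadoop code and not a proposed patch: UserDirScanDemo, UserDir,
scanBlocking and scanIfIdle are hypothetical names, and the sleep stands in for
the directory listing and file moves. It only contrasts callers that all queue
on one per-directory lock with callers that skip the scan when another thread
is already scanning the same directory.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class UserDirScanDemo {
    /** Hypothetical stand-in for one user directory under the intermediate dir. */
    static final class UserDir {
        private final ReentrantLock lock = new ReentrantLock();
        final AtomicInteger scans = new AtomicInteger();
        final AtomicInteger skipped = new AtomicInteger();

        /** Blocking variant: every caller waits for the lock, then scans again. */
        void scanBlocking(long scanMillis) throws InterruptedException {
            lock.lock();
            try {
                Thread.sleep(scanMillis); // simulated listing / moving of files
                scans.incrementAndGet();
            } finally {
                lock.unlock();
            }
        }

        /** Non-blocking variant: skip if another thread holds the lock. */
        void scanIfIdle(long scanMillis) throws InterruptedException {
            if (!lock.tryLock()) {
                skipped.incrementAndGet(); // a scan is already in progress
                return;
            }
            try {
                Thread.sleep(scanMillis);
                scans.incrementAndGet();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Releases `callers` threads at once against a single UserDir. */
    static UserDir run(int callers, boolean blocking) {
        UserDir dir = new UserDir();
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try {
                    start.await();
                    if (blocking) {
                        dir.scanBlocking(20);
                    } else {
                        dir.scanIfIdle(20);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dir;
    }

    public static void main(String[] args) {
        UserDir blocking = run(8, true);
        UserDir skipping = run(8, false);
        System.out.println("blocking: scans=" + blocking.scans.get());
        System.out.println("skipping: scans=" + skipping.scans.get()
                + " skipped=" + skipping.skipped.get());
    }
}
```

In the blocking variant all eight callers scan one after another, so the last
caller waits for roughly seven scans before doing its own. In the skipping
variant most callers usually return immediately, at the cost of possibly
serving a result that does not yet reflect the scan in progress. Whether that
trade-off is acceptable for JHS is an open question for this issue.
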
[systest@vb1120 ~]$ sudo netstat -nap | grep 63729 | sort -k 4
tcp 0 0 10.17.202.19:10020 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:10020 10.17.198.30:33010 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.200.30:33980 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.202.10:59625 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10020 10.17.202.13:35765 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:10033 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:19888 0.0.0.0:* LISTEN 63729/java
tcp 0 0 10.17.202.19:19888 10.17.198.30:35103 ESTABLISHED 63729/java
tcp 277 0 10.17.202.19:19888 10.17.198.30:43670 ESTABLISHED 63729/java
tcp 0 0 10.17.202.19:19888 10.17.198.30:45453 ESTABLISHED 63729/java
tcp 277 0 10.17.202.19:19888 10.17.198.30:49184 ESTABLISHED 63729/java
tcp 1 0 10.17.202.19:19888 10.17.202.13:49992 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52703 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52707 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52708 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52710 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52714 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52723 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52726 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52727 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52739 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52749 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52753 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52757 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52760 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52820 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52827 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52829 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52831 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52833 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52836 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52839 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52841 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52843 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52850 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52860 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52876 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52879 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52881 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52884 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52886 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52888 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52891 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52893 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52896 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52898 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52899 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52902 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52909 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52910 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52912 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52923 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52925 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52927 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52930 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52937 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52939 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52945 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52947 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52969 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:52972 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:52975 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53004 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53007 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53009 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53011 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53052 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53058 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53059 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53063 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53071 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53084 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53093 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53095 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53097 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53101 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53104 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53106 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53108 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53110 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53112 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53114 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53115 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53117 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53121 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53123 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53125 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53127 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53129 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53131 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53134 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53138 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53140 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53153 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53155 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53157 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53159 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53173 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53176 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53177 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53178 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53179 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53181 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53183 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53201 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53204 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53218 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53267 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53270 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53275 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53278 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53280 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53283 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53293 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53296 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53299 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53309 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53312 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53314 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53317 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53320 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53322 CLOSE_WAIT 63729/java
tcp 256 0 10.17.202.19:19888 10.17.202.13:53338 CLOSE_WAIT 63729/java
tcp 261 0 10.17.202.19:19888 10.17.202.13:53340 CLOSE_WAIT 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53364 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53366 ESTABLISHED 63729/java
tcp 260 0 10.17.202.19:19888 10.17.202.13:53367 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53380 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53382 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53386 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53390 ESTABLISHED 63729/java
tcp 255 0 10.17.202.19:19888 10.17.202.13:53392 ESTABLISHED 63729/java
tcp 1278 0 10.17.202.19:19888 10.17.202.18:45301 CLOSE_WAIT 63729/java
tcp 1278 0 10.17.202.19:19888 10.17.202.18:45303 CLOSE_WAIT 63729/java
tcp 1277 0 10.17.202.19:19888 10.17.202.18:45306 ESTABLISHED 63729/java