[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260149#comment-15260149
 ] 

Robert Joseph Evans commented on MAPREDUCE-6684:
------------------------------------------------

I agree with Jason.  The code is complex and there are a lot of races in there. 
 Making the code simpler and still solving the problem would be great.  But I 
think a lot of the races are inherent in the problem, some users are trying to 
read data and get a consistent view of it while at the same time the data is 
moving from one place to another.  In many cases making something have less 
lock contention inherently makes the code more complex.  Although I am happy to 
be proven wrong so if you have some specific suggestions on what and how to 
clean it up that would be helpful in understanding if it is worth the risk.

> High contention on scanning of user directory under immediate_done in Job 
> History Server
> ----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6684
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6684
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.7.0
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>            Priority: Critical
>         Attachments: jhs-jstacks-service-monitor-running.tar.gz, 
> jhs-jstacks-service-monitor-stopped.tar.gz
>
>
> HistoryFileManager.scanIntermediateDirectory() in JHS acquires a lock on each 
> user directory it tries to scan (move or delete files under the user 
> directory as necessary). This method is called in a thread in JobHistory that 
> performs periodical scanning of intermediate directory, and can also be 
> called by web server threads for each Web API call made by a JHS client. In 
> cases where there are many concurrent Web API calls/connections to JHS, all 
> but one thread are blocked on the lock on the user directory. Eventually, 
> client connects will time out, but the threads in JHS will not be killed and 
> leave a lot of TCP connections in CLOSE_WAIT state. 
> {noformat}
> [systest@vb1120 ~]$ sudo netstat -nap | grep 63729 | sort -k 4
> tcp        0      0 10.17.202.19:10020          0.0.0.0:*                   
> LISTEN      63729/java          
> tcp        0      0 10.17.202.19:10020          10.17.198.30:33010          
> ESTABLISHED 63729/java          
> tcp        0      0 10.17.202.19:10020          10.17.200.30:33980          
> ESTABLISHED 63729/java          
> tcp        0      0 10.17.202.19:10020          10.17.202.10:59625          
> ESTABLISHED 63729/java          
> tcp        0      0 10.17.202.19:10020          10.17.202.13:35765          
> ESTABLISHED 63729/java          
> tcp        0      0 10.17.202.19:10033          0.0.0.0:*                   
> LISTEN      63729/java          
> tcp        0      0 10.17.202.19:19888          0.0.0.0:*                   
> LISTEN      63729/java          
> tcp        0      0 10.17.202.19:19888          10.17.198.30:35103          
> ESTABLISHED 63729/java          
> tcp      277      0 10.17.202.19:19888          10.17.198.30:43670          
> ESTABLISHED 63729/java          
> tcp        0      0 10.17.202.19:19888          10.17.198.30:45453          
> ESTABLISHED 63729/java          
> tcp      277      0 10.17.202.19:19888          10.17.198.30:49184          
> ESTABLISHED 63729/java          
> tcp        1      0 10.17.202.19:19888          10.17.202.13:49992          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52703          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52707          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52708          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52710          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52714          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52723          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52726          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52727          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52739          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52749          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52753          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52757          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52760          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52820          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52827          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52829          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52831          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52833          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52836          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52839          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52841          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52843          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52850          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52860          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52876          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52879          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52881          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52884          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52886          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52888          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52891          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52893          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52896          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52898          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52899          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52902          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52909          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52910          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52912          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52923          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52925          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52927          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52930          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52937          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52939          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52945          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52947          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52969          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:52972          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:52975          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53004          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53007          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53009          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53011          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53052          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53058          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53059          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53063          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53071          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53084          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53093          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53095          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53097          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53101          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53104          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53106          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53108          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53110          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53112          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53114          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53115          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53117          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53121          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53123          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53125          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53127          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53129          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53131          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53134          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53138          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53140          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53153          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53155          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53157          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53159          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53173          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53176          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53177          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53178          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53179          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53181          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53183          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53201          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53204          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53218          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53267          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53270          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53275          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53278          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53280          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53283          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53293          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53296          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53299          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53309          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53312          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53314          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53317          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53320          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53322          
> CLOSE_WAIT  63729/java          
> tcp      256      0 10.17.202.19:19888          10.17.202.13:53338          
> CLOSE_WAIT  63729/java          
> tcp      261      0 10.17.202.19:19888          10.17.202.13:53340          
> CLOSE_WAIT  63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53364          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53366          
> ESTABLISHED 63729/java          
> tcp      260      0 10.17.202.19:19888          10.17.202.13:53367          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53380          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53382          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53386          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53390          
> ESTABLISHED 63729/java          
> tcp      255      0 10.17.202.19:19888          10.17.202.13:53392          
> ESTABLISHED 63729/java          
> tcp     1278      0 10.17.202.19:19888          10.17.202.18:45301          
> CLOSE_WAIT  63729/java          
> tcp     1278      0 10.17.202.19:19888          10.17.202.18:45303          
> CLOSE_WAIT  63729/java          
> tcp     1277      0 10.17.202.19:19888          10.17.202.18:45306          
> ESTABLISHED 63729/java 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to