[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7329?focusedWorklogId=567589&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-567589
 ]

ASF GitHub Bot logged work on MAPREDUCE-7329:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/Mar/21 11:26
            Start Date: 17/Mar/21 11:26
    Worklog Time Spent: 10m 
      Work Description: steveloughran commented on pull request #2775:
URL: https://github.com/apache/hadoop/pull/2775#issuecomment-801006630


   I've never seen this code before, so I'm not really in a position to review. 
I'm just trying to revise my sockets API knowledge, which dates from when I was 
writing Windows 3.1 code and hasn't been refreshed much since HTTP came along.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 567589)
    Time Spent: 1h 20m  (was: 1h 10m)

> HadoopPipes task may fail when linux kernel version upgrade from 3.x to 4.x
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: chaoli
>            Priority: Major
>              Labels: patch, pull-request-available
>             Fix For: 2.6.0, 3.0.0
>
>         Attachments: 
> 0001-MAPREDUCE-7329-HadoopPipes-task-may-fail-when-linux-.patch, 
> image-2021-03-15-14-29-49-475.png, image-2021-03-15-14-37-32-184.png
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {color:#FF0000}*The Hadoop Pipes ping implementation has a bug*{color}. We 
> recently upgraded our Linux kernel from 3.x to 4.x and found that Hadoop 
> Pipes tasks exit with a connect timeout in the ping check implemented by 
> PingThread in HadoopPipes.cc.
> !image-2021-03-15-14-37-32-184.png!
> After investigating, we found that the current ping server never accepts the 
> sockets created by the ping client, which causes several critical problems: 
>  *  The TCP accept queue fills up (default 50).
>  *  When the client closes its socket, the server never calls close on its 
> side, which leaves many socket fds in CLOSE_WAIT (default 2h), and the 
> accept queue is never cleared.
>  * Even worse, on 4.x Linux kernels a full accept queue makes TCP drop the 
> packet directly, so the ping client's connect times out. On 3.x Linux 
> kernels, when the accept queue is full the client can still make a half-open 
> connection until the SYN queue is full (default 2048), so from the client 
> side ping also keeps working until the SYN queue is full. After 3 hours, the 
> task also exits with a connect timeout exception.
> To fix this bug, we introduced a PingSocketCleaner thread, which 
> continuously accepts ping socket connections from the ping client. When the 
> client closes a socket, the cleaner thread detects the close while reading 
> the socket's input stream and then closes the socket on the server side.
> Referenced Linux kernel patch: 
> [https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5ea8ea2cb7]
>  
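
The accept-and-close loop described in the fix above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the class name PingSocketCleanerSketch, the getClosedCount() counter, the 5 second read timeout, and the demo in main are all hypothetical. It handles one ping connection at a time, which assumes the ping client connects and closes immediately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a cleaner thread; not the class from the patch. */
class PingSocketCleanerSketch implements Runnable {
    // Assumed value: stop waiting on a client that never closes its socket.
    private static final int READ_TIMEOUT_MS = 5000;

    private final ServerSocket serverSocket;
    private final AtomicInteger closedCount = new AtomicInteger();

    PingSocketCleanerSketch(ServerSocket serverSocket) {
        this.serverSocket = serverSocket;
    }

    /** Number of accepted ping sockets that have been closed on the server side. */
    int getClosedCount() {
        return closedCount.get();
    }

    @Override
    public void run() {
        while (!serverSocket.isClosed()) {
            Socket ping;
            try {
                // Taking the connection off the accept queue keeps the queue from filling.
                ping = serverSocket.accept();
            } catch (IOException e) {
                // The server socket was closed or accept failed: stop the cleaner.
                return;
            }
            drainAndClose(ping);
        }
    }

    private void drainAndClose(Socket ping) {
        // try-with-resources closes the server-side socket, so it does not stay in CLOSE_WAIT.
        try (Socket s = ping) {
            s.setSoTimeout(READ_TIMEOUT_MS);
            InputStream in = s.getInputStream();
            while (in.read() != -1) {
                // Discard any bytes; read() returns -1 once the client has closed its end.
            }
        } catch (IOException ignored) {
            // A reset or read timeout is treated like a client close in this sketch.
        } finally {
            closedCount.incrementAndGet();
        }
    }

    public static void main(String[] args) throws Exception {
        int pings = 10;
        // Port 0 picks a free port; 50 mirrors the default backlog mentioned in the issue.
        try (ServerSocket server = new ServerSocket(0, 50)) {
            PingSocketCleanerSketch cleaner = new PingSocketCleanerSketch(server);
            Thread thread = new Thread(cleaner, "ping-socket-cleaner");
            thread.setDaemon(true);
            thread.start();
            for (int i = 0; i < pings; i++) {
                // Emulate the ping client: connect, then close straight away.
                new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort()).close();
            }
            long deadline = System.currentTimeMillis() + 5000;
            while (cleaner.getClosedCount() < pings && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            System.out.println("closed " + cleaner.getClosedCount() + " of " + pings);
        }
    }
}
```

Closing the ServerSocket makes the blocked accept() throw, which is how the sketch stops the thread. A real implementation might prefer a pool or per-connection timeout policy if ping clients can hold sockets open.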



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
