[
https://issues.apache.org/jira/browse/HDFS-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tao Jie updated HDFS-11638:
---------------------------
Attachment: HDFS-11638-001.patch
> Support marking a datanode dead by DFSAdmin
> -------------------------------------------
>
> Key: HDFS-11638
> URL: https://issues.apache.org/jira/browse/HDFS-11638
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Tao Jie
> Attachments: HDFS-11638-001.patch
>
>
> We ran into the following situation:
> A kernel error occurred on one slave node, with a message like
> {code}
> Apr 1 08:48:05 xxhdn033 kernel: BUG: soft lockup - CPU#0 stuck for 67s!
> [java:19096]
> Apr 1 08:48:05 xxhdn033 kernel: Modules linked in: bridge stp llc fuse
> autofs4 bonding ipv6 uinput iTCO_wdt iTCO_vendor_support microcode
> power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core joydev
> i2c_i801 i2c_core lpc_ich mfd_core sg ses enclosure ixgbe dca ptp pps_core
> mdio ext4 jbd2 mbcache sd_mod crc_t10dif ahci megaraid_sas dm_mirror
> dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> {code}
> The datanode process was still alive and continued to send heartbeats to the
> namenode, but the node could not respond to any command, and reading or
> writing blocks on this datanode would fail. As a result, requests to HDFS
> became slower because of the many read/write timeouts.
> We try to work around this case by adding a dfsadmin command that marks such
> an abnormal datanode as dead by force until it gets restarted. If this
> situation happens again, clients would then avoid accessing the faulty
> datanode.
> Any thoughts?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]