[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck

Hadoop QA (JIRA) Wed, 22 Oct 2014 12:26:06 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14180382#comment-14180382
 ]


Hadoop QA commented on HDFS-7180:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12676390/HDFS-7180.002.patch
  against trunk revision d67214f.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warning.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs-nfs.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8480//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/8480//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8480//console

This message is automatically generated.

> NFSv3 gateway frequently gets stuck
> -----------------------------------
>
>                 Key: HDFS-7180
>                 URL: https://issues.apache.org/jira/browse/HDFS-7180
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: nfs
>    Affects Versions: 2.5.0
>         Environment: Linux, Fedora 19 x86-64
>            Reporter: Eric Zhiqiang Ma
>            Assignee: Brandon Li
>            Priority: Critical
>         Attachments: HDFS-7180.001.patch, HDFS-7180.002.patch
>
>
> We are using Hadoop 2.5.0 (HDFS only). We start the NFSv3 gateway on one node 
> in the cluster and mount it there so that users can upload data with rsync.
> However, the NFSv3 daemon frequently gets stuck while HDFS itself keeps 
> working (`hdfs dfs -ls` and similar commands work fine). The most recent hang 
> occurred after about 1 day of running and several hundred GB of uploaded 
> data.
> The NFSv3 daemon is started on one node, and the NFS export is mounted on 
> that same node.
> On the node where NFS is mounted, dmesg shows:
> [1859245.368108] nfs: server localhost not responding, still trying
> [1859245.368111] nfs: server localhost not responding, still trying
> [1859245.368115] nfs: server localhost not responding, still trying
> [1859245.368119] nfs: server localhost not responding, still trying
> [1859245.368123] nfs: server localhost not responding, still trying
> [1859245.368127] nfs: server localhost not responding, still trying
> [1859245.368131] nfs: server localhost not responding, still trying
> [1859245.368135] nfs: server localhost not responding, still trying
> [1859245.368138] nfs: server localhost not responding, still trying
> [1859245.368142] nfs: server localhost not responding, still trying
> [1859245.368146] nfs: server localhost not responding, still trying
> [1859245.368150] nfs: server localhost not responding, still trying
> [1859245.368153] nfs: server localhost not responding, still trying
> Running `ls` on the mounted directory gets stuck, and so does `df -hT`.
> The latest lines from the nfs3 log in the hadoop logs directory:
> 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: 
> Have to change stable write to unstable write:FILE_SYNC
> 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update 
> cache now
> 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not 
> doing static UID/GID mapping because '/etc/nfs.map' does not exist.
> 2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> user map size: 35
> 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated 
> group map size: 54
> 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow 
> ReadProcessor read fields took 60062ms (threshold=30000ms); ack: seqno: -2 
> status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: 
> [10.0.3.172:50010, 10.0.3.176:50010]
> 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: 
> DFSOutputStream ResponseProcessor exception  for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643
> java.io.IOException: Bad response ERROR for block 
> BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 from datanode 
> 10.0.3.176:50010
>         at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:828)
> 2014-10-02 06:07:00,368 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
> for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 in 
> pipeline 10.0.3.172:50010, 10.0.3.176:50010: bad datanode 10.0.3.176:50010
> The logs seem to suggest that 10.0.3.176 is bad. However, according to 
> `hdfs dfsadmin -report`, all nodes in the cluster appear to be working.
> Any help would be appreciated. Thanks in advance.
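
The DFSClient warnings quoted above name 10.0.3.176:50010 in two different phrasings ("from datanode ..." and "bad datanode ..."). As a rough illustration only, and not part of the patch under test, the following Python sketch counts how often each datanode address is named in such an error context in a saved copy of the nfs3 log. The function name `count_flagged_datanodes` and the regular expression are assumptions made for this example; they cover only the two phrasings seen in this report.

```python
import re
from collections import Counter

# Matches only the two phrasings that appear in the quoted log excerpt:
#   "... from datanode 10.0.3.176:50010"
#   "... bad datanode 10.0.3.176:50010"
# This pattern is an assumption for illustration, not a Hadoop log format spec.
_DATANODE = re.compile(r"(?:from|bad) datanode (\d{1,3}(?:\.\d{1,3}){3}:\d+)")


def count_flagged_datanodes(log_text):
    """Count how often each datanode address is named in an error context."""
    # Drop mail-style "> " quote prefixes and rejoin wrapped lines so that an
    # address pushed onto the next line by the mailer still matches.
    pieces = [line.lstrip("> ").strip() for line in log_text.splitlines()]
    flat = " ".join(piece for piece in pieces if piece)
    return Counter(_DATANODE.findall(flat))
```

Calling `count_flagged_datanodes(open(path).read()).most_common()` on a log file (where `path` is wherever the nfs3 log is stored) lists the most frequently flagged addresses first. A high count only shows which node the client complained about; it does not by itself show whether the datanode or the gateway is at fault.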



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
