[ 
https://issues.apache.org/jira/browse/HDFS-10815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15502524#comment-15502524
 ] 

Chen Zhiyin commented on HDFS-10815:
------------------------------------

I have tried several times, but I cannot reproduce the error in our cluster. The 
following are my steps to reproduce:

1. Decommission four data nodes in my cluster, which has 9 data nodes and 1 name 
node in total.
2. Generate 9 files in the path /benchmarks and the size of each file is 15GB.
3. Set erasure code policy "RS-DEFAULT-3-2-64k" on the path /ECTest.
4. Copy files to the path /ECTest by the command: bin/hdfs dfs -cp 
/benchmarks/* /ECTest
5. Kill the data node process on data node 1: sudo pkill -9 -f datanode
6. Run hdfs fsck; however, the files in the path /ECTest are reported healthy.
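The attempt above can be condensed into a script. This is only a sketch: the host names, the decommission step, and the exact erasurecode sub-command syntax for this release are assumptions, not taken from the comment.

```shell
# Sketch of the repro attempt (paths and policy name from the comment above;
# host names and the decommission mechanics are assumed).

# 1. Decommission four data nodes (after listing them in the exclude file).
hdfs dfsadmin -refreshNodes

# 2-3. Prepare the EC target directory and set the policy on it.
hdfs dfs -mkdir -p /ECTest
hdfs erasurecode -setPolicy -p RS-DEFAULT-3-2-64k /ECTest

# 4. Copy the 15 GB test files into the EC directory.
hdfs dfs -cp /benchmarks/* /ECTest

# 5. Kill the datanode process on data node 1.
ssh datanode1 'sudo pkill -9 -f datanode'

# 6. Check the file state.
hdfs fsck /ECTest -files -blocks
```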

I have no idea why I cannot reproduce the error. Could you help me?
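For reference, the RS-DEFAULT-3-2-64k policy used in both procedures is a Reed-Solomon (k=3, m=2) code, so each block group survives the loss of any two of its five units. The sketch below (my own illustration, not code from this issue) checks that threshold and the parity overhead for a 15 GB file; killing a single datanode after the copy finishes should therefore never make a file unreadable.

```python
from itertools import combinations

DATA_UNITS = 3          # k in RS-DEFAULT-3-2-64k
PARITY_UNITS = 2        # m
CELL_SIZE = 64 * 1024   # 64 KiB striping cell

TOTAL_UNITS = DATA_UNITS + PARITY_UNITS

def is_readable(lost_units):
    """A Reed-Solomon (k, m) block group is decodable iff at least
    k of its k+m units survive."""
    return TOTAL_UNITS - len(lost_units) >= DATA_UNITS

# Losing any single unit (e.g. one datanode killed) is always tolerable.
assert all(is_readable({u}) for u in range(TOTAL_UNITS))

# Losing any two units is still tolerable.
assert all(is_readable(set(p)) for p in combinations(range(TOTAL_UNITS), 2))

# Losing three or more units makes the group unrecoverable.
assert not any(is_readable(set(t)) for t in combinations(range(TOTAL_UNITS), 3))

# Parity overhead: each 15 GB file carries m/k of its size in parity.
file_size = 15 * 1024**3
parity_bytes = file_size * PARITY_UNITS // DATA_UNITS
print(parity_bytes / 1024**3)  # prints 10.0
```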


> The state of the EC file is erroneously recognized when you restart the 
> NameNode.
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-10815
>                 URL: https://issues.apache.org/jira/browse/HDFS-10815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-alpha1
>         Environment: 2 NameNodes, 5 DataNodes, Erasured code policy is set as 
> "RS-DEFAULT-3-2-64k"
>            Reporter: Eisuke Umeda
>
> After carrying out a test with the following procedure, some EC files came to 
> be recognized as corrupt files.
> These files could still be retrieved with "hdfs dfs -get".
> The NameNode might be misreporting their state.
> DataNodes: datanode[1-5]
> Rack awareness: not set
> Copy target files: /tmp/tpcds-generate/25/store_sales/*
> {code}
> $ hdfs dfs -ls /tmp/tpcds-generate/25/store_sales
> Found 25 items
> -rw-r--r--   0 root supergroup  399430918 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00000
> -rw-r--r--   0 root supergroup  399054598 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00001
> -rw-r--r--   0 root supergroup  399329373 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00002
> -rw-r--r--   0 root supergroup  399528459 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00003
> -rw-r--r--   0 root supergroup  399329624 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00004
> -rw-r--r--   0 root supergroup  399085924 2016-08-16 15:11 
> /tmp/tpcds-generate/25/store_sales/data-m-00005
> -rw-r--r--   0 root supergroup  399337384 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00006
> -rw-r--r--   0 root supergroup  399199458 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00007
> -rw-r--r--   0 root supergroup  399679096 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00008
> -rw-r--r--   0 root supergroup  399440431 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00009
> -rw-r--r--   0 root supergroup  399403931 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00010
> -rw-r--r--   0 root supergroup  399472465 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00011
> -rw-r--r--   0 root supergroup  399451784 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00012
> -rw-r--r--   0 root supergroup  399240168 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00013
> -rw-r--r--   0 root supergroup  399370507 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00014
> -rw-r--r--   0 root supergroup  399633351 2016-08-16 15:12 
> /tmp/tpcds-generate/25/store_sales/data-m-00015
> -rw-r--r--   0 root supergroup  396532952 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00016
> -rw-r--r--   0 root supergroup  396258715 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00017
> -rw-r--r--   0 root supergroup  396382486 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00018
> -rw-r--r--   0 root supergroup  399016456 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00019
> -rw-r--r--   0 root supergroup  399465745 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00020
> -rw-r--r--   0 root supergroup  399208235 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00021
> -rw-r--r--   0 root supergroup  399198296 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00022
> -rw-r--r--   0 root supergroup  399599711 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00023
> -rw-r--r--   0 root supergroup  395150855 2016-08-16 15:13 
> /tmp/tpcds-generate/25/store_sales/data-m-00024
> {code}
> NameNodes:
>   namenode1(active)
>   namenode2(standby)
> The directory that contains "Under-erasure-coded block groups": 
> /tmp/tpcds-generate/test
> {code}
> $ sudo -u hdfs hdfs erasurecode -getPolicy /tmp/tpcds-generate/test
> ErasureCodingPolicy=[Name=RS-DEFAULT-3-2-64k, 
> Schema=[ECSchema=[Codec=rs-default, numDataUnits=3, numParityUnits=2]], 
> CellSize=65536 ]
> {code}
> The following are the steps to reproduce:
> 1) hdfs dfs -cp /tmp/tpcds-generate/25/store_sales/* /tmp/tpcds-generate/test
> 2) on datanode1 (in the middle of the copy): sudo pkill -9 -f datanode
> 3) restart the datanode process on datanode1 two minutes later
> 4) run hdfs fsck and confirm that Under-Replicated Blocks occurred
> 5) wait until Under-Replicated Blocks becomes 0
> 6) (namenode1) /etc/init.d/hadoop-hdfs-namenode restart
> 7) (namenode2) /etc/init.d/hadoop-hdfs-namenode restart



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
