[jira] [Commented] (HDFS-8581) count cmd calculate wrong when huge files exist in one folder

J.Andreina (JIRA) Fri, 12 Jun 2015 04:30:05 -0700

    [ https://issues.apache.org/jira/browse/HDFS-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583271#comment-14583271 ]


J.Andreina commented on HDFS-8581:
----------------------------------

*Issue in count:*

While traversing subfolders to calculate the file count, if the count 
exceeds "dfs.content-summary.limit", the traversal of the remaining child 
nodes is skipped, so their files are never counted.

{noformat}
Scenario :
=========
dfs.content-summary.limit = 5000
/Folder1   - 10 files
/Folder2   - 6000 files
/Folder3   - 10 files

Now when I run count on "/", it returns a file count of 6010 instead of 6020.
The files under Folder3 are not counted.
{noformat}
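
The scenario above can be reproduced with a small toy model. This is only a sketch of the described behaviour, not the HDFS code: the `Dir` class, the `count_files` function, and its `stop_at_limit` flag are hypothetical names, and the assumption is that the limit check happens before each child is visited.

```python
from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical stand-in for a directory: its own file count plus subdirectories."""
    name: str
    files: int = 0
    children: list = field(default_factory=list)


def count_files(root, limit, stop_at_limit):
    """Count files under root.

    stop_at_limit=True models the reported behaviour: once the running
    count exceeds the limit, the remaining siblings are skipped.
    stop_at_limit=False models the expected behaviour: every child is visited.
    """
    total = 0

    def visit(d):
        nonlocal total
        total += d.files
        for child in d.children:
            if stop_at_limit and total > limit:
                return  # remaining children are never visited
            visit(child)

    visit(root)
    return total


# Scenario from the comment, with dfs.content-summary.limit = 5000
root = Dir("/", 0, [Dir("/Folder1", 10), Dir("/Folder2", 6000), Dir("/Folder3", 10)])

print(count_files(root, 5000, stop_at_limit=True))   # 6010 (Folder3 skipped)
print(count_files(root, 5000, stop_at_limit=False))  # 6020
```

In this model the limit only decides whether traversal continues; it should not decide whether the remaining children are counted at all.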

Attached an initial patch.
Please review.

> count cmd calculate wrong when huge files exist in one folder
> -------------------------------------------------------------
>
>                 Key: HDFS-8581
>                 URL: https://issues.apache.org/jira/browse/HDFS-8581
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: HDFS
>            Reporter: tongshiquan
>            Assignee: J.Andreina
>            Priority: Minor
>
> If one directory such as "/result" contains about 200000 files, then 
> "hdfs dfs -count /" returns a wrong result: the files in all directories 
> whose names come after "/result" are not included.
> On my cluster, shown below, "/result_1433858936" is the directory with the 
> huge number of files, and the files in "/sparkJobHistory", "/tmp", and 
> "/user" are not included.
> vm-221:/export1/BigData/current # hdfs dfs -ls /
> 15/06/11 11:00:17 INFO hdfs.PeerCache: SocketCache disabled.
> Found 9 items
> -rw-r--r--   3 hdfs   supergroup          0 2015-06-08 12:10 /PRE_CREATE_DIR.SUCCESS
> drwxr-x---   - flume  hadoop              0 2015-06-08 12:08 /flume
> drwx------   - hbase  hadoop              0 2015-06-10 15:25 /hbase
> drwxr-xr-x   - hdfs   supergroup          0 2015-06-10 17:19 /hyt
> drwxrwxrwx   - mapred hadoop              0 2015-06-08 12:08 /mr-history
> drwxr-xr-x   - hdfs   supergroup          0 2015-06-09 22:10 /result_1433858936
> drwxrwxrwx   - spark  supergroup          0 2015-06-10 19:15 /sparkJobHistory
> drwxrwxrwx   - hdfs   hadoop              0 2015-06-08 12:14 /tmp
> drwxrwxrwx   - hdfs   hadoop              0 2015-06-09 21:57 /user
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /
> 15/06/11 11:00:24 INFO hdfs.PeerCache: SocketCache disabled.
>         1043       171536         1756375688 /
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /PRE_CREATE_DIR.SUCCESS
> 15/06/11 11:00:30 INFO hdfs.PeerCache: SocketCache disabled.
>            0            1                  0 /PRE_CREATE_DIR.SUCCESS
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /flume
> 15/06/11 11:00:41 INFO hdfs.PeerCache: SocketCache disabled.
>            1            0                  0 /flume
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /hbase
> 15/06/11 11:00:49 INFO hdfs.PeerCache: SocketCache disabled.
>           36           18              14807 /hbase
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /hyt
> 15/06/11 11:01:09 INFO hdfs.PeerCache: SocketCache disabled.
>            1            0                  0 /hyt
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /mr-history
> 15/06/11 11:01:18 INFO hdfs.PeerCache: SocketCache disabled.
>            3            0                  0 /mr-history
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /result_1433858936
> 15/06/11 11:01:29 INFO hdfs.PeerCache: SocketCache disabled.
>         1001       171517         1756360881 /result_1433858936
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /sparkJobHistory
> 15/06/11 11:01:41 INFO hdfs.PeerCache: SocketCache disabled.
>            1            3              21785 /sparkJobHistory
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /tmp
> 15/06/11 11:01:48 INFO hdfs.PeerCache: SocketCache disabled.
>           17            6              35958 /tmp
> vm-221:/export1/BigData/current # 
> vm-221:/export1/BigData/current # hdfs dfs -count /user
> 15/06/11 11:01:55 INFO hdfs.PeerCache: SocketCache disabled.
>           12            1              19077 /user
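
The numbers in the quoted output can be cross-checked with a few lines of arithmetic. The sketch below copies the (directories, files, bytes) triples printed by "hdfs dfs -count" above and assumes the root counts itself as one directory, as the "1 0 0" result for the empty "/flume" suggests. The variable names are illustrative only.

```python
# (dirs, files, bytes) exactly as printed by `hdfs dfs -count` in the report
reported_root = (1043, 171536, 1756375688)
children = {
    "/PRE_CREATE_DIR.SUCCESS": (0, 1, 0),
    "/flume": (1, 0, 0),
    "/hbase": (36, 18, 14807),
    "/hyt": (1, 0, 0),
    "/mr-history": (3, 0, 0),
    "/result_1433858936": (1001, 171517, 1756360881),
    "/sparkJobHistory": (1, 3, 21785),
    "/tmp": (17, 6, 35958),
    "/user": (12, 1, 19077),
}


def add(a, b):
    """Element-wise sum of two (dirs, files, bytes) triples."""
    return tuple(x + y for x, y in zip(a, b))


# Expected total: the root itself (assumed to count as 1 directory) plus every child.
expected_root = (1, 0, 0)
for triple in children.values():
    expected_root = add(expected_root, triple)

# What `-count /` left out, compared with the three directories listed after the big one.
missing = tuple(e - r for e, r in zip(expected_root, reported_root))
skipped = (0, 0, 0)
for name in ("/sparkJobHistory", "/tmp", "/user"):
    skipped = add(skipped, children[name])

print(expected_root)  # (1073, 171546, 1756452508)
print(missing)        # (30, 10, 76820)
print(skipped)        # (30, 10, 76820)
```

The shortfall in the root count matches the combined totals of "/sparkJobHistory", "/tmp", and "/user" exactly, which is consistent with everything after "/result_1433858936" being skipped.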



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
