[ 
https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323907#comment-14323907
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6133:
-------------------------------------------

> ... Assume on average a file has 1k blocks, ...

I have heard that in many clusters a file has only 1.x blocks on average.

> ... the diff is 40k vs 10M. ...

This is on top of the assumption of 1000 favoredNodes and 10 replicas.  So the 
case is quite extreme.
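
As a rough sketch of where the 10M figure comes from, assuming it is simply the 
product of the three quantities above (1k blocks, 10 replicas, 1000 
favoredNodes).  That reading, and the helper name worstCase, are illustrative 
only and are not taken from the patch:

```java
public class ComparisonCount {
  // Hypothetical helper: the number of string comparisons if every replica
  // of every block is compared against every favored node.
  static long worstCase(long blocks, long replicas, long favoredNodes) {
    return blocks * replicas * favoredNodes;
  }

  public static void main(String[] args) {
    // Extreme case from the discussion: 1k blocks, 10 replicas, 1000 favoredNodes.
    System.out.println(worstCase(1000, 10, 1000)); // prints 10000000

    // Same extreme replica/favoredNodes settings, but with a file of
    // 2 blocks ("1.x" rounded up, which is an assumption made here).
    System.out.println(worstCase(2, 10, 1000)); // prints 20000
  }
}
```

Under this reading, the comparison count scales linearly with the number of 
blocks per file, so the extreme case only arises when all three factors are 
large at once.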

Even if we hit this extreme case, the I/O time spent writing the 1000-block 
file is significantly larger than the time needed for 10M string comparisons.  
I have just tested it on my laptop: 10M string comparisons take less than 
100 ms.  See my test program below.

{code}
import java.util.Random;

public class StringCompareTest {
  public static void main(String[] args) {
    final Random ran = new Random();
    // Every entry shares the same random IP-like prefix and differs only
    // in the port-like suffix.
    final String prefix = ran.nextInt(256) + "." + ran.nextInt(256)
        + "." + ran.nextInt(256) + "." + ran.nextInt(256) + ":";

    String[] a = new String[1000];
    String[] b = new String[1000];
    for (int i = 0; i < a.length; i++) {
      a[i] = prefix + ran.nextInt(1000);
      b[i] = prefix + ran.nextInt(1000);
    }
    int same = 0;
    int different = 0;
    final long starttime = System.currentTimeMillis();
    // 10 rounds x 1000 x 1000 = 10M comparisons
    for (int k = 0; k < 10; k++) {
      for (int i = 0; i < a.length; i++) {
        for (int j = 0; j < b.length; j++) {
          if (a[i].equals(b[j])) {
            same++;
          } else {
            different++;
          }
        }
      }
    }
    final long duration = System.currentTimeMillis() - starttime;
    System.out.println("duration=" + duration + " ms, same=" + same
        + ", different=" + different);
    //Sample output:
    //duration=73 ms, same=9830, different=9990170
  }
}
{code}

> Make Balancer support exclude specified path
> --------------------------------------------
>
>                 Key: HDFS-6133
>                 URL: https://issues.apache.org/jira/browse/HDFS-6133
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, datanode
>            Reporter: zhaoyunjiong
>            Assignee: zhaoyunjiong
>             Fix For: 2.7.0
>
>         Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, 
> HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, 
> HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, 
> HDFS-6133-9.patch, HDFS-6133.patch
>
>
> Currently, running the Balancer destroys the RegionServer's data locality.
> If getBlocks could exclude blocks belonging to files that have a specific 
> path prefix, like "/hbase", then we could run the Balancer without 
> destroying the RegionServer's data locality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
