[
https://issues.apache.org/jira/browse/HDFS-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355701#comment-14355701
]
Colin Patrick McCabe commented on HDFS-7846:
--------------------------------------------
bq. Yi asked: The original BlockInfo includes triplets, which let us easily
get the blocks belonging to some DatanodeStorage. In this patch it seems not.
Will we use a different approach in other JIRAs? Maybe I need to wait for HDFS-7848.
That's a good question. I've been looking into whether we actually need to
support iterating over all the blocks on a given datanode. I think that this
requirement can be removed for the full block report. We can simply increment
the epoch each time we get an FBR, and ignore blocks which belong to old
epochs. In the background, we can have a scanner thread that collects the
garbage (shrinks block infos or removes block infos that are unreferenced).
This frees up some memory which we would otherwise need to use on the implicit
linked lists. It also makes insertion and deletion easier and slightly faster.
The places where we still need some kind of full iteration are decommissioning
and running the balancer. However, just scanning the hash tables should be
good enough in those use cases. They are rare operations, and they require
moving blocks around the cluster, which takes much more time than iterating
over an in-memory hash table, even when there are a lot of non-relevant slots.
We rate-limit decommissioning anyway, and the limit is low enough that the
extra overhead of full table iteration is not important.
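To sketch the epoch idea concretely, here is roughly what I have in mind. All
of the names below are made up for illustration (not the actual patch), it
models a single storage's reports, and it ignores the off-heap encoding and
locking entirely:
{code:java}
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Rough sketch only: on-heap stand-ins for the off-heap structures.
class EpochBlockMapSketch {
  static class BlockEntry {
    long epoch;  // epoch of the last full block report that mentioned this block
  }

  private final Map<Long, BlockEntry> blocks = new HashMap<Long, BlockEntry>();
  private long currentEpoch = 0;

  // Each full block report bumps the epoch and stamps every reported block.
  void processFullBlockReport(long[] reportedBlockIds) {
    long epoch = ++currentEpoch;
    for (long blockId : reportedBlockIds) {
      BlockEntry e = blocks.get(blockId);
      if (e == null) {
        e = new BlockEntry();
        blocks.put(blockId, e);
      }
      e.epoch = epoch;
    }
  }

  // The background scanner drops entries whose epoch is older than the
  // latest report, instead of walking the implicit per-datanode linked
  // lists (the triplets) that the on-heap BlockInfo needed.
  void collectGarbage() {
    Iterator<BlockEntry> it = blocks.values().iterator();
    while (it.hasNext()) {
      if (it.next().epoch < currentEpoch) {
        it.remove();
      }
    }
  }
}
{code}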
bq. 1. In BlockMap#Builder#build, duplicate check for this.mman != null
Fixed, thanks
bq. 2. Please add InterfaceAudience and InterfaceStability to public class
ok
bq. 3. unnecessary imports in newly added class files
fixed
bq. Another comment is about block replicas: the replication factor of a file
can be changed, and then we will re-allocate off-heap memory for the blocks of
the file? A better way would be to allocate separate off-heap memory for the
replicas of a block, so then we just need to change that part?
This data structure is optimized for minimizing memory consumption, rather than
minimizing copying of memory when things change. I think that's the correct
tradeoff to make, since the replication factor of a file doesn't change that
often. We can also allocate space based on the desired number of replicas,
rather than the actual number. So we don't need to re-allocate when one
replica is lost... we only re-allocate when the replication factor itself is
changed.
To make this more concrete, keep in mind that with separate allocations, we'd
have something like this:
fixed size block info -> replica info R1 -> replica info R2 -> replica info R3
Even with a singly (not doubly) linked list, that's still an extra 8 bytes in
the fixed size block info for the pointer to R1, and an extra 8 bytes in each R
for the pointer to the next. It will also tend to fragment the heap a lot more
because we'll be making a lot of small allocations (R1, R2, and R3 will each be
around 24 bytes).
In contrast, having a variable-sized block info saves at least 32 bytes because
we have no pointers. It has better cache locality because we're not hopping
all over memory to read the replica list. And copying is only needed when the
replication factor is changed by {{setReplication}} or similar. There is less
fragmentation because we make fewer, larger calls to {{malloc}}.
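As a rough sketch of what that contiguous, variable-sized layout looks like
(the field names and sizes here are made up for illustration, not the actual
encoding in the patch):
{code:java}
import java.nio.ByteBuffer;

// Illustration only: one contiguous record per block, sized by the
// *desired* replication, with no per-replica objects and no next-pointers.
class VariableSizedBlockRecordSketch {
  // header: blockId (8) + genStamp (8) + numBytes (8) + desiredReplication (2)
  private static final int HEADER_SIZE = 8 + 8 + 8 + 2;
  private static final int REPLICA_SLOT_SIZE = 8;  // e.g. one storage ID per slot

  static int recordSize(short desiredReplication) {
    return HEADER_SIZE + desiredReplication * REPLICA_SLOT_SIZE;
  }

  // One allocation per block.  Losing or gaining a replica just rewrites a
  // slot in place; no re-allocation is needed.
  static ByteBuffer allocate(long blockId, long genStamp, long numBytes,
                             short desiredReplication) {
    ByteBuffer buf = ByteBuffer.allocateDirect(recordSize(desiredReplication));
    buf.putLong(blockId).putLong(genStamp).putLong(numBytes)
       .putShort(desiredReplication);
    return buf;
  }

  // Only a change to the replication factor (setReplication) needs a copy.
  static ByteBuffer setReplication(ByteBuffer old, short newDesiredReplication) {
    ByteBuffer buf = ByteBuffer.allocateDirect(recordSize(newDesiredReplication));
    for (int i = 0; i < HEADER_SIZE - 2; i++) {
      buf.put(old.get(i));  // copy blockId, genStamp, numBytes
    }
    buf.putShort(newDesiredReplication);
    // replica slots would be re-filled from the old record here
    return buf;
  }
}
{code}
With replication 3, that's a single ~50-byte record rather than a fixed-size
block info plus three ~24-byte replica objects linked together by pointers.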
bq. charles wrote: Yi mentioned unused imports, but there are also unnecessary
java.lang.{String, ClassCastException}
thanks, removed
bq. BlockId.equals: constructing a ClassCastException, and especially the
resulting call to fillInStackTrace, is an expensive way of checking the type. I
would think instanceof is preferred.
This code path is never expected to actually be taken, though. You shouldn't
be comparing BlockId objects with non-BlockId objects. If Java 1.0 had been
designed with generics, equals would probably be {{equals(T t) where T extends
MyType}}. Anyway, I used the standard Java interface because creating my own
version of equals seemed excessive.
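For reference, the {{instanceof}} form being suggested would look roughly like
this (the fields here are illustrative, not necessarily what the patch has):
{code:java}
// Illustration only: a mismatched type just returns false, with no
// ClassCastException and no fillInStackTrace.
class BlockIdSketch {
  private final long id;

  BlockIdSketch(long id) {
    this.id = id;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof BlockIdSketch)) {
      return false;
    }
    return id == ((BlockIdSketch) o).id;
  }

  @Override
  public int hashCode() {
    return (int) (id ^ (id >>> 32));
  }
}
{code}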
bq. Are you planning on doing something with Shard.name in the future?
It will be used for logging in the future.
bq. The indentation of the assignment to htable is off a bit.
It seems OK?
bq. Jenkins will ask you this question, but why no unit tests?
Fair question. I'm trying to write this as incrementally as I can, and it
seemed like unit tests could be done in a follow-on. The bulk of the work will
be integrating this into the main NN code.
bq. Oh, I forgot to mention there are three places where git apply flags the
patch for adding trailing whitespace.
ok
> Create off-heap BlocksMap and BlockData structures
> --------------------------------------------------
>
> Key: HDFS-7846
> URL: https://issues.apache.org/jira/browse/HDFS-7846
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-7836
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7846-scl.001.patch
>
>
> Create off-heap BlocksMap, BlockInfo, and DataNodeInfo structures. The
> BlocksMap will use the off-heap hash table.