On Thu, Jul 18, 2013 at 3:13 PM, ker can <[email protected]> wrote:
>
> the hbase+hdfs throughput results were 38x better.
> Any thoughts on what might be going on ?
>
>
This looks like a data locality issue. After loading the table, the data
block map of a region's store file shows its blocks spread out across disks
on all nodes. For my test HBase table 'usertable', OSDs 0-6 are on one node
and OSDs 7-13 are on another. This is the map for store file
"0c43d345e3ea42abb5ce5a98b162218a" of region
"da3b3bf6c0c5a9b387d23944122f208b":
hadoop@dmse-141:/mnt/mycephfs/hbase/usertable/da3b3bf6c0c5a9b387d23944122f208b/family$ cephfs 0c43d345e3ea42abb5ce5a98b162218a map
FILE OFFSET  OBJECT                OFFSET  LENGTH    OSD
0            10000001abd.00000000  0       67108864  2
67108864     10000001abd.00000001  0       67108864  4
134217728    10000001abd.00000002  0       67108864  8
201326592    10000001abd.00000003  0       67108864  6
268435456    10000001abd.00000004  0       67108864  3
335544320    10000001abd.00000005  0       67108864  6
402653184    10000001abd.00000006  0       67108864  9
469762048    10000001abd.00000007  0       67108864  9
536870912    10000001abd.00000008  0       67108864  0
603979776    10000001abd.00000009  0       67108864  2
671088640    10000001abd.0000000a  0       67108864  8
738197504    10000001abd.0000000b  0       67108864  13
805306368    10000001abd.0000000c  0       67108864  1
872415232    10000001abd.0000000d  0       67108864  1
939524096    10000001abd.0000000e  0       67108864  3
1006632960   10000001abd.0000000f  0       67108864  7
1073741824   10000001abd.00000010  0       67108864  3
1140850688   10000001abd.00000011  0       67108864  13
1207959552   10000001abd.00000012  0       67108864  13
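To put a number on how this one store file is split between the two nodes, here is a rough Python sketch that tallies the OSD column of the map above. The OSD-to-node split (0-6 and 7-13) is the one from my test cluster; the labels "node-a" and "node-b" and the helper names are made up for illustration.

```python
from collections import Counter

# Rows copied from the `cephfs ... map` output above:
# file offset, object, object offset, length, OSD.
MAP_OUTPUT = """\
FILE OFFSET OBJECT OFFSET LENGTH OSD
0 10000001abd.00000000 0 67108864 2
67108864 10000001abd.00000001 0 67108864 4
134217728 10000001abd.00000002 0 67108864 8
201326592 10000001abd.00000003 0 67108864 6
268435456 10000001abd.00000004 0 67108864 3
335544320 10000001abd.00000005 0 67108864 6
402653184 10000001abd.00000006 0 67108864 9
469762048 10000001abd.00000007 0 67108864 9
536870912 10000001abd.00000008 0 67108864 0
603979776 10000001abd.00000009 0 67108864 2
671088640 10000001abd.0000000a 0 67108864 8
738197504 10000001abd.0000000b 0 67108864 13
805306368 10000001abd.0000000c 0 67108864 1
872415232 10000001abd.0000000d 0 67108864 1
939524096 10000001abd.0000000e 0 67108864 3
1006632960 10000001abd.0000000f 0 67108864 7
1073741824 10000001abd.00000010 0 67108864 3
1140850688 10000001abd.00000011 0 67108864 13
1207959552 10000001abd.00000012 0 67108864 13
"""


def node_of(osd):
    """Map an OSD id to a node label (hypothetical labels, split as in my test setup)."""
    if 0 <= osd <= 6:
        return "node-a"
    if 7 <= osd <= 13:
        return "node-b"
    return "unknown"


def tally(map_text):
    """Count objects per node; the header and any malformed lines are skipped."""
    counts = Counter()
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        counts[node_of(int(fields[4]))] += 1
    return counts


counts = tally(MAP_OUTPUT)
total = sum(counts.values())
for node, n in sorted(counts.items()):
    print(f"{node}: {n}/{total} objects ({100 * n / total:.0f}%)")
```

For this file that gives 11 of 19 objects on the first node and 8 of 19 on the second, so whichever node serves the region has to read a large share of the file over the network.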
For hbase+hdfs, all blocks within a single region were on the same region
server/data node. So in the region server stats with hdfs you see a 100%
data locality index and a much higher block cache hit count (the hit ratios
themselves are similar).
hbase + hdfs region server stats:
blockCacheSizeMB=201.31, blockCacheFreeMB=45.57, blockCacheCount=3013,
blockCacheHitCount=9464863, blockCacheMissCount=10633061,
blockCacheEvictedCount=9305729, blockCacheHitRatio=47%,
blockCacheHitCachingRatio=50%,
hdfsBlocksLocalityIndex=100
hbase + ceph region server stats:
blockCacheSizeMB=205.59, blockCacheFreeMB=41.29, blockCacheCount=2989,
blockCacheHitCount=1038372, blockCacheMissCount=1042117,
blockCacheEvictedCount=397801, blockCacheHitRatio=49%,
blockCacheHitCachingRatio=72%,
hdfsBlocksLocalityIndex=47
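As a sanity check on the two stat dumps, here is a small Python sketch that recomputes the hit ratio from the raw counters. It assumes blockCacheHitRatio is hits / (hits + misses) truncated to a whole percent; that is my assumption rather than something taken from the HBase source, though it does reproduce both reported values. The function name is made up for illustration.

```python
def hit_ratio_percent(hits, misses):
    """Assumed formula: hits / (hits + misses), truncated to an integer percent."""
    total = hits + misses
    return 0 if total == 0 else 100 * hits // total


# (blockCacheHitCount, blockCacheMissCount) from the two stat dumps above.
stats = {
    "hbase + hdfs": (9464863, 10633061),
    "hbase + ceph": (1038372, 1042117),
}

for name, (hits, misses) in stats.items():
    print(f"{name}: hits={hits} misses={misses} "
          f"ratio={hit_ratio_percent(hits, misses)}%")
```

The ratios come out at 47% and 49%, matching the reported blockCacheHitRatio values, while the absolute hit count with hdfs is roughly nine times the count with ceph.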
With Ceph, is there any way to influence data block placement for a
single file?