[ https://issues.apache.org/jira/browse/HDFS-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991520#comment-12991520 ]
stack commented on HDFS-1599:
-----------------------------
Thanks for filing this one, Sanjay. Here's a bit of input, if it helps.
HDFS-918 is an attempt at moving the datanode away from (2?) threads per open
file, which is just a killer for HBase loadings (Mozilla had datanodes with
8k-plus threads running in them because they had about 1k regions up on each
node of their cluster of 20-odd nodes). HBase keeps all of its files open to
save a trip to the Namenode inline with each random read. The patch that has
been posted has been through many iterations, currently does the read path only
(the important one as far as HBase is concerned), seems to work in basic
testing done by me and others, and holds lots of promise (or, let's just
rewrite the datanode; smile). The patch is pretty big, and Todd is suggesting
we get it in in smaller pieces, but there is also an argument for dropping the
big patch in (related: HDFS-223, HDFS-285, HDFS-374, which I think can now be
closed).
Next up would be some kind of keepalive on pread. At the moment, we set up a
new socket on each pread (HBase uses pread when doing random lookups) EVEN
though we are seeking in the same block we just read from (see HDFS-380).
Chatting with some of the lads, fixing this (HDFS-941) is probably the least
intrusive of the issues attached, but it'll get us a pretty nice improvement.
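To make the pread keepalive point concrete, here is a minimal Java sketch of
the kind of connection reuse it would enable, assuming a simple per-datanode
pool of idle connections. Everything in it is hypothetical: `ConnectionCache`,
`acquire`, `release` and the `host:port` keys are illustrative names, not the
HDFS client API and not necessarily the approach taken in HDFS-941. The
connection type is left generic so the sketch runs without a real datanode.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not HDFS code: keeps idle connections per datanode so
// that back-to-back preads against the same datanode reuse one connection
// instead of paying connection setup on every call.
class ConnectionCache<K, C> {
    private final Map<K, Deque<C>> idle = new HashMap<>();
    private final Function<K, C> connector; // opens a fresh connection for a key
    private final int maxIdlePerKey;        // cap on pooled idle connections per key
    private int opened = 0;                 // how many connections were ever opened

    ConnectionCache(Function<K, C> connector, int maxIdlePerKey) {
        this.connector = connector;
        this.maxIdlePerKey = maxIdlePerKey;
    }

    // Returns a pooled idle connection for key if one exists, else opens a new one.
    C acquire(K key) {
        synchronized (this) {
            Deque<C> q = idle.get(key);
            if (q != null && !q.isEmpty()) {
                return q.pop();
            }
            opened++;
        }
        return connector.apply(key); // connect outside the lock
    }

    // Hands a connection back for reuse. Returns false when the pool for this
    // key is full, in which case the caller should close the connection itself.
    synchronized boolean release(K key, C conn) {
        Deque<C> q = idle.computeIfAbsent(key, k -> new ArrayDeque<>());
        if (q.size() >= maxIdlePerKey) {
            return false;
        }
        q.push(conn);
        return true;
    }

    synchronized int openedCount() {
        return opened;
    }
}
```

In this sketch a pread would `acquire` a connection for the target datanode,
issue its read, and `release` it afterwards, so the next pread against the same
datanode skips connection setup. Capping idle connections per key keeps a
client with many open files from hoarding sockets.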
HDFS-347 is radical, but in hacked-up prototypes it's already been demo'd that
it can make for a massive improvement in both latency AND CPU use. (Nathan, in
a chat on Thursday, asked why this makes for such a big win: what is the
network version doing that causes such a slowdown? I think Dhruba makes the
same comment inline in the issue, IIRC.)
HDFS-1034 looks good.
HDFS-236 looks like an effort worth reviving.
That's enough for now.
> Umbrella Jira for Improving HBASE support in HDFS
> -------------------------------------------------
>
> Key: HDFS-1599
> URL: https://issues.apache.org/jira/browse/HDFS-1599
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Sanjay Radia
>
> Umbrella Jira for improved HBase support in HDFS
--
This message is automatically generated by JIRA.