[ https://issues.apache.org/jira/browse/HBASE-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16447610#comment-16447610 ]
Yu Li commented on HBASE-6572:
------------------------------

Hello all, since all subtasks have been completed and both 1.5 (after HBASE-19858) and 2.0 will support this, shall we add some documentation about HSM to our ref-guide? I'm willing to write the doc if you all think it's time for us to officially announce HSM support, thanks.

[~apurtell] [~stack] I'm asking simply because I've seen [some questions|https://s.apache.org/yw5G] about this on our user list recently, and I feel we should have explicit documentation for our users :-)

> Tiered HFile storage
> --------------------
>
>                 Key: HBASE-6572
>                 URL: https://issues.apache.org/jira/browse/HBASE-6572
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Consider how we might enable tiered HFile storage. If HDFS has the
> capability, we could create certain files on solid state devices where they
> might be frequently accessed, especially for random reads; and others (and by
> default) on spinning media as before. We could support the move of frequently
> read HFiles from spinning media to solid state. We already have CF statistics
> for this, would only need to add requisite admin interface; could even
> consider an autotiering option.
>
> Dhruba Borthakur did some early work in this area and wrote up his findings:
> http://hadoopblog.blogspot.com/2012/05/hadoop-and-solid-state-drives.html .
> It is important to note the findings but I suggest most of the
> recommendations are out of scope of this JIRA. This JIRA seeks to find an
> initial use case that produces a reasonable benefit, and serves as a testbed
> for further improvements. If I may paraphrase Dhruba's findings (any
> misstatements and errors are mine): First, the DFSClient code paths introduce
> significant latency, so the HDFS client (and presumably the DataNode, as the
> next bottleneck) will need significant work to knock that down.
> Need to investigate optimized (perhaps read-only) DFS clients, server side
> read and caching strategies. Second, RegionServers are heavily threaded and
> this imposes a lot of monitor contention and context switching cost. Need to
> investigate reducing the number of threads in a RegionServer, nonblocking IO
> and RPC.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
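The autotiering option floated in the issue description could start as a simple placement policy: rank HFiles by how often they are read relative to their size, and promote the hottest ones to SSD until an SSD capacity budget is used up. The sketch below is hypothetical and is not HBase code: `HFileStats`, `choose_ssd_files`, the file names, and the numbers are illustrative only, and it assumes per-HFile read counts are available (the description mentions CF statistics, so per-file counts are an assumption here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFileStats:
    """Hypothetical per-HFile statistics (illustrative, not an HBase API)."""
    path: str
    size_bytes: int  # assumed > 0
    reads: int       # read requests observed over some window (assumed)


def choose_ssd_files(files, ssd_capacity_bytes):
    """Greedy heuristic: sort HFiles by reads per byte (densest first) and
    promote each one that still fits in the remaining SSD budget.

    This is a 0/1 knapsack problem, so greedy selection is not guaranteed to
    be optimal; it is only a cheap, easy-to-reason-about starting point.
    """
    ranked = sorted(files, key=lambda f: f.reads / f.size_bytes, reverse=True)
    chosen, used = [], 0
    for f in ranked:
        if f.reads == 0:
            continue  # never promote files that are not read at all
        if used + f.size_bytes <= ssd_capacity_bytes:
            chosen.append(f.path)
            used += f.size_bytes
    return chosen


# Illustrative input; sizes and read counts are made up.
files = [
    HFileStats("a", 100, 1000),  # 10 reads/byte
    HFileStats("b", 400, 2000),  # 5 reads/byte
    HFileStats("c", 50, 10),     # 0.2 reads/byte
    HFileStats("d", 200, 0),     # never read
]

print(choose_ssd_files(files, 500))  # ['a', 'b']
print(choose_ssd_files(files, 200))  # ['a', 'c']
```

With a 200-byte budget, "b" no longer fits after "a", so the smaller and colder "c" is promoted instead. A real policy would also need hysteresis so files do not bounce between tiers as read counts fluctuate.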