[ https://issues.apache.org/jira/browse/HBASE-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872774#comment-15872774 ]
Umesh Agashe edited comment on HBASE-14090 at 2/18/17 12:06 AM:
----------------------------------------------------------------
Some time back, we here at Cloudera had a discussion about our efforts on this
issue. We talked about the status of our efforts, our findings and experiments,
and concluded that a new approach is needed to solve this issue. This doc
summarizes the discussion. Please see the link to the doc: "Discussion on new
radically different approach to HBase FS directory layout REDO work".
was (Author: uagashe):
Some time back, we here at Cloudera had a discussion about our efforts on this
issue. We talked about the status of our efforts, our findings and experiments,
and concluded that a new approach is needed to solve this issue. This doc
summarizes the discussion.
> Redo FS layout; let go of tables/regions/stores directory hierarchy in DFS
> --------------------------------------------------------------------------
>
> Key: HBASE-14090
> URL: https://issues.apache.org/jira/browse/HBASE-14090
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Assignee: Sean Busbey
>
> Our layout as-is won't work with 1M regions; e.g. HDFS will fall over with
> directories of hundreds of thousands of files. HBASE-13991 (Humongous Tables)
> would address only this specific directory problem, by adding subdirs under
> the table dir, but there are other issues with our current layout:
> * Our table/regions/column family 'facade' has to be maintained in two
> locations -- in master memory and in the HDFS directory layout -- and the
> farce needs to be kept in sync or, worse, the model management is split
> between master memory and DFS layout. 'Syncing' in HDFS has us dropping
> constructs such as 'Reference' and 'HalfHFiles' on split, 'HFileLinks' when
> archiving, and so on. This 'tie' makes it hard to make changes.
> * While HDFS has atomic rename, useful for fencing and for adding files
> atomically, if the model were owned solely by HBase, there are HBase
> primitives we could make use of -- atomic changes within a row, and
> coprocessors -- to simplify table transactions and give clients more
> consistent views of our model. File 'moves' could be memory-only operations
> rather than HDFS calls; deciding when files shared between tables/snapshots
> are safe to remove would be simpler with a single owner; and so on.
> This is an umbrella blue-sky issue to discuss what a new layout would look
> like and how we might get there. I'll follow up with some sketches of what a
> new layout could look like, which came out of chats a few of us have been
> having. We are also under the 'delusion' that a move to a new layout could be
> done as part of a rolling upgrade and that the amount of work involved is not
> gargantuan.
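The second bullet above argues that if HBase alone owned the model, file 'moves' become metadata-only operations and deciding when a shared file can be deleted gets simpler. A minimal Java sketch of that idea follows, assuming a flat file namespace plus a reference-counted map from logical stores to file IDs. This is a hypothetical illustration, not HBase code: `FileRegistry`, its methods, and the store names are all assumptions. A real design would persist this state (for example in an HBase table, where a single-row update is atomic) rather than keep it only in a heap map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not an HBase API: HBase-owned metadata maps each
// logical store to the file IDs it references, so physical files can live in
// a flat namespace instead of a tables/regions/stores directory tree.
final class FileRegistry {
    // logical store name -> IDs of files that store currently references
    private final Map<String, Set<String>> storeFiles = new HashMap<>();
    // file ID -> number of stores (tables, snapshots) referencing it
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Records that a store references a file, e.g. after a flush, or when a
    // snapshot or clone shares an existing file.
    synchronized void addFile(String store, String fileId) {
        if (storeFiles.computeIfAbsent(store, s -> new HashSet<>()).add(fileId)) {
            refCounts.merge(fileId, 1, Integer::sum);
        }
    }

    // "Moves" a file between stores by editing metadata only; no DFS rename.
    synchronized void moveFile(String from, String to, String fileId) {
        Set<String> src = storeFiles.get(from);
        if (src == null || !src.remove(fileId)) {
            throw new IllegalArgumentException(fileId + " is not in " + from);
        }
        if (!storeFiles.computeIfAbsent(to, s -> new HashSet<>()).add(fileId)) {
            // The destination already referenced the file, so the source's
            // reference is simply dropped.
            refCounts.merge(fileId, -1, Integer::sum);
        }
    }

    // Drops a store's reference. Returns true only if this released the last
    // reference, i.e. the physical file is now safe to delete.
    synchronized boolean removeFile(String store, String fileId) {
        Set<String> files = storeFiles.get(store);
        if (files == null || !files.remove(fileId)) {
            return false;
        }
        int remaining = refCounts.merge(fileId, -1, Integer::sum);
        if (remaining == 0) {
            refCounts.remove(fileId);
            return true;
        }
        return false;
    }

    // Returns a copy of the file IDs a store references.
    synchronized Set<String> filesOf(String store) {
        return new HashSet<>(storeFiles.getOrDefault(store, Collections.emptySet()));
    }

    // Returns how many stores reference a file (0 if none).
    synchronized int refCount(String fileId) {
        return refCounts.getOrDefault(fileId, 0);
    }
}
```

For example, after `addFile("t1/r1/cf", "f1")` and `addFile("snap1/cf", "f1")`, moving "f1" to "t1/r2/cf" leaves its reference count at 2, and only the later `removeFile("snap1/cf", "f1")` call returns true. Reference counting is used here only because it is the simplest way to show "one owner decides when deletion is safe"; it is not a claim about how the proposed layout would do it.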
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)