[ 
https://issues.apache.org/jira/browse/HBASE-14090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629662#comment-14629662
 ] 

Matteo Bertozzi commented on HBASE-14090:
-----------------------------------------

{quote}We could go as far as having HBase manage a block pool directly{quote}
[~lhofhansl] I'm with you on this, as I was mentioning on HBASE-7806. If we go 
with blocks we can gain even more: smarter compactions (instead of rewriting 
everything, replace just the blocks that require modification), deduplication 
when we do something like CopyTable or when two tables just happen to have some 
data in common, control over placement, and so on.
But switching to blocks will probably be too much work. We were trying to think 
how to split this into intermediate steps, and just changing the layout and 
adding file refs in meta already seems to be one giant step that can't be 
split. At least the proposed design was made with blocks in mind, so we can go 
there at some point (unless there is a big push to do it now).
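
To make the block idea a bit more concrete, here is a minimal sketch in Python. It is not HBase code: BlockPool, write_file and compact are hypothetical names, and it assumes blocks are addressed by a hash of their content, which is only one way a block pool could be keyed. A "file" is just a list of block refs, so a compaction swaps only the refs of the blocks that changed, and identical blocks written for two tables are stored once.

```python
import hashlib


class BlockPool:
    """Hypothetical content-addressed block store (illustration only)."""

    def __init__(self):
        self._blocks = {}  # block id (sha256 hex digest) -> block bytes

    def put(self, data: bytes) -> str:
        # Identical content maps to the same id, so it is stored only once.
        block_id = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(block_id, data)
        return block_id

    def get(self, block_id: str) -> bytes:
        return self._blocks[block_id]

    def __len__(self) -> int:
        return len(self._blocks)


def write_file(pool: BlockPool, blocks: list) -> list:
    """A 'file' is an ordered list of block refs, not a copy of the data."""
    return [pool.put(block) for block in blocks]


def compact(pool: BlockPool, file_refs: list, rewritten: dict) -> list:
    """Return a new ref list, replacing only the blocks listed in `rewritten`.

    `rewritten` maps a block index to its new bytes; every other ref is reused.
    """
    return [
        pool.put(rewritten[i]) if i in rewritten else ref
        for i, ref in enumerate(file_refs)
    ]
```

For example, writing the same three blocks for two tables (a CopyTable-like case) leaves three blocks in the pool, and compact(pool, refs, {1: new_bytes}) adds one block while reusing the refs at index 0 and 2. Real blocks may carry data that differs between otherwise equal copies, so dedup would probably not be this clean in practice.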

> Redo FS layout; let go of tables/regions/stores directory hierarchy in DFS
> --------------------------------------------------------------------------
>
>                 Key: HBASE-14090
>                 URL: https://issues.apache.org/jira/browse/HBASE-14090
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: stack
>
> Our layout as is won't work with 1M regions; e.g. HDFS will fall over with 
> directories of hundreds of thousands of files. HBASE-13991 (Humongous Tables) 
> would address only this specific directory problem, by adding subdirs under 
> the table dir, but there are other issues with our current layout:
>  * Our table/regions/column family 'facade' has to be maintained in two 
> locations -- in master memory and in the hdfs directory layout -- and the 
> farce needs to be kept synced or worse, the model management is split between 
> master memory and DFS layout. 'Syncing' in HDFS has us dropping constructs 
> such as 'Reference' and 'HalfHFiles' on split, 'HFileLinks' when archiving, 
> and so on. This 'tie' makes it hard to make changes.
>  * While HDFS has atomic rename, useful for fencing and for having files 
> added atomically, if the model were solely owned by hbase, there are hbase 
> primitives we could make use of -- changes in a row are atomic and 
> coprocessors -- to simplify table transactions and provide more consistent 
> views of our model to clients; file 'moves' could be a memory operation only 
> rather than an HDFS call; sharing files between tables/snapshots and when it 
> is safe to remove them would be simplified if one owner only; and so on.
> This is an umbrella blue-sky issue to discuss what a new layout would look 
> like and how we might get there. I'll follow up with some sketches of what a 
> new layout could look like, which came out of some chats a few of us have been 
> having. We are also under the 'delusion' that a move to a new layout could be 
> done as part of a rolling upgrade and that the amount of work involved is not 
> gargantuan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
