stack created HBASE-14090:
-----------------------------
Summary: Redo FS layout; let go of tables/regions/stores directory
hierarchy in DFS
Key: HBASE-14090
URL: https://issues.apache.org/jira/browse/HBASE-14090
Project: HBase
Issue Type: Sub-task
Reporter: stack
Our layout as is won't work with 1M regions; e.g. HDFS will fall over given
directories of hundreds of thousands of files. HBASE-13991 (Humongous Tables)
would address this specific directory problem only, by adding subdirs under
the table dir, but there are other issues with our current layout:
* Our table/region/column family 'facade' has to be maintained in two
locations -- in master memory and in the HDFS directory layout -- and the farce
needs to be kept synced; or worse, model management is split between master
memory and DFS layout. 'Syncing' in HDFS has us dropping constructs such as
'Reference' and 'HalfHFiles' on split, 'HFileLinks' when archiving, and so on.
This 'tie' makes it hard to make changes.
* While HDFS has atomic rename, useful for fencing and for having files added
atomically, if the model were solely owned by hbase, there are hbase primitives
we could make use of -- atomic changes within a row, and coprocessors -- to
simplify table transactions and provide more consistent views of our model to
clients; file 'moves' could be a memory operation only rather than an HDFS
call; sharing files between tables/snapshots, and deciding when it is safe to
remove them, would be simplified if there were one owner only; and so on.
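
To make the second point concrete, here is a minimal sketch of a single-owner
file catalog. It is not a proposed design: the FileCatalog and FileEntry names,
the fields, and the example path are all hypothetical, and a plain in-memory
dict stands in for whatever hbase-owned store would really hold the model. It
only illustrates that a 'move' becomes a metadata update and that 'safe to
remove' becomes a check on one owner's reference set.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Catalog record for one immutable store file (hypothetical schema)."""
    path: str    # physical location in DFS; never changes after creation
    region: str  # region that logically owns the file right now
    refs: set = field(default_factory=set)  # tables/snapshots using the file


class FileCatalog:
    """Single owner of the file model (an assumption, not an HBase class)."""

    def __init__(self):
        self._entries = {}

    def add(self, file_id, path, region, owner):
        # Register a new file with its first referencing table or snapshot.
        self._entries[file_id] = FileEntry(path, region, {owner})

    def lookup(self, file_id):
        return self._entries[file_id]

    def move(self, file_id, new_region):
        # Metadata-only 'move': the physical path is untouched, no DFS call.
        self._entries[file_id].region = new_region

    def share(self, file_id, owner):
        # Another table or snapshot starts referencing the same file.
        self._entries[file_id].refs.add(owner)

    def release(self, file_id, owner):
        # Drop one reference. Returns True once nothing references the file,
        # i.e. when the physical file could safely be removed.
        entry = self._entries[file_id]
        entry.refs.discard(owner)
        if not entry.refs:
            del self._entries[file_id]
            return True
        return False
```

In this sketch, sharing a file with a snapshot is one set insertion, and the
question of when the file can be deleted is answered in one place rather than
by scanning links and references in the directory tree.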
This is an umbrella blue-sky issue to discuss what a new layout would look like
and how we might get there. I'll follow up with some sketches of what a new
layout could look like that came out of some chats a few of us have been
having. We are also under the 'delusion' that a move to a new layout could be
done as part of a rolling upgrade and that the amount of work involved is not
gargantuan.