[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608540#comment-14608540
 ] 

Matteo Bertozzi commented on HBASE-13991:
-----------------------------------------

instead of making an incompatible change to work around just this problem, 
we should look into what else we can solve by changing the fs layout.

some of the points on my list are:
 * avoid moving files around, tmp -> table region -> archive
 ** avoid the HFileLink hack of “if the file is not here, try there”
 * avoid rename() calls to simulate "transactions" (e.g. compaction, split, 
creation, deletion, ...)
 ** rename calls in some environments (e.g. s3) are full copies instead of just 
a metadata operation
 * file sharing between different tables without links (“Clone Table”)
 ** simplify the snapshot/restore reference code and avoid all the calls to 
fs.listStatus(), fs.createNew()
 ** avoid the write permission required in MR over snapshots (for backlink 
creation)

we should have a single /data dir where we place data, and then each table will 
point to that.
you'll avoid moving files around (for tmp-creation/commit and archiving) and 
your data is not tied to a table, allowing things like snapshots, 
clones and read-replicas to work without hacks. you'll also gain some future 
ability to do some kind of deduplication and better compaction logic.
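
to make the idea concrete, here is a rough in-memory sketch, not HBase code. the class name SharedDataDir and all of its methods are made up for illustration. it assumes every file is written once under a single data dir and identified by an id, and that a table is nothing more than a set of those ids.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model: files live once in a shared data dir,
// tables only hold references (file ids) to them.
public class SharedDataDir {
    // table name -> ids of the files the table currently points to
    private final Map<String, Set<String>> tables = new HashMap<>();

    // A flush or compaction writes the file once and only records its id here.
    public void addFile(String table, String fileId) {
        tables.computeIfAbsent(table, t -> new HashSet<>()).add(fileId);
    }

    // Cloning copies references only; no file is moved, copied or linked.
    public void cloneTable(String source, String target) {
        Set<String> files = tables.getOrDefault(source, new HashSet<>());
        tables.put(target, new HashSet<>(files));
    }

    // Dropping a table removes its references; the files themselves stay put.
    public void dropTable(String table) {
        tables.remove(table);
    }

    // A file may be cleaned up only when no table points to it anymore.
    public boolean isReferenced(String fileId) {
        for (Set<String> files : tables.values()) {
            if (files.contains(fileId)) {
                return true;
            }
        }
        return false;
    }
}
```

in this model a clone is a copy of a set of ids, and dropping the source table never touches the files, so there is nothing to move to an archive dir and nothing to look up in a second location.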

if you look at the last slide of: 
https://issues.apache.org/jira/secure/attachment/12568749/HBASE-7806.pdf
there was a proposed layout, where you have this kind of separation.
you can store the list of files in meta as Stack mentioned, or you can have 
some manifest file containing the current state of the table (something like 
the SnapshotManifest 
https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Snapshot.proto#L41).
the point is: do not tie the data to the logical placement of 
tables/regions, and have an atomic operation for when you add/remove files. think 
about features like snapshots and replicas, where the files are not owned by only 
one region.
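
as a sketch of what "an atomic operation for when you add/remove files" could look like, here is a minimal in-memory stand-in. TableManifest and commit() are hypothetical names, not the SnapshotManifest API, and persistence (in meta or in a manifest file) is deliberately left out. the assumption is that a compaction or split commits by swapping file ids in one step instead of renaming files.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical manifest: the current state of a table is one immutable set of file ids.
public class TableManifest {
    private final AtomicReference<Set<String>> files =
        new AtomicReference<>(Collections.<String>emptySet());

    // Readers always see a complete state, never a half-applied change.
    public Set<String> currentFiles() {
        return files.get();
    }

    // Replace 'removed' with 'added' in a single atomic step.
    // Returns false if some file in 'removed' is no longer part of the table.
    public boolean commit(Set<String> removed, Set<String> added) {
        while (true) {
            Set<String> before = files.get();
            if (!before.containsAll(removed)) {
                return false;
            }
            Set<String> after = new HashSet<>(before);
            after.removeAll(removed);
            after.addAll(added);
            // Retry if another commit changed the manifest in the meantime.
            if (files.compareAndSet(before, Collections.unmodifiableSet(after))) {
                return true;
            }
        }
    }
}
```

a compaction of files a and b into c would then be a single commit({a, b}, {c}). readers see either the old set or the new one, and no rename() is needed to simulate the transaction.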

> Hierarchical Layout for Humongous Tables
> ----------------------------------------
>
>                 Key: HBASE-13991
>                 URL: https://issues.apache.org/jira/browse/HBASE-13991
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf
>
>
> Add support for humongous tables via a hierarchical layout for regions on 
> filesystem.  
> Credit for most of this code goes to Huaiyu Zhu.  
> Latest version of the patch is available on the review board: 
> https://reviews.apache.org/r/36029/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
