[ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042990#comment-14042990
 ] 

Jingcheng Du commented on HBASE-11339:
--------------------------------------

Resending to correct the formatting.
bq. In the pdf design, is there one MobManager per RS, one MobManager per 
table, or one MobManager per region? Are the mob hfiles kind of like a shared cf 
that all regions with mobs eventually throw their data into?
There is one MobManager per region server; it maintains the mapping from 
(tableName, cfName) to the mob cf.
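A minimal Java sketch of that per-region-server mapping, for illustration only. The names MobManagerSketch, TableCf and MobStore are hypothetical and are not the actual HBase classes; the point is that every region of the same table on this region server gets the same mob store for a given cf.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the real HBase MobManager: one instance per
// region server, mapping (tableName, cfName) to that cf's mob store.
class MobManagerSketch {

    // Lookup key combining the table name and the column family name.
    public record TableCf(String tableName, String cfName) {}

    // Placeholder for whatever object owns a mob cf's files.
    public record MobStore(String tableName, String cfName) {}

    private final Map<TableCf, MobStore> stores = new ConcurrentHashMap<>();

    // Returns the mob store for (tableName, cfName), creating it on first use.
    // Regions of the same table on this region server all share the result.
    public MobStore getMobStore(String tableName, String cfName) {
        return stores.computeIfAbsent(new TableCf(tableName, cfName),
                key -> new MobStore(key.tableName(), key.cfName()));
    }

    // Number of distinct (table, cf) pairs seen so far.
    public int size() {
        return stores.size();
    }
}
```

Keying by (table, cf) rather than by region is what lets all regions of one table share a single mob directory per cf.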
The mob files are saved in the <i>mobRootDir / tableNameAsString / cfName / 
date / mobFiles</i>.
1. A mob file is generated per MemStore flush.
2. All the mob files for all regions of a single table on a region server are 
saved into the same directory <i>mobRootDir / tableNameAsString / cfName / 
date</i>.
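The directory layout above can be sketched as a small path builder. This is an illustration only: the class and method names are hypothetical, and formatting the date directory as yyyyMMdd is an assumption, since the design only says "date".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch of mobRootDir / tableName / cfName / date / mobFile.
class MobPathSketch {

    // Assumed date directory format (yyyyMMdd); the design only says "date".
    static final DateTimeFormatter DATE_DIR = DateTimeFormatter.BASIC_ISO_DATE;

    // Directory shared by all regions of one table/cf on one day.
    static Path mobDateDir(Path mobRootDir, String tableName, String cfName, LocalDate date) {
        return mobRootDir.resolve(tableName).resolve(cfName).resolve(date.format(DATE_DIR));
    }

    // One new mob file per MemStore flush, placed in that day's directory.
    static Path mobFileForFlush(Path mobRootDir, String tableName, String cfName,
                                LocalDate flushDate, String fileName) {
        return mobDateDir(mobRootDir, tableName, cfName, flushDate).resolve(fileName);
    }
}
```

For example, a flush of tableA/cf1 on 2014-06-24 would land under `/mob/tableA/cf1/20140624/` when mobRootDir is `/mob`.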
The greatest advantage is that the TTL can be used to clean up a whole date 
directory of a cf at once.
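A sketch of that TTL cleanup, assuming date directories are named yyyyMMdd and the TTL is expressed in whole days (both are assumptions, and the names are hypothetical). A directory is dropped only once its entire day is older than the TTL, so no file inside it has to be opened.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick whole date directories whose content has outlived the TTL.
class MobTtlSketch {

    // Assumed directory name format (yyyyMMdd).
    static final DateTimeFormatter DATE_DIR = DateTimeFormatter.BASIC_ISO_DATE;

    // A directory for day D only holds cells written during D, so every cell in it
    // is expired once D is strictly before (today - ttlDays).
    static List<String> expiredDateDirs(List<String> dateDirNames, LocalDate today, long ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        List<String> expired = new ArrayList<>();
        for (String name : dateDirNames) {
            LocalDate day = LocalDate.parse(name, DATE_DIR);
            if (day.isBefore(cutoff)) {
                expired.add(name); // the whole directory can be deleted in one step
            }
        }
        return expired;
    }
}
```

With a 7-day TTL evaluated on 2014-07-10, the directories for 2014-07-01 and 2014-07-02 qualify, while 2014-07-03 and later are kept.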

bq. Can you explain what happens if I have a RS with regions, some belonging to 
tableA and some belonging to tableB? Let's say all writes to tableA and 
tableB have Mobs in them.
The mob files are saved in <i>mobRootDir / tableNameAsString / cfName / date 
/ mobFiles</i>, so each mob cf has its own mob files; when a region flushes, 
one new mob file is generated for each cf.
1. The mob files for tableA and tableB are saved into different directories. 
The ones for tableA are saved into <i>mobRootDir / tableAAsString / cfName / 
date / mobFiles</i>, and the ones for tableB are saved into <i>mobRootDir / 
tableBAsString / cfName / date / mobFiles</i>.
2. On each flush, a new mob file is generated for each cf: the one for tableA is 
<i>mobRootDir / tableAAsString / cf1 / date / aNewMobFileForTableACf1</i>, and the 
one for tableB is <i>mobRootDir / tableBAsString / cf2 / date / 
aNewMobFileForTableBCf2</i>.

bq. With this, it sounds like there is a new mob file per region, and that mobs would still 
generate the same number of files as the separate-cf approach.
Can't we (or do we already) have the TTL optimization in our existing cfs, 
since our hfiles have start and end ts in them?
The mob files are saved by table/cf instead of by table/region/cf.
If the mobs were saved into HBase directly, the extra writes caused by splitting 
the mob store could not be avoided, even if we split the regions by certain cfs.
If we got the end ts from the last key of each HFile, we would have to read every 
HFile to know whether it has expired. In the pdf design, we check expiration by 
directory, which needs fewer reads.
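To illustrate the read-cost argument, here is a toy model with hypothetical names and timestamps as plain longs. Each date directory is keyed by the largest timestamp any of its cells can carry (the end of that day, an assumption of this sketch). The per-file check must open every file to read its last key, while the per-directory check decides from the directory alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model contrasting two ways to find expired mob files.
class ExpiryCheckSketch {

    // Stand-in for a mob HFile; lastKeyTs is only known after opening the file.
    record MobFile(String name, long lastKeyTs) {}

    // Expired file names plus how many files had to be opened to find them.
    record Result(List<String> expired, int filesOpened) {}

    // Per-file check: open every file and compare the ts of its last key.
    static Result byFile(Map<Long, List<MobFile>> dirs, long cutoffTs) {
        List<String> expired = new ArrayList<>();
        int opened = 0;
        for (List<MobFile> files : dirs.values()) {
            for (MobFile f : files) {
                opened++; // reading the last key means touching the file
                if (f.lastKeyTs() < cutoffTs) expired.add(f.name());
            }
        }
        return new Result(expired, opened);
    }

    // Per-directory check: compare the directory's max ts, never open a file.
    static Result byDirectory(Map<Long, List<MobFile>> dirs, long cutoffTs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<Long, List<MobFile>> dir : dirs.entrySet()) {
            if (dir.getKey() < cutoffTs) {
                for (MobFile f : dir.getValue()) expired.add(f.name());
            }
        }
        return new Result(expired, 0);
    }
}
```

The per-directory check is conservative: a file holding old data inside a newer directory simply waits until that whole directory expires.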

> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data, like images and documents, 
> into Apache HBase. Unfortunately, saving binary MOBs (medium objects) directly 
> in HBase leads to worse performance because of frequent splits and 
> compactions.
>   In this design, the MOB data are stored in a more efficient way, which 
> keeps write/read performance high and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
