[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042990#comment-14042990
]
Jingcheng Du commented on HBASE-11339:
--------------------------------------
Resending to correct the formatting.
bq. In the pdf design, is there one MobManager per RS or one MobManager per
table or one MobManager per region? Are the mob hfiles kind of like a shared cf
that all regions with mobs eventually throw their data into?
There is one MobManager per region server. It maintains the mapping from
(tableName, cfName) to the mob cf.
The mob files are saved in the <i>mobRootDir / tableNameAsString / cfName /
date / mobFiles</i>.
1. A mob file is generated per MemStore flushing.
2. All the mob files for all regions in a single table of a region server are
saved into the same directory <i>mobRootDir / tableNameAsString / cfName /
date</i>.
The greatest advantage is that TTL expiry can clean up a whole date directory
of a cf at once.
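As a rough illustration of the layout above, here is a minimal Java sketch that builds the date directory and mob file paths. The class and method names are hypothetical, and the <i>yyyyMMdd</i> naming of the date directory is an assumption; the design only says "date".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of HBase: it only mirrors the layout
// mobRootDir / tableNameAsString / cfName / date / mobFiles.
class MobPathSketch {
    // Assumption: date directories are named yyyyMMdd (e.g. 20140625).
    static final DateTimeFormatter DATE_DIR = DateTimeFormatter.BASIC_ISO_DATE;

    // Directory shared by all regions of one table/cf on a region server for a given day.
    static Path mobDateDir(String mobRootDir, String tableName, String cfName, LocalDate date) {
        return Paths.get(mobRootDir, tableName, cfName, DATE_DIR.format(date));
    }

    // Path of a single mob file produced by one MemStore flush.
    static Path mobFile(String mobRootDir, String tableName, String cfName,
                        LocalDate date, String mobFileName) {
        return mobDateDir(mobRootDir, tableName, cfName, date).resolve(mobFileName);
    }
}
```

Two regions of the same table and cf flushing on the same day would get the same result from <i>mobDateDir</i>, so their mob files end up side by side, while a different table or cf gets a different directory.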
bq. Can you explain what happens if I have a RS with regions, some belonging to
tableA and some belonging to tableB. Let's say all writes to tableA and
tableB have Mobs in them.
The mob files are saved in the <i>mobRootDir / tableNameAsString / cfName / date
/ mobFiles</i>. So each mob cf has its own mob files, and one new mob file
is generated for each cf when a region flushes.
1. The mob files for tableA and tableB are saved into different directories.
The ones for tableA are saved into <i>mobRootDir / tableAAsString / cfName /
date / mobFiles</i>, and the ones for tableB are saved into <i>mobRootDir /
tableBAsString / cfName / date / mobFiles</i>.
2. On each flush, a new mob file is generated for each cf. The one for tableA is
<i>mobRootDir / tableAAsString / cf1 / date / aNewMobFileForTableACf1</i>, and
the one for tableB is <i>mobRootDir / tableBAsString / cf2 / date /
aNewMobFileForTableBCf2</i>.
bq. With this It sounds like new mob file per region, and that mobs would still
generate the same number of files as the separate cf's approach.
Can't we (or do we already) have the ttl optimization in our existing cf's
since our hfiles have start and end ts in them?
The mob files are saved by table/cf instead of table/region/cf.
If the mobs are saved into HBase directly, the writes incurred when splitting
the mob store are not avoided, even if we split the regions by certain cfs only.
If we get the end ts from the last key in the HFile, we have to read each
HFile to know whether it has expired. In the pdf design, we check expiry by
directory, which needs fewer reads.
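To illustrate the directory-based check, here is a minimal Java sketch that decides expiry from the date directory name alone, without opening any mob file. The class and method names are hypothetical, the <i>yyyyMMdd</i> directory naming is an assumption, and the TTL is taken in whole days for simplicity.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase.
class MobTtlSketch {
    // Assumption: date directories are named yyyyMMdd (e.g. 20140617).
    static final DateTimeFormatter DATE_DIR = DateTimeFormatter.BASIC_ISO_DATE;

    // A date directory is treated as expired only when every mob written on
    // that day is older than the TTL, i.e. the day lies strictly before
    // (today - ttlDays).
    static boolean isExpired(String dateDirName, LocalDate today, long ttlDays) {
        LocalDate dirDate = LocalDate.parse(dateDirName, DATE_DIR);
        return dirDate.isBefore(today.minusDays(ttlDays));
    }

    // Returns the directory names that could be deleted as a whole.
    static List<String> expiredDirs(List<String> dateDirNames, LocalDate today, long ttlDays) {
        List<String> expired = new ArrayList<>();
        for (String name : dateDirNames) {
            try {
                if (isExpired(name, today, ttlDays)) {
                    expired.add(name);
                }
            } catch (DateTimeParseException e) {
                // Not a date directory; leave it alone.
            }
        }
        return expired;
    }
}
```

The check only looks at directory names, so its cost grows with the number of date directories rather than with the number or size of the mob files.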
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save medium-sized binary data such as images and documents
> into Apache HBase. Unfortunately, directly saving the binary MOB (medium
> object) to HBase leads to worse performance because of frequent splits and
> compactions.
> In this design, the MOB data are stored in a more efficient way, which
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.