[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040463#comment-14040463
]
Jingcheng Du commented on HBASE-11339:
--------------------------------------
Thanks [~jmhsieh] !
bq. 1) compact individual cf's without compacting others. 2)having different
compaction selection/promotion algorithms per cf.
Yes, this could improve compaction, but it doesn't eliminate writing the mob
data twice.
bq. 3) decided to split only based on certain cf's
We could split the region based on a certain cf, but the mob cf would still be
split along with it. Assume each row has 1KB of metadata (descriptive data for
the mob, stored in cfs other than the mob cf) and a 5MB mob. If the region is
split based on the metadata size, the mob data in the region will be very
large by the time a split is triggered. Saving the mob outside of HBase could
avoid this.
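To put rough numbers on the example above, here is a back-of-the-envelope
sketch. The 1KB and 5MB sizes come from the example; the 1GB split threshold
and the class and method names are assumptions chosen only for illustration,
not values from the design.

```java
// Rough estimate only. The split threshold is an assumed value for
// illustration; the per-row sizes are the ones used in the example above.
class MobSplitEstimate {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    /** Rows in the region when the metadata alone reaches the split threshold. */
    static long rowsAtSplit(long splitThresholdBytes, long metadataBytesPerRow) {
        return splitThresholdBytes / metadataBytesPerRow;
    }

    /** Bytes of mob data carried by those rows, assuming one mob per row. */
    static long mobBytesAtSplit(long rows, long mobBytesPerRow) {
        return rows * mobBytesPerRow;
    }

    public static void main(String[] args) {
        long threshold = 1 * GB; // assumed split threshold (hypothetical)
        long rows = rowsAtSplit(threshold, 1 * KB);
        long mobBytes = mobBytesAtSplit(rows, 5 * MB);
        System.out.println("rows at split = " + rows);
        System.out.println("mob data at split = " + (mobBytes / GB) + " GB");
    }
}
```

Under these assumptions the mob data is 5120 times the metadata size when the
split fires, which is the imbalance the example describes.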
When scanning, if the mob is saved in HBase, the mob data takes part in the
scanners' heap, whereas if it is saved in mob files, each mob is sought
directly in a single file (we have a mechanism to cache several opened
scanners of the mob files). The latter seems more efficient.
bq. How many lob files could be generated per flush? If I flush a table, would
all regions the relevant regions on a particular RS go to one lob sequence file
as opposed to many hfiles in the cf case? (e.g. similarly to how all edits on
an RS go to one hlog)
The files related to the mob are the reference (path) HFile plus the mob file.
The number of files is double that of saving the mob directly into HBase.
Saving the mob files per store rather than per region server makes it more
efficient to use the TTL to clean up expired mobs.
bq. Even with the pdf design, we still end up flushing fairly frequently
(potentially a flush every ~100 objects!) and we'd end up with a lot of hfiles
or lob files.
The HFiles for metadata are supposed to be small, so they are not as costly as
the mob files.
Usually the mob is much larger than the metadata, so the mob files are already
large enough when flushing. And because each read goes against a single file,
the number of mob files won't impact read performance.
bq. I don't think the pdf design mentions anything about caching mob values.
Would frequently requested mob always hit hdfs?
We have a MobCacheConfig, which extends CacheConfig, for each mob store. It
provides a cache for several opened mob files (only the opened readers are
cached; the capacity is limited, and LRU eviction is used when the capacity is
exceeded), and this cache uses the same global block cache as the region
server. If the mob is saved into HFiles, the block cache works with mob files
as well.
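As a rough illustration of the bounded LRU cache of opened readers described
above, here is a minimal sketch built on java.util.LinkedHashMap. It is not
the MobCacheConfig code: the class name OpenFileLruCache, its methods, and the
use of AutoCloseable for the cached readers are all assumptions made for the
example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a capacity-limited LRU cache of opened readers.
// Names and types are illustrative, not taken from the HBase code base.
class OpenFileLruCache<K, V extends AutoCloseable> {
    private final int capacity;
    private final LinkedHashMap<K, V> map;

    OpenFileLruCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > OpenFileLruCache.this.capacity) {
                    closeQuietly(eldest.getValue()); // release the evicted reader
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached reader, marking it as recently used, or null if absent. */
    synchronized V get(K key) {
        return map.get(key);
    }

    /** Caches a reader; the least recently used one is closed if capacity is exceeded. */
    synchronized void put(K key, V reader) {
        V old = map.put(key, reader);
        if (old != null && old != reader) {
            closeQuietly(old);
        }
    }

    synchronized int size() {
        return map.size();
    }

    private static void closeQuietly(AutoCloseable c) {
        try {
            c.close();
        } catch (Exception e) {
            // Ignored in this sketch; real code would log the failure.
        }
    }

    public static void main(String[] args) {
        OpenFileLruCache<String, AutoCloseable> cache = new OpenFileLruCache<>(2);
        cache.put("mobfile-a", () -> System.out.println("closed mobfile-a"));
        cache.put("mobfile-b", () -> System.out.println("closed mobfile-b"));
        cache.get("mobfile-a");                  // touch a, so b becomes the eldest
        cache.put("mobfile-c", () -> System.out.println("closed mobfile-c"));
        System.out.println("b cached: " + (cache.get("mobfile-b") != null));
    }
}
```

Closing the reader at eviction time is one possible design choice; a real
implementation would also need to make sure no scanner is still using the
reader being evicted.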
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save medium-sized binary data, like images and
> documents, into Apache HBase. Unfortunately, directly saving the binary MOB
> (medium object) to HBase leads to worse performance because of frequent
> splits and compactions.
> In this design, the MOB data is stored in a more efficient way, which
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.