[jira] [Commented] (HBASE-11339) HBase MOB

Jingcheng Du (JIRA) Sun, 22 Jun 2014 23:45:08 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040463#comment-14040463
 ]


Jingcheng Du commented on HBASE-11339:
--------------------------------------

Thanks [~jmhsieh]!

bq. 1) compact individual cf's without compacting others. 2)having different 
compaction selection/promotion algorithms per cf.
Yes, this could improve compaction, but it doesn't avoid writing the mob 
data twice.

bq. 3) decided to split only based on certain cf's
We could split the region based on a certain cf, but in the end the mob cf is 
split as well. Assume the metadata (the descriptive data for a mob, stored in 
cfs other than the mob cf) is 1KB and a mob is 5MB. If the region is split 
based on the metadata size, the mob data in that region will be very large by 
the time the split happens. Saving the mob outside of HBase avoids this.
When scanning, the mob data is counted in the scanner heap if the mob is saved 
in HBase, whereas if the mobs are saved into mob files each mob is sought 
directly in a single file (we have a mechanism to cache several opened 
scanners of the mob files). The latter seems to be more efficient.
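
To put rough numbers on the split example above, here is a back-of-the-envelope 
sketch. The 1KB and 5MB sizes are the ones used above; the 1GB split threshold 
and the class and method names are assumptions made only for illustration.

```java
// Rough estimate only; the split threshold is a hypothetical value.
final class MobSplitEstimate {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;
    static final long GB = 1024L * MB;

    /** Rows in a region when the metadata cf alone reaches the split threshold. */
    static long rowsAtSplit(long splitThresholdBytes, long metadataBytesPerRow) {
        return splitThresholdBytes / metadataBytesPerRow;
    }

    /** Mob bytes accumulated in the same region by the time that split triggers. */
    static long mobBytesAtSplit(long splitThresholdBytes,
                                long metadataBytesPerRow,
                                long mobBytesPerRow) {
        return rowsAtSplit(splitThresholdBytes, metadataBytesPerRow) * mobBytesPerRow;
    }

    public static void main(String[] args) {
        long threshold = 1 * GB; // assumed split threshold for the metadata cf
        long rows = rowsAtSplit(threshold, 1 * KB);
        long mobBytes = mobBytesAtSplit(threshold, 1 * KB, 5 * MB);
        // prints "1048576 rows, 5120 GB of mob data"
        System.out.println(rows + " rows, " + (mobBytes / GB) + " GB of mob data");
    }
}
```

With these assumed numbers, the region holds about 5TB of mob data before the 
metadata size alone would trigger a split.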

bq. How many lob files could be generated per flush? If I flush a table, would 
all regions the relevant regions on a particular RS go to one lob sequence file 
as opposed to many hfiles in the cf case? (e.g. similarly to how all edits on 
an RS go to one hlog)
The files related to the mob are the reference (path) HFile plus the mob 
file. That is twice the number of files compared with saving the mob directly 
into HBase.
Saving the mob files per store rather than per region server makes it more 
efficient to use the TTL to clean up the expired mobs.

bq. Even with the pdf design, we still end up flushing fairly frequently 
(potentially a flush every ~100 objects!) and we'd end up with a lot of hfiles 
or lob files.
The HFiles for the metadata are supposed to be small, so they are not as 
costly as the mob files.
Usually the mob is much larger than the metadata, so the mob files are already 
large enough when flushed. And because each read goes against a single file, 
the number of mob files won't impact the read performance.

bq. I don't think the pdf design mentions anything about caching mob values. 
Would frequently requested mob always hit hdfs?
We have a MobCacheConfig, which extends CacheConfig, for each mob store. It 
provides a cache of several opened mob files (only the opened readers are 
cached; the capacity is limited, and LRU eviction is used when the capacity is 
exceeded), and this cache uses the same global block cache as the one in the 
region server. If the mob is saved into an HFile, the block cache works with 
the mob files as well.
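
The capacity-limited LRU behaviour described above can be sketched roughly as 
follows. This is not the code from the patch: OpenedMobFile and 
OpenedMobFileCache are hypothetical names, and the sketch uses a plain 
java.util.LinkedHashMap in access order as a stand-in for the real cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an opened mob file reader. */
final class OpenedMobFile implements AutoCloseable {
    final String path;
    boolean closed = false;

    OpenedMobFile(String path) { this.path = path; }

    @Override
    public void close() { closed = true; }
}

/** Illustrative LRU cache of opened mob files; not the actual HBase class. */
final class OpenedMobFileCache {
    private final int capacity;
    private final LinkedHashMap<String, OpenedMobFile> files;

    OpenedMobFileCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps the least recently used entry first.
        this.files = new LinkedHashMap<String, OpenedMobFile>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, OpenedMobFile> eldest) {
                if (size() > OpenedMobFileCache.this.capacity) {
                    eldest.getValue().close(); // close the evicted reader
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns the cached reader for the path, opening it on a miss. */
    synchronized OpenedMobFile open(String path) {
        OpenedMobFile file = files.get(path); // a hit also refreshes recency
        if (file == null) {
            file = new OpenedMobFile(path);
            files.put(path, file); // may evict the least recently used reader
        }
        return file;
    }

    synchronized boolean contains(String path) { return files.containsKey(path); }

    synchronized int size() { return files.size(); }
}
```

A real cache would also need to avoid closing a reader that is still in use, 
for example by reference counting; the sketch leaves that out.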


> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data like images and documents 
> into Apache HBase. Unfortunately, directly saving the binary MOB (medium 
> object) to HBase leads to worse performance because of the frequent splits and 
> compactions.
>   In this design, the MOB data are stored in a more efficient way, which 
> keeps a high write/read performance and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
