[jira] [Commented] (HBASE-11339) HBase MOB

Jonathan Hsieh (JIRA) Tue, 24 Jun 2014 13:44:40 -0700

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042655#comment-14042655 ]


Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

In the PDF design, is there one MobManager per RS, one MobManager per table, 
or one MobManager per region? Are the mob hfiles kind of like a shared cf that 
all regions with mobs eventually throw their data into?

Can you explain what happens if I have an RS with regions, some belonging to 
tableA and some belonging to tableB? Let's say all writes to tableA and 
tableB have mobs in them.

# A region gets full and decides to flush, so we generate one mob file. 10 
separate flushes, 10 separate mob files.
# An admin user issues a flush tableA command and there are multiple tableA 
regions on the RS. How many mob files are generated? One mob file per region 
in tableA on the RS? Exactly one because only one table was flushed?
# The node goes down cleanly, causing all regions to be flushed. How many 
mob files are generated? One mob file per region on the RS, one mob file per 
table on the RS, or exactly one because there is only one RS?
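To make the three scenarios concrete, here is a minimal sketch that counts the mob files a single flush event would produce under each candidate granularity: one file per flushed region, per flushed table, or per region server. The function `mob_files_for_flush` and the granularity names are hypothetical illustrations, not taken from the design PDF or from HBase, and the sketch assumes every flushed region holds mob data.

```python
# Hypothetical flush granularities; none of these names come from HBase.
GRANULARITIES = ("region", "table", "regionserver")


def mob_files_for_flush(flushed_regions, granularity):
    """Count the mob files one flush event creates under an assumed granularity.

    flushed_regions: (table, region) pairs flushed together on one region server.
    Hypothetical model: each flush unit (region, table, or RS) writes one mob file.
    """
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity}")
    if not flushed_regions:
        return 0  # nothing flushed, nothing written
    if granularity == "region":
        return len(set(flushed_regions))
    if granularity == "table":
        return len({table for table, _ in flushed_regions})
    return 1  # "regionserver": everything flushed on the RS goes into one file


# Example RS hosting two tableA regions and one tableB region.
regions = [("tableA", "r1"), ("tableA", "r2"), ("tableB", "r3")]
for g in GRANULARITIES:
    # Scenario 3 (clean shutdown flushes everything) vs. scenario 2 (flush tableA only).
    print(g, mob_files_for_flush(regions, g), mob_files_for_flush(regions[:2], g))
```

Under this model a clean shutdown of that RS yields 3, 2, or 1 mob files depending on the granularity, and flushing only tableA yields 2, 1, or 1. Scenario 1 is the same count repeated per flush, so 10 separate region flushes give 10 files under per-region granularity.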

Where are the mob files written to? Are they in the region dir, the family 
dir, the table dir, or something else? In 0.98, the dir structure is 
/hbase/<namespace>/<table>/<region>/<cf>/hfile. Where do the mob files for 
region1 of tableA go, and where do the mob files for region2 of tableB go?
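A small sketch of two of the candidate placements, built on the /hbase/<namespace>/<table>/<region>/<cf>/hfile layout quoted in the comment. Both `mob_file_path` placements are hypothetical: "family" puts mob files beside the region's ordinary hfiles, while "table" uses a shared per-table directory whose `mob` name is invented here purely for illustration.

```python
import posixpath


def store_file_path(namespace, table, region, cf, hfile, root="/hbase"):
    """Path of an ordinary hfile in the /hbase/<namespace>/<table>/<region>/<cf>/hfile
    layout quoted in the comment."""
    return posixpath.join(root, namespace, table, region, cf, hfile)


def mob_file_path(namespace, table, region, cf, mob_file, placement, root="/hbase"):
    """Hypothetical mob file locations for two of the options being asked about.

    placement="family": mob files sit in the region's family dir beside its hfiles.
    placement="table":  one shared per-table dir (the "mob" name is made up) with no
                        region component, so every region's mobs land together.
    """
    if placement == "family":
        return store_file_path(namespace, table, region, cf, mob_file, root)
    if placement == "table":
        return posixpath.join(root, namespace, table, "mob", cf, mob_file)
    raise ValueError(f"unknown placement: {placement}")


for table, region in [("tableA", "region1"), ("tableB", "region2")]:
    print(mob_file_path("default", table, region, "cf", "mob1", "family"))
    print(mob_file_path("default", table, region, "cf", "mob1", "table"))
```

The difference that matters for the questions above is whether the region name appears in the path: with per-family placement each region owns its mob files, while with a table-level directory files from region1 and region2 of the same table would be mixed.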

bq. Yes, this could improve the compaction. But this doesn't reduce the twice 
writing for the mob file.

OK, so this is essentially equal: both the PDF and the cf approach require a 
minimum of 2x writes of mob data.

bq. Saving the mob files by stores than by region server is more efficient to 
use the TTL to clean the expired mobs.

With this, it sounds like there is a new mob file per region, and that mobs 
would still generate the same number of files as the separate-cf approach.

Can't we have (or don't we already have) the TTL optimization in our existing 
cfs, since our hfiles record start and end timestamps?
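The TTL optimization in question needs only each file's timestamp range: if even the newest cell in a file has expired, the whole file can be dropped without reading it. A minimal sketch, assuming hypothetical `HFileInfo` metadata with millisecond min/max timestamps and a strict `<` expiry boundary; neither detail is taken from HBase's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HFileInfo:
    """Hypothetical stand-in for per-file metadata: the file name plus the oldest
    and newest cell timestamps (ms) recorded in the file."""
    name: str
    min_ts: int
    max_ts: int


def fully_expired(files, ttl_ms, now_ms):
    """Names of files whose newest cell is already older than the TTL cutoff.

    Such a file can be dropped whole without reading any cells; a file whose
    range straddles the cutoff still needs a cell-by-cell pass (e.g. at compaction).
    Assumption: a cell with ts < now - ttl counts as expired.
    """
    cutoff = now_ms - ttl_ms
    return [f.name for f in files if f.max_ts < cutoff]


files = [
    HFileInfo("old", min_ts=0, max_ts=100),
    HFileInfo("mixed", min_ts=50, max_ts=500),
    HFileInfo("new", min_ts=900, max_ts=1000),
]
print(fully_expired(files, ttl_ms=600, now_ms=1000))
```

Since the check reads only per-file metadata, it works the same whether the files hold mob data or ordinary cf data. That is the point of the question: the optimization does not depend on a separate mob storage layout.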

... (I think I need to understand the answers to the first section before some 
of this makes sense to me.)






> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data, like images and 
> documents, into Apache HBase. Unfortunately, directly saving binary MOBs 
> (medium objects) to HBase leads to worse performance because of frequent 
> splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which 
> keeps write/read performance high and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
