[ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039195#comment-14039195
 ] 

Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

bq. Does it mean the mob files are not feasible?

I'm trying to be convinced that we need a special mechanism to handle MOBs.  We 
can put the loblog idea to rest for the time being because of the read-recently 
written issues.

Let's see if improving the cf flushes/compactions could achieve the same goal 
as the design in the pdf.

bq. You mean directly saving the mob into HBase and using a different compaction 
policy for the mob cf? Compaction on the mob cf in HBase is costly, and will 
probably delay flushing and block updates. And a large mob store leads 
to frequent region splits. All of these potentially impact HBase.

Yes roughly. 

With today's algorithms, sure.  However, I was thinking of a few things that we 
could do to avoid excessive write amplification:
1) compacting individual cf's without compacting others.
2) having different compaction selection/promotion algorithms per cf.
3) deciding whether to split based only on certain cf's.
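
As a rough illustration of item 3, here is a minimal sketch of a split 
decision that only looks at selected cf's.  It is a standalone model, not 
HBase's split policy API: the class, the method, the cf names, and the 10 GB 
threshold are all hypothetical and chosen only for the example.

```java
import java.util.Map;
import java.util.Set;

public class SplitSketch {
    // Hypothetical model: request a split only when a cf listed in splitCfs
    // exceeds the size threshold. Stores of other cf's are ignored, so a
    // large mob cf alone never triggers a split.
    static boolean shouldSplit(Map<String, Long> storeSizeBytes,
                               Set<String> splitCfs,
                               long maxStoreSizeBytes) {
        for (Map.Entry<String, Long> e : storeSizeBytes.entrySet()) {
            if (splitCfs.contains(e.getKey()) && e.getValue() > maxStoreSizeBytes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long threshold = 10L << 30; // 10 GB, illustrative value
        // Assumed store sizes: a 1 GB "meta" cf and a 50 GB "mob" cf.
        Map<String, Long> sizes = Map.of("meta", 1L << 30, "mob", 50L << 30);
        // Only "meta" drives splits: the big mob cf is ignored.
        System.out.println(shouldSplit(sizes, Set.of("meta"), threshold));
        // Every cf drives splits: the mob cf forces a split.
        System.out.println(shouldSplit(sizes, Set.of("meta", "mob"), threshold));
    }
}
```

With these assumed sizes the first call reports no split and the second 
reports a split, which is the difference item 3 is after.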

Even with the pdf design, we still end up flushing fairly frequently 
(potentially a flush every ~100 objects!) and we'd end up with a lot of hfiles 
or lob files.  
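
To make the "flush every ~100 objects" estimate concrete, here is a 
back-of-the-envelope sketch.  The numbers are assumptions, not taken from the 
pdf: a 128 MB memstore flush size and 1 MB objects, ignoring key/cell 
overhead and any other data sharing the memstore.

```java
public class FlushEstimate {
    // Rough count of objects a memstore can buffer before it flushes.
    // Ignores per-cell overhead, so the real number would be somewhat lower.
    static long objectsPerFlush(long flushSizeBytes, long objectSizeBytes) {
        return flushSizeBytes / objectSizeBytes;
    }

    public static void main(String[] args) {
        long flushSize = 128L * 1024 * 1024; // assumed memstore flush size
        long mobSize = 1L * 1024 * 1024;     // assumed size of one object
        System.out.println(objectsPerFlush(flushSize, mobSize)); // prints 128
    }
}
```

Larger objects shrink that number proportionally, e.g. 5 MB objects under the 
same assumptions give roughly 25 objects per flush.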

How many lob files could be generated per flush?  If I flush a table, would 
all the relevant regions on a particular RS go to one lob sequence file 
as opposed to many hfiles in the cf case?   (e.g. similarly to how all edits on 
an RS go to one hlog) 

I don't think the pdf design mentions anything about caching mob values.  
Would reads of a frequently requested mob always hit hdfs?  

bq. In the current design (introduced in the pdf), if users are more concerned 
about write performance than consistency and replication, how about 
disabling the WAL directly? If users want to enable the WAL and don't want to 
write twice, they could write the mob on the client side (along the lines of 
Lars's suggestion). The scanner and sweep tool could work with this as well if 
the locator (reference) column follows the specific format.

Interesting point, but the obvious problem is that we lose durability 
guarantees, so it isn't something we can really recommend for normal use.  
(With the lob log idea it seems pretty obvious that we'd be able to maintain 
durability guarantees.)


> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data like images and documents 
> into Apache HBase. Unfortunately, directly saving binary MOBs (medium 
> objects) to HBase leads to worse performance because of frequent splits and 
> compactions.
>   In this design, the MOB data are stored in a more efficient way, which 
> keeps high write/read performance and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
