[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039195#comment-14039195
]
Jonathan Hsieh commented on HBASE-11339:
----------------------------------------
bq. Does it mean the mob files are not feasible?
I'm trying to be convinced that we need a special mechanism to handle MOBs. We
can put the loblog idea to rest for the time being because of the read-recently
written issues.
Let's see if improving the cf flushes/compactions could achieve the same goal
as the pdf.
bq. You mean directly saving the mob into HBase and using a different compaction
policy for the mob cf? The compaction on the mob cf in HBase is costly, will
probably delay the flushing and block the updates. And a large mob store leads
to frequent region splits. All of these potentially impact HBase.
Yes roughly.
With the algorithms today, sure. However, I was thinking of a few things that we
could use to avoid excessive write amplification:
1) compacting individual cf's without compacting others.
2) having different compaction selection/promotion algorithms per cf.
3) deciding to split based only on certain cf's.
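To make idea 3 concrete, here is a minimal sketch of a split check that only
looks at selected cf's. This is a toy model in Python, not HBase code: the names
StoreSize, should_split, and the "meta"/"mob" families are hypothetical and
exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoreSize:
    """Hypothetical record: on-disk size of one cf's store in a region."""
    family: str
    size_bytes: int


def should_split(stores, split_families, max_bytes):
    """Return True if any store in a split-eligible cf exceeds max_bytes.

    Stores whose cf is not in split_families are ignored, so a large
    mob cf alone never triggers a split.
    """
    return any(
        s.size_bytes > max_bytes
        for s in stores
        if s.family in split_families
    )


GB = 1024 ** 3
stores = [StoreSize("meta", 1 * GB), StoreSize("mob", 40 * GB)]

# Only the small "meta" cf counts toward splitting: no split.
print(should_split(stores, {"meta"}, 10 * GB))
# Every cf counts: the 40 GB mob store forces a split.
print(should_split(stores, {"meta", "mob"}, 10 * GB))
```

In this model the 40 GB mob store is ignored when only "meta" is split-eligible,
which is the behavior that would avoid the frequent splits described above.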
Even with the pdf design, we still end up flushing fairly frequently
(potentially a flush every ~100 objects!) and we'd end up with a lot of hfiles
or lob files.
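As a rough back-of-the-envelope check on the "~100 objects" figure, the sketch
below divides a flush threshold by an object size. The 128 MB threshold and the
object sizes are assumptions chosen for illustration, and the calculation
ignores key and per-cell overhead.

```python
def objects_per_flush(flush_size_bytes, object_size_bytes):
    """Whole objects that fit in the memstore before a flush is triggered."""
    return flush_size_bytes // object_size_bytes


MB = 1024 * 1024
flush_size = 128 * MB  # assumed memstore flush threshold

# Assumed object sizes: 512 KB, 1 MB, 5 MB.
for object_size in (512 * 1024, 1 * MB, 5 * MB):
    print(object_size, objects_per_flush(flush_size, object_size))
```

Under these assumptions, 1 MB objects give 128 objects per flush, which is in
line with the estimate above; larger objects flush even more often.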
How many lob files could be generated per flush? If I flush a table, would
all the relevant regions on a particular RS go to one lob sequence file,
as opposed to many hfiles in the cf case? (e.g. similar to how all edits on
an RS go to one hlog)
I don't think the pdf design mentions anything about caching mob values.
Would frequently requested mobs always hit hdfs?
bq. In the current design (introduced in the pdf), if users are concerned about
write performance rather than consistency and replication, how about
disabling the WAL directly? If users want to enable the WAL and don't want to
write twice, they could write the mob on the client side (along the lines of
Lars's suggestion). The scanner and sweep tool could work with this as well if
the locator (reference) column follows the specific format.
Interesting point, but the obvious problem is that we lose durability guarantees,
so it isn't something we can really recommend for normal use. (With the lob log
idea it seems pretty obvious that we'd be able to maintain durability guarantees.)
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save medium-sized binary data like images and documents
> into Apache HBase. Unfortunately, directly saving the binary MOB (medium
> object) to HBase leads to worse performance because of the frequent splits and
> compactions.
> In this design, the MOB data are stored in a more efficient way, which
> keeps a high write/read performance and guarantees the data consistency in
> Apache HBase.