[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14119489#comment-14119489
]
Lars Hofhansl commented on HBASE-11339:
---------------------------------------
bq. Back in June, JingCheng's response to your comments never got feedback on
how you'd manage the small files problem.
To be fair, my comment itself addressed that by saying small blobs are stored
by *value* in HBase, and only large blobs in HDFS. We can store a lot of 10MB
files in HDFS (in the worst case scenario that's 200m x 10mb = 2pb), and if
that's not enough, we can dial up the threshold.
It seems nobody understood what I was suggesting. Depending on the use case
and data distribution you pick a threshold X. Blobs with a size < X are stored
directly in HBase as a column value. Blobs >= X are stored in HDFS with a
reference in HBase, using the 3-phase approach.
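To make this concrete, here is a minimal client-side sketch of the write path.
Everything in it is hypothetical: the table/column names, the 10MB threshold,
and the HDFS path scheme are made up, and it collapses the 3-phase write into
a single put for brevity (so it ignores the orphaned-file cleanup the real
3-phase sequence is there for).
{code:java}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobClient {
  static final long THRESHOLD = 10L * 1024 * 1024; // X, tuned per use case
  static final byte[] CF  = Bytes.toBytes("b");
  static final byte[] VAL = Bytes.toBytes("value"); // by-value column
  static final byte[] REF = Bytes.toBytes("ref");   // by-reference column

  public static void putBlob(Connection conn, FileSystem fs,
                             byte[] row, byte[] blob) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("blobs"))) {
      Put put = new Put(row);
      if (blob.length < THRESHOLD) {
        // Small blob: stored directly in HBase as a column value.
        put.addColumn(CF, VAL, blob);
      } else {
        // Large blob: write the bytes to an HDFS file first...
        Path p = new Path("/blobs/" + Bytes.toStringBinary(row));
        try (FSDataOutputStream out = fs.create(p)) {
          out.write(blob);
        }
        // ...then store only the reference (the path) in HBase.
        put.addColumn(CF, REF, Bytes.toBytes(p.toString()));
      }
      table.put(put);
    }
  }
}
{code}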
bq. there are two HDFS blob + HBase metadata solutions explicitly mentioned
in section 4.1.2 (v4 design doc) with pros and cons
True, but as I stated, the "store small blobs by value and only large ones by
reference" solution is not mentioned in there.
bq. The solution you propose is actually the first described hdfs+hbase approach
No, it's not... That approach says either all blobs go into HBase or all
blobs go into HDFS... See above. Small blobs would be stored directly in
HBase, not in HDFS. That's key; nobody wants to store 100k or 1mb files
directly in HDFS.
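The read side abstracts just as easily in the client. Again a sketch only,
under the same hypothetical table/column layout as above: return the by-value
column if present, otherwise follow the HDFS reference.
{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BlobReader {
  static final byte[] CF  = Bytes.toBytes("b");
  static final byte[] VAL = Bytes.toBytes("value");
  static final byte[] REF = Bytes.toBytes("ref");

  public static byte[] getBlob(Connection conn, FileSystem fs, byte[] row)
      throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("blobs"))) {
      Result r = table.get(new Get(row));
      byte[] value = r.getValue(CF, VAL);
      if (value != null) {
        return value; // small blob, stored by value in HBase
      }
      byte[] ref = r.getValue(CF, REF);
      if (ref == null) {
        return null;  // no blob for this row
      }
      // Large blob: follow the reference into HDFS (assumes it fits in memory).
      Path p = new Path(Bytes.toString(ref));
      byte[] buf = new byte[(int) fs.getFileStatus(p).getLen()];
      try (FSDataInputStream in = fs.open(p)) {
        in.readFully(buf);
      }
      return buf;
    }
  }
}
{code}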
bq. We have total 3 +1s for that Jira after many rounds of review rework. Can
get it committed tomorrow IST unless objections...?
We won't get this committed until we finish this discussion. So consider this
my -1 until then.
Going by the comments, the use case is only 1-5mb files (definitely less than
64mb), correct? That changes the discussion, but it looks to me like the use
case is now limited to a single, carefully constructed scenario (200m x 500k
files) in which this change might be useful. I.e., pick the blob size just
right, and pick the size distribution of the files just right, and this makes
sense.
In my approach one can dial the by-value/by-reference threshold up or down as
needed. And I do not even see the need for M/R.
I do agree with all of the following:
* snapshots are harder
* bulk load is harder
* backup/restore/replication is harder
Yet, all of that is possible with a client-only solution and could be
abstracted there.
I'll also admit that our blob storage tool is not finished yet, and that for
its use case we don't need replication or backup, as it will itself be the
backup solution for another very large data store.
Are you guys absolutely... 100%... positive that this cannot be done in any
other way and has to be done this way? That we cannot store files up to a
certain size as values in HBase and larger files in HDFS? And that there is no
good threshold value for this?
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: Umbrella
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user
> guide_v2.docx, hbase-11339-in-dev.patch
>
>
> It's quite useful to save medium-sized binary data like images and
> documents into Apache HBase. Unfortunately, directly saving binary MOBs
> (medium objects) in HBase leads to worse performance because of the
> frequent splits and compactions.
> In this design, the MOB data is stored in a more efficient way, which
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)