[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117774#comment-14117774 ]

Lars Hofhansl commented on HBASE-11339:
---------------------------------------

[[email protected]] and I talked about this at the HBase meetup...

I'm sorry to be the party pooper here, but this complexity and functionality 
really does not belong in HBase IMHO.
I still do not get the motivation for this... Here's why:
# We still cannot stream the mobs. They have to be materialized at both the 
server and the client (going by the documentation here)
# As I state above, this can be achieved with an HBase/HDFS client alone, and 
better: store mobs up to a certain size by value in HBase (say 5 or 10mb or 
so); everything larger goes straight into HDFS with only a reference in HBase. 
This addresses both the many-small-files issue in HDFS (only files larger than 
5-10mb would end up in HDFS) and the streaming problem for large files in 
HBase. Also, as I outlined in June, we can still make this "transactional" in 
the HBase sense with a three-step protocol: (1) write the reference row, (2) 
stream the blob to HDFS, (3) record the HDFS location in that row (that's the 
commit) - see the sketch after this list. This solution is also missing from 
the "Existing Solutions" section of the initial PDF.
# "Replication" here can still be handled by the client; after all, each file 
successfully stored in HDFS has a reference in HBase.
# We should use the tools for what they were intended: HBase for key-value 
storage, HDFS for streaming large blobs.
# Having a single client API for convenience is *not* by itself a reason to 
put all of this into HBase. A client can easily speak both the HBase and HDFS 
protocols.
# (Subjectively) I do not like the complexity of this, as seen in the various 
discussions here. That part is just my $0.02 of course.
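
To make the three-step protocol from point 2 concrete, here is a rough 
client-side sketch of what I mean. Java; the table/column names, the 5mb 
threshold and the PENDING/COMMITTED marker are made up for illustration, and 
only the stock HBase Put/Table and HDFS FileSystem client calls are used:

{code:java}
// Rough sketch only: hypothetical column names, made-up 5mb threshold;
// error handling and retries omitted.
import java.io.InputStream;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IOUtils;

public class BlobStoreSketch {
  private static final long THRESHOLD = 5L * 1024 * 1024;    // e.g. 5mb
  private static final byte[] CF    = Bytes.toBytes("b");
  private static final byte[] VALUE = Bytes.toBytes("value"); // inlined blob
  private static final byte[] REF   = Bytes.toBytes("ref");   // HDFS location
  private static final byte[] STATE = Bytes.toBytes("state"); // PENDING/COMMITTED

  private final Table table;    // HBase table: reference rows + small blobs
  private final FileSystem fs;  // HDFS: the large blobs
  private final Path blobDir;

  public BlobStoreSketch(Table table, Configuration conf, Path blobDir)
      throws Exception {
    this.table = table;
    this.fs = FileSystem.get(conf);
    this.blobDir = blobDir;
  }

  public void store(byte[] row, InputStream data, long size) throws Exception {
    if (size <= THRESHOLD) {
      // Small blob: store by value directly in HBase.
      byte[] bytes = new byte[(int) size];
      IOUtils.readFully(data, bytes, 0, (int) size);
      Put p = new Put(row);
      p.addColumn(CF, VALUE, bytes);
      table.put(p);
      return;
    }
    // (1) Write the reference row first, marked PENDING.
    Put pending = new Put(row);
    pending.addColumn(CF, STATE, Bytes.toBytes("PENDING"));
    table.put(pending);

    // (2) Stream the blob into HDFS; nothing is materialized in memory.
    Path blobPath = new Path(blobDir, UUID.randomUUID().toString());
    try (FSDataOutputStream out = fs.create(blobPath)) {
      IOUtils.copyBytes(data, out, 64 * 1024, false);
    }

    // (3) Record the HDFS location in the row -- that is the commit.
    Put commit = new Put(row);
    commit.addColumn(CF, REF, Bytes.toBytes(blobPath.toString()));
    commit.addColumn(CF, STATE, Bytes.toBytes("COMMITTED"));
    table.put(commit);
    // A reader that sees PENDING with no reference treats the blob as absent;
    // a background sweep can garbage-collect orphaned HDFS files.
  }
}
{code}

The point is that only step (2) touches the big data, and it is a plain HDFS 
stream; HBase only ever sees small reference rows (or small inlined values).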

This looks to me like a solution to a problem that we do not have.

Again I am sorry about being negative here, but we have to be careful what we 
put into HBase and for what reasons.

Especially when there seems to be a *better* client-only solution (in the 
sense that it can deal with larger files, and allows for streaming those 
larger files).

If we need a solution for this, let's build one on top of HBase/HDFS. We 
(Salesforce) are actually building a client-only solution for this; it's not 
that difficult (I will see whether we can open source it - it might be too 
entangled with our internals). With an easy protocol we can still allow data 
locality for all blob reads (as much as the block distribution allows it, at 
least), etc. - see the block-location sketch below.
[~jesse_yates], maybe you want to add here?
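
On the data-locality point: HDFS already tells a client where the blocks of a 
blob live, so a reader can simply prefer one of those hosts. A minimal sketch 
(the path is hypothetical; getFileBlockLocations() is the standard FileSystem 
API):

{code:java}
// Minimal sketch: ask HDFS where the blocks of a blob live so a reader can
// prefer a local replica.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlobLocality {
  public static void printBlockHosts(Configuration conf, Path blobPath)
      throws Exception {
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(blobPath);
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      // Schedule (or route) the read for this byte range to one of these hosts.
      System.out.println(block.getOffset() + "+" + block.getLength()
          + " -> " + String.join(",", block.getHosts()));
    }
  }
}
{code}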

If we cannot store 10mb Cells in HBase, then that's something to address. The 
fact that we cannot stream into and out of HBase needs to be addressed; that 
is the real problem anyway.


> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: Umbrella
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, hbase-11339-in-dev.patch
>
>
>   It's quite useful to store medium-sized binary data such as images and 
> documents in Apache HBase. Unfortunately, directly saving the binary MOB 
> (medium object) to HBase leads to worse performance because of the frequent 
> splits and compactions.
>   In this design, the MOB data is stored in a more efficient way, which 
> keeps a high write/read performance and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
