[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032254#comment-14032254
]
Jingcheng Du commented on HBASE-11339:
--------------------------------------
Thanks [~lhofhansl] for the comments.
> Is it better to store small blobs (let's say 1mb or less) in HBase (by value)
> and larger blob directly in files in HDFS with just a reference in HBase?
> Writing large blobs would be a three step process: (1) add the metadata to
> HBase (2) stream the actual blob into HDFS (3) set a "written" column in the
> HBase row to true.
Good idea. But in this way, all the actions occur in the client, and each client
writes a new file in HDFS. It's hard to control the file size, which would
probably lead to too many small files in HDFS. A sketch of that client-side flow
is below to make the concern concrete.
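Here is a minimal sketch of the three-step flow described above, assuming the standard HBase client and HDFS APIs. The table name "lob_index", the column family "f", the column qualifiers, and the helper class are placeholders for illustration, not part of any existing design.
{code:java}
import java.io.IOException;
import java.io.InputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientSideLobWriter {

  private static final byte[] FAMILY = Bytes.toBytes("f");
  private static final byte[] PATH_COL = Bytes.toBytes("path");
  private static final byte[] SIZE_COL = Bytes.toBytes("size");
  private static final byte[] WRITTEN_COL = Bytes.toBytes("written");

  // One blob per HDFS file: metadata row first, then the blob bytes,
  // then the "written" flag. Every call creates a new HDFS file.
  public static void writeBlob(Connection conn, FileSystem fs, byte[] rowKey,
      InputStream blob, long size, String hdfsDir) throws IOException {
    Path blobPath = new Path(hdfsDir, Bytes.toStringBinary(rowKey));
    try (Table table = conn.getTable(TableName.valueOf("lob_index"))) {
      // (1) Add the metadata to HBase, with written=false.
      Put meta = new Put(rowKey);
      meta.addColumn(FAMILY, PATH_COL, Bytes.toBytes(blobPath.toString()));
      meta.addColumn(FAMILY, SIZE_COL, Bytes.toBytes(size));
      meta.addColumn(FAMILY, WRITTEN_COL, Bytes.toBytes(false));
      table.put(meta);

      // (2) Stream the actual blob into HDFS.
      try (FSDataOutputStream out = fs.create(blobPath, true)) {
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = blob.read(buf)) != -1) {
          out.write(buf, 0, n);
        }
      }

      // (3) Set the "written" column in the HBase row to true.
      Put done = new Put(rowKey);
      done.addColumn(FAMILY, WRITTEN_COL, Bytes.toBytes(true));
      table.put(done);
    }
  }
}
{code}
Since each client call produces its own HDFS file, the file count grows with the number of blobs, which is the small-files problem above.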
> HBase LOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save massive binary data like images and documents
> into Apache HBase. Unfortunately, directly saving binary LOBs (large objects)
> to HBase leads to worse performance because of frequent splits and compactions.
> In this design, the LOB data are stored in a more efficient way, which
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.
--
This message was sent by Atlassian JIRA
(v6.2#6252)