[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032254#comment-14032254 ]
Jingcheng Du commented on HBASE-11339:
--------------------------------------

Thanks [~lhofhansl] for the comments.

> Is it better to store small blobs (let's say 1mb or less) in HBase (by value)
> and larger blobs directly in files in HDFS with just a reference in HBase?
> Writing large blobs would be a three step process: (1) add the metadata to
> HBase (2) stream the actual blob into HDFS (3) set a "written" column in the
> HBase row to true.

Good idea. But in this approach all the actions occur in the client, and each client writes a new file in HDFS. That makes it hard to control the file size, which would probably lead to too many small files in HDFS.

> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
> It is quite useful to store massive binary data, such as images and
> documents, in Apache HBase. Unfortunately, directly saving binary LOBs
> (large objects) in HBase leads to poor performance because of frequent
> splits and compactions.
> In this design, the LOB data are stored in a more efficient way that
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
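The three-step write path quoted above (metadata first, then the blob, then a "written" flag that readers check) can be sketched as follows. This is only an illustrative sketch: HBase and HDFS are stubbed with in-memory maps, and all class, method, and column names (`LobWriteSketch`, `writeBlob`, `ref`, `written`, the 1 MB threshold) are hypothetical, not part of any real HBase API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed client-side LOB write path:
//   small blobs are stored by value in HBase; large blobs go to HDFS
//   with only a reference in HBase, guarded by a "written" flag.
// HBase and HDFS are stand-ins here; names are illustrative only.
public class LobWriteSketch {
    // Stand-in for an HBase table: rowKey -> (column -> value)
    static final Map<String, Map<String, String>> hbase = new HashMap<>();
    // Stand-in for HDFS: path -> blob bytes
    static final Map<String, byte[]> hdfs = new HashMap<>();

    static final int INLINE_THRESHOLD = 1 << 20; // 1 MB: store by value below this

    static void writeBlob(String rowKey, byte[] blob) {
        Map<String, String> row = new HashMap<>();
        if (blob.length <= INLINE_THRESHOLD) {
            // Small blob: store by value directly in HBase, already complete.
            row.put("data", new String(blob));
            row.put("written", "true");
            hbase.put(rowKey, row);
            return;
        }
        // Step 1: write metadata (the HDFS reference) with written=false.
        String path = "/lob/" + rowKey; // hypothetical per-row file: this is
        // exactly the "one file per client write" pattern the comment warns about.
        row.put("ref", path);
        row.put("written", "false");
        hbase.put(rowKey, row);
        // Step 2: stream the actual blob into HDFS.
        hdfs.put(path, blob);
        // Step 3: flip the flag; readers ignore rows where written != true.
        row.put("written", "true");
    }

    static byte[] readBlob(String rowKey) {
        Map<String, String> row = hbase.get(rowKey);
        if (row == null || !"true".equals(row.get("written"))) {
            return null; // write incomplete or failed mid-way: treat as absent
        }
        String ref = row.get("ref");
        return ref != null ? hdfs.get(ref) : row.get("data").getBytes();
    }

    public static void main(String[] args) {
        writeBlob("img1", new byte[2 << 20]);  // 2 MB -> goes to the HDFS stub
        writeBlob("doc1", "small".getBytes()); // small -> stored by value
        System.out.println(readBlob("img1").length);
        System.out.println(new String(readBlob("doc1")));
    }
}
```

The flag ordering makes a crashed client harmless to readers (the row stays invisible), but note the sketch also makes the drawback raised in the comment concrete: every large write creates its own HDFS file, so nothing bounds the number of small files.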