[ 
https://issues.apache.org/jira/browse/HBASE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596880#comment-13596880
 ] 

Maryann Xue commented on HBASE-7949:
------------------------------------

@Enis Well, the constant reading and writing of the same set of large content 
data happens in two ways: compaction and split.
1. During compaction, the data is read from small files and written to a 
combined new large file.
2. During split, the data is read from the parent region's storefiles and 
written into the two daughter regions' storefiles.
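
To make the two rewrite paths concrete, here is a minimal sketch in Python. 
It is a toy model, not HBase code: the names compact, split and payload_bytes 
are made up for illustration, a storefile is just a sorted list of 
(row_key, value) pairs, and versions and delete markers are ignored.

```python
import heapq

def compact(store_files):
    """Merge several sorted storefiles into one sorted storefile.

    Every value is read from a small file and written again into the result.
    """
    return list(heapq.merge(*store_files, key=lambda kv: kv[0]))

def split(store_file, split_key):
    """Rewrite one parent storefile into two daughter storefiles."""
    lower = [kv for kv in store_file if kv[0] < split_key]
    upper = [kv for kv in store_file if kv[0] >= split_key]
    return lower, upper

def payload_bytes(store_file):
    """Total size of the values that had to be copied."""
    return sum(len(value) for _, value in store_file)
```

In this model, both operations copy every value byte they touch, which is 
where the cost comes from when the values are large.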

To avoid the I/O overhead caused by 1 (compaction), we can disable minor 
compaction for this family, but this would lead to another big problem: bad 
get/scan performance. For a get operation, we need to check the bloomfilter of 
every storefile to locate our record, and for a scan operation, we need to 
perform a seek in each of these storefiles. The decline in "Get" throughput as 
the number of storefiles increases is shown in the slides.
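
A rough illustration of why the get cost grows with the number of storefiles, 
again as a toy Python model rather than HBase code. The StoreFile class and 
the get function are hypothetical, and a plain set stands in for the 
bloomfilter (so there are no false positives here).

```python
class StoreFile:
    def __init__(self, rows):
        self.rows = dict(rows)
        # Stand-in for a bloomfilter: an exact set of the row keys in this file.
        self.bloom = set(self.rows)

def get(store_files, row_key):
    """Look up row_key, consulting the filter of every storefile.

    Files are assumed to be ordered oldest to newest, so a later hit wins.
    Returns (value, number_of_filter_checks).
    """
    checks = 0
    value = None
    for store_file in store_files:
        checks += 1
        if row_key in store_file.bloom:
            value = store_file.rows[row_key]
    return value, checks
```

With 50 uncompacted files of one row each, a single get performs 50 filter 
checks in this model; after merging them into one file it performs one.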

To avoid the I/O overhead caused by 2 (split), we can pre-split the regions of 
a table, but this cannot always be done for customer use cases.
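
As a sketch of what pre-splitting involves, the Python helper below computes 
evenly spaced split points. It rests on a strong assumption that is not in 
the comment above: row keys start with a fixed-width hex prefix that is 
uniformly distributed (for example, a hash). The function name 
even_split_keys and its parameters are made up for illustration.

```python
def even_split_keys(num_regions, prefix_hex_digits=4):
    """Return num_regions - 1 split points over a uniform hex-prefix keyspace."""
    if num_regions < 2:
        return []
    space = 16 ** prefix_hex_digits
    width = prefix_hex_digits
    return [
        format(i * space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]
```

The split points would be supplied when the table is created. When the key 
distribution is not known up front, which may well be the case in customer 
deployments, there is no good way to choose them.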

The idea is that large content data is very likely loaded once and not 
frequently modified, so there is no need to move or merge the data all the 
time, as happens in normal region compactions and splits, which do so in order 
to maintain region independence and read efficiency.
So having a storage independent of HBase regions would make sense for such 
use cases. Meanwhile, we leverage the major compaction process to do cleanup 
and merging at a reasonable frequency: a merge is performed only when a 
certain file has exceeded the configured threshold.
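
One possible reading of the threshold rule, sketched in Python. The comment 
above does not say what is measured against the threshold; this sketch 
assumes it is the fraction of a content file taken up by deleted or 
overwritten data. ContentFile, files_to_rewrite and dead_ratio_threshold are 
hypothetical names, not part of HBase.

```python
from dataclasses import dataclass

@dataclass
class ContentFile:
    name: str
    total_bytes: int
    dead_bytes: int  # assumed metric: bytes of deleted or overwritten cells

def files_to_rewrite(files, dead_ratio_threshold=0.5):
    """Pick only the files whose dead-data ratio exceeds the threshold."""
    return [
        f.name
        for f in files
        if f.total_bytes > 0
        and f.dead_bytes / f.total_bytes > dead_ratio_threshold
    ]
```

Files below the threshold are left untouched, so content that is loaded once 
and never modified would never be rewritten under this scheme.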
                
> Enable big content store in HBase
> ---------------------------------
>
>                 Key: HBASE-7949
>                 URL: https://issues.apache.org/jira/browse/HBASE-7949
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: chenning
>         Attachments: HBase_LOB.pdf
>
>
> Big content stored in HBase consumes a lot of system resources when a region 
> split or compaction happens.
> How can HBase be used to store big content along with some self-descriptive 
> metadata?
> The general idea is to add a new type of column family whose content doesn't 
> participate in region split and compaction. An index (rowkey-location) is 
> introduced in this new column family, and split and compaction are applied 
> only to this index.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
