[ 
https://issues.apache.org/jira/browse/HBASE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13598642#comment-13598642
 ] 

Maryann Xue commented on HBASE-7949:
------------------------------------

@chenning, as enis has clarified, the actual data move does not happen at the 
split point. Instead, it happens in later compactions. And in the approach we 
proposed, the LOB family does not participate in splits or minor compactions at 
all.

@enis, the problem is not when the reads and writes happen; it is more the 
unnecessary I/O overhead of splitting. And if the data is seldom updated, why 
compact it (for a split) anyway?

Yes, utilizing level compactions could be a good approach. Still, our approach 
has three advantages over level compaction: 
1. I/O overhead from splits and minor compactions is completely eliminated; 
2. clean-up is done only for those files that have reached a certain 
invalidation rate, and only during major compactions;
3. not every file reader is instantiated and kept in regionserver memory. 
Instead, we'll have an LRU cache for frequently read LOB files.
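
Points 2 and 3 above can be sketched roughly as follows. This is only an 
illustration in plain Java, not code from the attached design or from HBase: 
the class names LobReaderCache and LobCleanupPolicy, their method names, and 
the idea of measuring invalidation as invalid cells over total cells are all 
assumptions made for the example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical LRU cache of open LOB file readers, keyed by file name.
 * R stands in for whatever reader type an implementation would use.
 */
final class LobReaderCache<R extends Closeable> {
    private final LinkedHashMap<String, R> readers;

    LobReaderCache(final int maxOpenReaders) {
        // accessOrder = true keeps entries ordered least-recently-used first.
        this.readers = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                if (size() <= maxOpenReaders) {
                    return false;
                }
                // Close the least-recently-used reader before dropping it.
                closeQuietly(eldest.getValue());
                return true;
            }
        };
    }

    /** Returns the cached reader, opening one with the given function on a miss. */
    synchronized R get(String fileName, Function<String, R> opener) {
        R reader = readers.get(fileName); // a hit also refreshes recency
        if (reader == null) {
            reader = opener.apply(fileName);
            readers.put(fileName, reader); // may evict the eldest entry
        }
        return reader;
    }

    synchronized boolean contains(String fileName) {
        return readers.containsKey(fileName); // does not affect recency
    }

    synchronized int openReaderCount() {
        return readers.size();
    }

    private static void closeQuietly(Closeable reader) {
        try {
            reader.close();
        } catch (IOException e) {
            // Sketch only: a real implementation would log this.
        }
    }
}

/** Hypothetical policy: rewrite a LOB file only once enough of it is invalid. */
final class LobCleanupPolicy {
    private LobCleanupPolicy() {
    }

    /** Fraction of cells in a file that have been deleted or overwritten. */
    static double invalidationRate(long invalidCells, long totalCells) {
        return totalCells == 0 ? 0.0 : (double) invalidCells / totalCells;
    }

    /** True if the file should be cleaned up in the next major compaction. */
    static boolean shouldRewrite(long invalidCells, long totalCells, double threshold) {
        return invalidationRate(invalidCells, totalCells) >= threshold;
    }
}
```

A real regionserver would also have to make sure a reader is not closed while 
a scan is still using it (for example with reference counting); that is left 
out of this sketch.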

However, I suggest this issue not be committed into HBase trunk. Instead, we'd 
like to make the implementation a use case on top of HBase. The only facility 
we need in HBase trunk is a pluggable flush process (HBASE-8024).
                
> Enable big content store in HBase
> ---------------------------------
>
>                 Key: HBASE-7949
>                 URL: https://issues.apache.org/jira/browse/HBASE-7949
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: chenning
>         Attachments: HBase_LOB.pdf
>
>
> Big content stored in HBase consumes a lot of system resources when a region 
> split or compaction happens.
> This issue explores how HBase can be used to store big content along with 
> some self-descriptive metadata. 
> The general idea is to add a new type of column family whose content doesn't 
> participate in region splits and compactions. An index (rowkey-location) is 
> introduced in this new column family, and splits and compactions are applied 
> only to this index.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
