[ 
https://issues.apache.org/jira/browse/HBASE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13597431#comment-13597431
 ] 

Enis Soztutar commented on HBASE-7949:
--------------------------------------

1. during compaction, the data is read from small files and written to a 
combined new large file.
You still want compactions, just maybe not as frequently. Can't this be 
achieved by keeping the same compaction policy, but with different tuning? 
Since you still want to get rid of deleted LOBs and merge small files, you 
should not disable compactions, but maybe we need a LOB-optimized compaction 
policy.
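
To make the "same policy, different tuning" idea concrete, here is a minimal
sketch of a ratio-style file selection with two knobs. It is a simplified
model, not HBase's actual selection code; the class, the Tuning type, and the
select method are hypothetical names, only loosely analogous to settings such
as hbase.hstore.compaction.min and hbase.hstore.compaction.ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompactionTuningSketch {
    /** Hypothetical tuning knobs: minimum file count and size ratio. */
    static final class Tuning {
        final int minFiles;
        final double ratio;
        Tuning(int minFiles, double ratio) {
            this.minFiles = minFiles;
            this.ratio = ratio;
        }
    }

    /**
     * Simplified selection over file sizes ordered oldest first.
     * A leading file is skipped while it is larger than ratio * (sum of all
     * newer files). Returns an empty list if fewer than minFiles remain.
     */
    static List<Long> select(List<Long> sizes, Tuning t) {
        int start = 0;
        while (sizes.size() - start >= t.minFiles) {
            long newerSum = 0;
            for (int i = start + 1; i < sizes.size(); i++) {
                newerSum += sizes.get(i);
            }
            if (sizes.get(start) <= t.ratio * newerSum) {
                break; // this file is small enough to be worth merging
            }
            start++; // large (e.g. LOB-heavy) file: leave it alone
        }
        if (sizes.size() - start < t.minFiles) {
            return new ArrayList<Long>();
        }
        return new ArrayList<Long>(sizes.subList(start, sizes.size()));
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(1000L, 10L, 10L, 10L);
        // Default-like tuning: the three small files get merged.
        System.out.println(select(sizes, new Tuning(3, 1.2)));
        // LOB-style tuning: demand more files before compacting at all.
        System.out.println(select(sizes, new Tuning(5, 1.2)));
    }
}
```

With the higher minimum, the same set of files triggers no compaction, so
compactions still happen but less often, and the large file is never rewritten
just to absorb a few small ones.
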
2. during split, the data is read from the parent region storefiles and written 
into two daughter regions' storefiles.
No, a split does not rewrite the data; it only creates reference files. 
The split triggers a compaction, which rewrites the data later. See 
http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/ 
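
The reference-file behavior described above can be pictured with a small
model. This is only an illustrative sketch: StoreFile, Reference, and
referencesFor are hypothetical names invented here, not HBase classes or
methods, and the model ignores everything except the fact that a split
records pointers instead of copying bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitReferenceSketch {
    /** Hypothetical stand-in for a parent store file. */
    static final class StoreFile {
        final String name;
        final long sizeBytes;
        StoreFile(String name, long sizeBytes) {
            this.name = name;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Hypothetical reference: points at one half of a parent file, holds no data. */
    static final class Reference {
        final StoreFile parent;
        final String splitKey;
        final boolean topHalf;
        Reference(StoreFile parent, String splitKey, boolean topHalf) {
            this.parent = parent;
            this.splitKey = splitKey;
            this.topHalf = topHalf;
        }
    }

    /** Builds one daughter's references; no parent bytes are read or copied. */
    static List<Reference> referencesFor(List<StoreFile> parentFiles,
                                         String splitKey, boolean topHalf) {
        List<Reference> refs = new ArrayList<Reference>();
        for (StoreFile f : parentFiles) {
            refs.add(new Reference(f, splitKey, topHalf));
        }
        return refs;
    }

    public static void main(String[] args) {
        List<StoreFile> parent = Arrays.asList(
            new StoreFile("file-a", 4096L), new StoreFile("file-b", 8192L));
        List<Reference> bottom = referencesFor(parent, "m", false);
        List<Reference> top = referencesFor(parent, "m", true);
        System.out.println(bottom.size() + " + " + top.size() + " references created");
    }
}
```

In this model the cost of the split itself depends on the number of files,
not on their size; the data is only rewritten when a later compaction
resolves the references.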

like for a get operation, we need to compare against too many Bloom filters for 
each storefile to locate our record; and for a scan operation, we need to 
perform a seek in all these storefiles.... The idea is that large content data is 
very probably loaded once and not frequently modified, so there is literally no 
need to move or merge the data all the time, as would happen in normal region 
compactions and splits, and in order to maintain region independence and 
read efficiency
It seems what we need here is to allow a possibly large number of files and do 
fewer compactions, but have those files range-partitioned within the region. 
I highly recommend taking a look at the stripe compactions / level compactions 
issue (HBASE-7667). 
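
As a rough sketch of why range-partitioning the files within a region helps
reads: if each file belongs to one key range (a "stripe"), a get only has to
consult the files of the stripe that covers its row key, however many files
the region holds in total. The class and method names below are hypothetical
and do not reflect the design or code in HBASE-7667.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StripeLookupSketch {
    // Stripe start key -> names of the files that hold rows in that stripe.
    private final TreeMap<String, List<String>> stripes =
        new TreeMap<String, List<String>>();

    /** Registers a file under the stripe that starts at stripeStartKey. */
    void addFile(String stripeStartKey, String fileName) {
        List<String> files = stripes.get(stripeStartKey);
        if (files == null) {
            files = new ArrayList<String>();
            stripes.put(stripeStartKey, files);
        }
        files.add(fileName);
    }

    /** Returns only the files a get for rowKey would need to check. */
    List<String> filesForGet(String rowKey) {
        // floorEntry finds the stripe with the greatest start key <= rowKey.
        Map.Entry<String, List<String>> e = stripes.floorEntry(rowKey);
        return e == null ? Collections.<String>emptyList() : e.getValue();
    }

    public static void main(String[] args) {
        StripeLookupSketch region = new StripeLookupSketch();
        region.addFile("", "f1");
        region.addFile("", "f2");
        region.addFile("m", "f3");
        region.addFile("t", "f4");
        region.addFile("t", "f5");
        // Only the stripe starting at "m" is consulted for this row.
        System.out.println(region.filesForGet("pear"));
    }
}
```

Here the region holds five files, but the get for "pear" checks one. The same
partitioning would let a compaction work on one stripe at a time instead of
rewriting the whole store.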
                
> Enable big content store in HBase
> ---------------------------------
>
>                 Key: HBASE-7949
>                 URL: https://issues.apache.org/jira/browse/HBASE-7949
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: chenning
>         Attachments: HBase_LOB.pdf
>
>
> Big content stored in HBase consumes a lot of system resources when a region 
> split or compaction happens.
> This issue discusses how HBase can be used to store big content along with 
> some self-descriptive metadata. 
> The general idea is to add a new type of column family whose content does 
> not participate in region splits and compactions. An index (rowkey-location) 
> is introduced in this new column family, and splits and compactions are 
> applied only to this index.
