[
https://issues.apache.org/jira/browse/HBASE-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596880#comment-13596880
]
Maryann Xue commented on HBASE-7949:
------------------------------------
@Enis Well, the constant reading and writing of the same set of large content
data happens in two ways: compaction and split.
1. During compaction, the data is read from small files and written to a
combined new large file.
2. During split, the data is read from the parent region's storefiles and
written into the two daughter regions' storefiles.
To avoid the I/O overhead caused by 1 (compaction), we can disable minor
compaction for this family, but this would lead to another big problem: bad
get/scan performance. For a get operation, we need to check the bloom filter
of every storefile to locate our record; for a scan operation, we need to
perform a seek in all of these storefiles. The decline in "Get" throughput as
the number of storefiles increases is shown in the slides.
To avoid the I/O overhead caused by 2 (split), we can pre-split the regions of
a table, but this cannot always be done for customer use cases.
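The read-side cost of disabling minor compaction, mentioned above, can be illustrated with a toy model. This is only a sketch, not HBase code: `build_storefiles` and `get` are hypothetical helpers, and a plain Python `set` stands in for each storefile's bloom filter (so there are no false positives). It shows that the number of filter checks per Get grows linearly with the number of storefiles.

```python
def build_storefiles(n, rows_per_file=3):
    """Create n toy storefiles, each with its own filter (a set of row keys)."""
    storefiles = []
    for i in range(n):
        data = {f"row-{i}-{j}": f"value-{i}-{j}" for j in range(rows_per_file)}
        # Assumption: a set of keys stands in for the per-file bloom filter.
        storefiles.append({"filter": set(data), "data": data})
    return storefiles


def get(storefiles, rowkey):
    """Look up rowkey; return (value, number_of_filter_checks)."""
    filter_checks = 0
    value = None
    for sf in storefiles:
        # Every storefile's filter is consulted, whether or not it holds the row.
        filter_checks += 1
        if rowkey in sf["filter"]:
            value = sf["data"][rowkey]
    return value, filter_checks


for n in (1, 10, 100):
    value, checks = get(build_storefiles(n), "row-0-0")
    print(n, value, checks)
```

Without minor compaction the storefile count keeps growing, so the per-Get work in this model grows with it.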
The idea is that large content data is very likely loaded once and not
frequently modified, so there is no need to move or merge the data all the
time, as happens in normal region compactions and splits, which are done in
order to maintain region independence and read efficiency.
So having storage that is independent of HBase regions would make sense for
such use cases. Meanwhile, we leverage the major compaction process to do
cleanup and merging at a reasonable frequency: a merge is performed only when
a certain file has exceeded the configured threshold.
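A minimal sketch of the threshold-triggered merge might look like the following. The comment does not say what the threshold measures; this example assumes it is the fraction of a content file's bytes that belong to deleted or overwritten cells. `ContentFile`, `dead_bytes`, and `select_files_to_rewrite` are hypothetical names for illustration, not HBase APIs.

```python
from dataclasses import dataclass


@dataclass
class ContentFile:
    """A hypothetical large-content file kept outside the region's storefiles."""
    name: str
    total_bytes: int
    dead_bytes: int  # assumed: bytes of deleted or overwritten cells

    @property
    def dead_ratio(self) -> float:
        return self.dead_bytes / self.total_bytes if self.total_bytes else 0.0


def select_files_to_rewrite(files, dead_ratio_threshold=0.5):
    """Return only the files whose dead-data ratio exceeds the threshold.

    Files at or below the threshold are left untouched, so their content is
    not re-read and re-written during major compaction.
    """
    return [f for f in files if f.dead_ratio > dead_ratio_threshold]


files = [
    ContentFile("lob-0001", total_bytes=1000, dead_bytes=100),
    ContentFile("lob-0002", total_bytes=1000, dead_bytes=700),
    ContentFile("lob-0003", total_bytes=1000, dead_bytes=500),
]
selected = select_files_to_rewrite(files)
print([f.name for f in selected])
```

With these made-up numbers only `lob-0002` is selected, so most of the content is never rewritten.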
> Enable big content store in HBase
> ---------------------------------
>
> Key: HBASE-7949
> URL: https://issues.apache.org/jira/browse/HBASE-7949
> Project: HBase
> Issue Type: Brainstorming
> Reporter: chenning
> Attachments: HBase_LOB.pdf
>
>
> Big content stored in HBase consumes a lot of system resources when a region
> split or compaction happens.
> The question is how HBase can be used to store big content along with some
> self-descriptive metadata.
> The general idea is to add a new type of column family whose content does
> not participate in region splits and compactions. An index
> (rowkey-location) is introduced in this new column family, and split and
> compaction are applied only to this index.