[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035526#comment-14035526 ]
Jingcheng Du commented on HBASE-11339:
--------------------------------------
In the current design, LOB files are saved by date (for example,
tableName/columnfamily/date/lobFileName), which makes it easy to delete LOB
files that have expired according to the TTL.
The commit date is used as the date in the path.
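To illustrate why a date-based layout makes TTL cleanup cheap, here is a minimal sketch that uses the local filesystem (`java.nio.file`) instead of HDFS. The class `LobDateLayout`, its methods, and the `yyyyMMdd` directory-name format are hypothetical and not taken from the HBase LOB code. It only shows the idea: a LOB file is committed by moving it from a temp directory into a directory named by the commit date, and expired data is removed by deleting whole date directories without opening any file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of a date-partitioned LOB file layout; not HBase code. */
final class LobDateLayout {
    // Assumed directory-name format for the date level, e.g. "20140618".
    static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    /** Moves a finished LOB file from the temp directory to familyDir/commitDate/fileName. */
    static Path commit(Path tempFile, Path familyDir, LocalDate commitDate) throws IOException {
        Path dateDir = Files.createDirectories(familyDir.resolve(commitDate.format(DAY)));
        return Files.move(tempFile, dateDir.resolve(tempFile.getFileName()),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Deletes whole date directories older than (today - ttlDays); day granularity only. */
    static List<String> deleteExpired(Path familyDir, LocalDate today, long ttlDays)
            throws IOException {
        LocalDate cutoff = today.minusDays(ttlDays);
        List<Path> expired = new ArrayList<>();
        try (DirectoryStream<Path> dateDirs = Files.newDirectoryStream(familyDir)) {
            for (Path dir : dateDirs) {
                // Assumes every subdirectory of familyDir is named by date.
                if (Files.isDirectory(dir)
                        && LocalDate.parse(dir.getFileName().toString(), DAY).isBefore(cutoff)) {
                    expired.add(dir);
                }
            }
        }
        List<String> deleted = new ArrayList<>();
        for (Path dir : expired) {
            deleteRecursively(dir);
            deleted.add(dir.getFileName().toString());
        }
        Collections.sort(deleted);
        return deleted;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Children sort after their parent, so reverse order deletes files first.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

In this sketch the cost of expiring data depends only on the number of date directories, not on the number of LOBs stored in them.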
1. If the commit date is used in the suggested way, we need to update the
reference KVs after the LOB files are committed (that is, after each file is
renamed from the temp directory to the date directory). If the MemStore flush
fails while the LOB file commit succeeds, the commit date is lost when the
WALEdits are replayed, so the LOB data and the reference KVs in HBase can no
longer be connected.
2. If we don't save LOB files by date, all the LOB files for a column family
are saved together. Then it's difficult to delete the expired LOB files (they
could be deleted by a sweep tool instead).
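Without a date level, a sweep tool has to identify expired LOB files by their references rather than by directory. The following is a minimal hypothetical sketch, assuming LOB files sit flat in one column-family directory and that the reference cells have already been collected by scanning the table. The `RefCell` class, `liveReferences`, and `sweep` are made-up names for illustration; this is not the HBase sweep tool.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/** Hypothetical sketch of a reference-based sweep over a flat LOB directory; not HBase code. */
final class LobSweepSketch {
    /** Hypothetical stand-in for a reference cell: the LOB file it points to and its timestamp. */
    static final class RefCell {
        final String lobFileName;
        final long timestampMillis;

        RefCell(String lobFileName, long timestampMillis) {
            this.lobFileName = lobFileName;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Collects the LOB file names still referenced by cells that have not expired by TTL. */
    static Set<String> liveReferences(List<RefCell> refs, long nowMillis, long ttlMillis) {
        Set<String> live = new HashSet<>();
        for (RefCell ref : refs) {
            if (nowMillis - ref.timestampMillis < ttlMillis) {
                live.add(ref.lobFileName);
            }
        }
        return live;
    }

    /** Deletes every regular file in familyDir whose name is not in liveFiles. */
    static List<String> sweep(Path familyDir, Set<String> liveFiles) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(familyDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        List<String> deleted = new ArrayList<>();
        for (Path p : files) {
            String name = p.getFileName().toString();
            if (!liveFiles.contains(name)) {
                Files.delete(p);
                deleted.add(name);
            }
        }
        Collections.sort(deleted);
        return deleted;
    }
}
```

Unlike the date-directory approach, this needs a full scan of the reference cells before anything can be deleted. A real sweeper would also have to tolerate concurrent writes, for example by skipping files newer than some grace period, because a freshly written LOB file may not yet have a visible reference.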
> HBase LOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save massive binary data such as images and documents
> into Apache HBase. Unfortunately, saving binary LOBs (large objects) directly
> to HBase leads to worse performance because of frequent splits and compactions.
> In this design, the LOB data are stored in a more efficient way, which
> keeps write/read performance high and guarantees data consistency in
> Apache HBase.
--
This message was sent by Atlassian JIRA
(v6.2#6252)