[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055601#comment-14055601
]
Jonathan Hsieh commented on HBASE-11339:
----------------------------------------
Jingcheng and some of his colleagues chatted with me last week. Here's a quick
summary and some follow up questions from the conversation.
The proposed design essentially adds a special table-wide column
family/directory to which all blobs are written.
* This avoids having to rewrite blob data on splits (the problem the per-cf
approach suffers from).
* Blobs are written to the WAL and the memstore. A flush writes out a reference
in the normal cf dir and one blob hfile per region into the shared blob
dir. The normal cf write contains a pointer to the blob hfile/offset, while
the blob write contains the blob data. This is the simplest way to preserve
atomicity, avoiding the read/write race conditions that could be present if
blobs were read directly from a "blob log".
* There is a special sweep tool that uses ZooKeeper and is used to garbage
collect deleted or overwritten blobs once a garbage threshold is exceeded.
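To make the flush path above concrete, here is a minimal sketch of what the reference cell written to the normal cf might carry: the blob hfile name plus the offset of the value within that file. The class name and byte layout are purely illustrative assumptions, not the actual HBase MOB format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reference-cell payload: pointer to the blob hfile/offset.
// Layout (illustrative only): 8-byte offset followed by the UTF-8 file name.
public final class BlobReference {
    private final String blobFileName;
    private final long offset;

    public BlobReference(String blobFileName, long offset) {
        this.blobFileName = blobFileName;
        this.offset = offset;
    }

    /** Serializes the reference for storage as a cell value in the normal cf. */
    public byte[] toBytes() {
        byte[] name = blobFileName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + name.length);
        buf.putLong(offset);
        buf.put(name);
        return buf.array();
    }

    /** Decodes a reference read back from the normal cf on the scan path. */
    public static BlobReference fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long offset = buf.getLong();
        byte[] name = new byte[buf.remaining()];
        buf.get(name);
        return new BlobReference(new String(name, StandardCharsets.UTF_8), offset);
    }

    public String getBlobFileName() { return blobFileName; }
    public long getOffset() { return offset; }
}
```

On read, the scanner would resolve such a reference into the shared blob dir; the actual encoding chosen by the design may well differ.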
Follow up questions and tasks from after reviewing the design:
1) Please write user level documentation on how an operator or application
developer would enable and use blobs. This would be folded into the ref guide
and is more useful for most folks than the current approach of focusing on the
individual mechanisms. For example, does one specify that a cf is a blob? a
particular column? a particular cell? A helpful approach would be to write up
the life cycle of a single blob.
2) Instead of using "special" column/column family names to denote a
reference, use the new 0.98 tags feature to mark whether a cell is a reference
to a value in the blob dir.
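A rough sketch of the tag-based marking suggested here, using a simplified stand-in for the 0.98 cell-tag mechanism. The `Tag` class and the `TAG_TYPE_BLOB_REFERENCE` byte below are assumptions for illustration; the real `org.apache.hadoop.hbase.Tag` API and any reserved tag-type value would come from the implementation.

```java
// Illustrative only: mark a cell as a blob reference with a tag, instead of
// encoding that fact in a "special" column or column family name.
public final class ReferenceTagging {
    // Hypothetical tag-type byte; a real implementation would reserve one.
    public static final byte TAG_TYPE_BLOB_REFERENCE = (byte) 7;

    // Simplified stand-in for the HBase 0.98 Tag class.
    public static final class Tag {
        final byte type;
        final byte[] value;
        public Tag(byte type, byte[] value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Returns true if any tag marks this cell as a reference into the blob dir. */
    public static boolean isBlobReference(Tag[] tags) {
        if (tags == null) return false;
        for (Tag t : tags) {
            if (t.type == TAG_TYPE_BLOB_REFERENCE) return true;
        }
        return false;
    }
}
```

The advantage over special names is that the scanner can test a single tag byte rather than pattern-matching qualifiers, and ordinary cells need no naming restrictions.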
3) Better explain the life cycle of a blob that has a user-specified historical
timestamp. Where is it written (into the date dir of the timestamp, or of
the actual write)? How is it deleted? How does the sweep tool interact with
this?
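To illustrate the ambiguity in question 3: if blob files are laid out under date directories, a cell carrying a historical timestamp could land in that date's directory rather than the directory for the actual write time. The path layout and root dir below are illustrative assumptions only.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative mapping from a cell timestamp to a date-partitioned blob dir.
// "/hbase/mobdir" and the yyyyMMdd layout are assumptions, not the design's.
public final class BlobDatePartition {
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /** Returns e.g. "<blobRoot>/20140701" for the given cell timestamp (ms). */
    public static String dirForTimestamp(String blobRoot, long cellTimestampMs) {
        return blobRoot + "/" + DAY.format(Instant.ofEpochMilli(cellTimestampMs));
    }
}
```

If the cell timestamp (rather than the write time) picks the directory, a historical write lands in an "old" directory, which is exactly where the interaction with deletes and the sweep tool needs spelling out.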
4) Better explain what if any caching happens when we read values from blob
hfiles.
5) Provide Integration tests that others can use to verify the correctness and
robustness of the implementation.
A new question that came up when thinking about the design:
1) How do snapshots work in relation to the current design? Are the HFiles
in the blob dir archived? Are the needed files tracked when a snapshot is
taken? If this is not handled, is there a plan for how to handle it?
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save medium binary data like images and documents
> into Apache HBase. Unfortunately, directly saving binary MOBs (medium
> objects) to HBase leads to poor performance due to frequent splits and
> compactions.
> In this design, the MOB data are stored in a more efficient way, which
> keeps high write/read performance and guarantees data consistency in
> Apache HBase.
--
This message was sent by Atlassian JIRA
(v6.2#6252)