[ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055601#comment-14055601
 ] 

Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

Jingcheng and some of his colleagues chatted with me last week. Here's a quick 
summary and some follow up questions from the conversation.

The proposed design essentially adds a special table-wide column 
family/directory where all blobs are written.  
* This avoids having to rewrite lob data on splits (the problem the cf approach 
suffers from).  
* Blobs are written to the WAL and the memstore.  Flushes write out a reference 
in the normal cf dir and the one blob hfile per region into the shared blob 
dir.   The normal cf write contains a pointer to the blob hfile/offset, 
while the blob write contains the blob data.  This is the simplest way to 
preserve atomicity, avoiding the read/write race conditions that could be 
present if blobs were read directly from a "blob log".
* There is a special sweep tool that uses zk and is used to garbage collect 
deleted or overwritten blobs based upon a garbage threshold.  
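As a rough illustration of the reference layout described above, the pointer the flush writes into the normal cf could be a small serialized tuple of (blob hfile name, offset). This is only a sketch under assumptions; the class and the encoding below are hypothetical, not what the design specifies:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the reference value a flush could write into the
// normal cf: a pointer to the blob hfile in the shared blob dir plus the
// offset of the blob within it. The actual design may pick a different
// encoding; this only shows the shape of the idea.
public class BlobReference {
    final String blobFileName; // hfile in the shared blob dir
    final long offset;         // position of the blob within that hfile

    BlobReference(String blobFileName, long offset) {
        this.blobFileName = blobFileName;
        this.offset = offset;
    }

    // Serialize as an 8-byte offset followed by the UTF-8 file name.
    byte[] toBytes() {
        byte[] name = blobFileName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(8 + name.length);
        buf.putLong(offset);
        buf.put(name);
        return buf.array();
    }

    // Decode a reference cell value back into the (file, offset) pair.
    static BlobReference fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long off = buf.getLong();
        byte[] name = new byte[buf.remaining()];
        buf.get(name);
        return new BlobReference(new String(name, StandardCharsets.UTF_8), off);
    }
}
```

Since both the reference cell and the blob cell go through the WAL and memstore together, a reader that resolves such a pointer never observes a reference without its blob.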

Follow up questions and tasks from after reviewing the design:
1) Please write user-level documentation on how an operator or application 
developer would enable and use blobs.  This would be folded into the ref guide 
and is more useful for most folks than the current approach of focusing on the 
individual mechanisms.  For example, does one specify that a cf is a blob?  A 
particular column?  A particular cell?  A helpful approach would be to write up 
the life cycle of a single blob.
2) Instead of using "special" column/ column family names to denote a 
reference, use the new 0.98 tags feature to tag if a cell is a reference to a 
value in the blob dir.
3) Better explain the life cycle of a blob that has a user-specified historical 
timestamp.  Where is this written (into the date dir of the timestamp or of 
the actual write)?  How is this deleted?  How does the sweep tool interact with 
this?
4) Better explain what if any caching happens when we read values from blob 
hfiles.
5) Provide Integration tests that others can use to verify the correctness and 
robustness of the implementation.
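To make suggestion (2) concrete: instead of reserving special column/column family names, a reader would consult a cell's tags to decide whether the value is blob data or a pointer into the blob dir. The sketch below is self-contained, so it models tags with a tiny stand-in class rather than the real 0.98 Tag API, and the tag type byte is hypothetical:

```java
import java.util.List;

// Hedged sketch of suggestion (2): mark reference cells with a cell tag
// (the 0.98 tags feature) rather than a "special" column name. The type
// byte and helper below are hypothetical; the real feature lives in
// org.apache.hadoop.hbase.Tag, omitted here to keep the sketch runnable.
public class MobTagSketch {
    static final byte MOB_REFERENCE_TAG_TYPE = (byte) 5; // hypothetical value

    // Minimal stand-in for a cell tag: one type byte plus a payload.
    static final class Tag {
        final byte type;
        final byte[] value;
        Tag(byte type, byte[] value) { this.type = type; this.value = value; }
    }

    // A reader checks the tags, not the column name, to decide whether the
    // cell value is blob data or a reference into the blob dir.
    static boolean isMobReference(List<Tag> cellTags) {
        for (Tag t : cellTags) {
            if (t.type == MOB_REFERENCE_TAG_TYPE) {
                return true;
            }
        }
        return false;
    }
}
```

One nice property of tagging over name conventions: tags travel with the cell through flushes and compactions, so no column name is ever off-limits to users.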

A new question that came up when thinking about the design:
1) How do snapshots work in relation to the current design?  Are the HFiles 
in the blob dir archived?  Are the needed files tracked when a snapshot is 
taken?  If this is not handled, is there a plan for how to handle it?  

> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save medium-sized binary data like images and 
> documents into Apache HBase. Unfortunately, directly saving binary MOBs 
> (medium objects) to HBase leads to worse performance because of frequent 
> splits and compactions.
>   In this design, the MOB data are stored in a more efficient way, which 
> keeps high write/read performance and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
