[ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035448#comment-14035448
 ] 

Jingcheng Du commented on HBASE-11339:
--------------------------------------

bq. I'm not convinced. The idea I'm suggesting is having a special lob log file 
that is written once at write time that is essentially the lob store files in 
the doc, and put a reference to it (file name, and offset) in the normal wal. 
This allows the lob to only be written once. I don't see how this would be less 
efficient than an approach that must write the values out at least twice.
  With this approach, we save the LOB files as SequenceFiles and write the file 
name and offset back into the Put before the KV goes into the MemStore, right?
 1. If so, we don't use the MemStore to hold the LOB data, right? Then how do we 
read LOB data that has not been synced yet (data that is still in the writer 
buffer)?
 2. We need to add preSync and preAppend hooks to the HLog so that we can sync 
the LOB files before the HLogs are synced.
 3. In order to get the correct offset, we would have to synchronize the prePut 
in the coprocessor. Or could we use a different LOB file for each thread?
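
  To make the three questions above concrete, here is a rough sketch of the 
write-once idea as I understand it. It is plain Java on a local file, not HBase 
or HDFS code, and the names LobLogSketch and LobRef are made up for 
illustration. It assumes a single LOB log shared by all writer threads, so 
append() is synchronized to keep the returned offset correct (question 3), and 
sync() stands in for the step that would have to run before the HLog entry 
carrying the reference is synced (question 2). A local RandomAccessFile has no 
client-side writer buffer, so this sketch does not model the unsynced-read 
problem from question 1.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

/** Hypothetical write-once LOB log; a sketch, not an HBase API. */
final class LobLogSketch {

    /** What would be stored in the normal WAL and MemStore instead of the value. */
    static final class LobRef {
        final String fileName;
        final long offset;
        final int length;

        LobRef(String fileName, long offset, int length) {
            this.fileName = fileName;
            this.offset = offset;
            this.length = length;
        }
    }

    private final String fileName;
    private final RandomAccessFile file;

    LobLogSketch(Path path) throws IOException {
        this.fileName = path.getFileName().toString();
        this.file = new RandomAccessFile(path.toFile(), "rw");
    }

    /**
     * Appends the value once and returns a reference to it.
     * Synchronized so that reading the offset and writing the bytes
     * happen as one step when several threads share the file.
     */
    synchronized LobRef append(byte[] value) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(value);
        return new LobRef(fileName, offset, value.length);
    }

    /** Forces the LOB data to storage; meant to run before the WAL sync. */
    synchronized void sync() throws IOException {
        file.getFD().sync();
    }

    /** Resolves a reference back to the stored bytes. */
    synchronized byte[] read(LobRef ref) throws IOException {
        byte[] out = new byte[ref.length];
        file.seek(ref.offset);
        file.readFully(out);
        return out;
    }

    synchronized void close() throws IOException {
        file.close();
    }
}
```

  With one file per thread instead, append() would not need the lock, but the 
number of LOB files would grow with the number of handler threads.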

bq. I agree about the hdfs small files problem but I think we need to properly 
define what a LOB is and the scope of this effort. (hence my suggestion of 
Medium Objects – MOBS).
Agreed.

bq. I'm under the impression we are solving the latter case here. Is that 
correct?
That's right.

> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data such as images and documents 
> in Apache HBase. Unfortunately, saving binary LOBs (large objects) directly 
> in HBase degrades performance because of the frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which 
> keeps high write/read performance and guarantees data consistency in 
> Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
