[
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037015#comment-14037015
]
Jingcheng Du commented on HBASE-11339:
--------------------------------------
[~jmhsieh] and [[email protected]], thanks for the comments.
I have thought about the suggestion carefully and have some ideas. I'd like to
share them with all of you, so please kindly provide comments. As suggested,
I'll refer to the Lob as Mob from now on.
We don't use the MemStore to save the mob data; we write it directly to the
mob file, and only once.
In the prePut of the coprocessor, the KV is split into two KVs: one (KV0)
holds the offset+path, and the other (KV1) is the mob KV. KV0 is written to the
HLog and MemStore, and KV1 is written to the mob file.
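The split described above could look roughly like the following sketch. It
uses plain Java stand-ins rather than HBase classes: MobSplitSketch, MobRef
and MobFile are hypothetical names, the mob file is an in-memory buffer, and
storing the value length in KV0 is an assumption added so the value can be
read back.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch: none of these types are HBase classes.
final class MobSplitSketch {

    /** KV0: the small reference that would go to the HLog and MemStore. */
    static final class MobRef {
        final byte[] row;
        final String path;   // mob file that holds the value
        final long offset;   // where the value starts in that file
        final int length;    // assumption: kept so the value can be read back

        MobRef(byte[] row, String path, long offset, int length) {
            this.row = row;
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** In-memory stand-in for the mob file of one store. */
    static final class MobFile {
        final String path;
        private final ByteArrayOutputStream data = new ByteArrayOutputStream();

        MobFile(String path) {
            this.path = path;
        }

        /** Appends the value (KV1) and returns the offset it was written at. */
        long append(byte[] value) {
            long offset = data.size();
            data.write(value, 0, value.length);
            return offset;
        }

        byte[] read(long offset, int length) {
            byte[] all = data.toByteArray();
            return Arrays.copyOfRange(all, (int) offset, (int) offset + length);
        }
    }

    /** Writes the value to the mob file and returns the reference to store. */
    static MobRef split(byte[] row, byte[] value, MobFile file) {
        long offset = file.append(value);
        return new MobRef(row, file.path, offset, value.length);
    }
}
```

Only the small MobRef would travel through the HLog and MemStore, so the large
value is written exactly once.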
Before the mob data are synced to disk, they are held in the buffer of the mob
writer, and they are not seekable until the buffer is full or is synced to
disk.
To avoid this, we have to sync the mob data to disk for each put (is it OK to
sync the mob data on every put? The mob data are usually pictures, around
1-5MB in size).
By design, each store has a single mob file open for writing. We have to
synchronize the operation that advances the KV offset within that file, so the
prePut method needs a synchronized block containing two operations: syncing
the mob data to disk and increasing the offset. Consequently, all the puts are
serialized here, which is not efficient.
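A minimal sketch of that synchronized block, assuming an in-memory buffer in
place of the real mob file and a plain write in place of the sync to disk.
SynchronizedMobWriterSketch and runPuts are hypothetical names used only for
illustration.

```java
import java.io.ByteArrayOutputStream;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one writer per store, shared by all handler threads.
final class SynchronizedMobWriterSketch {
    private final Object lock = new Object();
    private final ByteArrayOutputStream file = new ByteArrayOutputStream();
    private long nextOffset = 0;

    /** Writes the value and advances the offset as one atomic step. */
    long append(byte[] value) {
        synchronized (lock) {
            long offset = nextOffset;
            file.write(value, 0, value.length); // stand-in for write + sync to disk
            nextOffset += value.length;
            return offset;
        }
    }

    long size() {
        synchronized (lock) {
            return nextOffset;
        }
    }

    /** Issues puts from several threads and returns the distinct offsets handed out. */
    static Set<Long> runPuts(SynchronizedMobWriterSketch writer,
                             int threads, int putsPerThread, int valueSize) {
        Set<Long> offsets = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < putsPerThread; i++) {
                    offsets.add(writer.append(new byte[valueSize]));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for puts", e);
            }
        }
        return offsets;
    }
}
```

Every put receives a unique offset, but every handler thread waits on the same
lock, which is the bottleneck described above.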
We could improve this by using a different mob file for each thread. Then we
would not need synchronization, but we would have too many open files in the
region server (handler*regionNum), which is a problem.
We also have a solution for this: we could define a SynchronousQueue with
limited size so that each region has a limited number of open files. All of
this occurs in prePut, and the prePut method would still have a synchronized
block in each thread. It's an improvement, but still not efficient IMO.
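One way to sketch the bounded set of writers is a pool backed by a blocking
queue. Note that java.util.concurrent.SynchronousQueue holds no elements, so
this sketch swaps in an ArrayBlockingQueue to get the "limited size"
behaviour. MobWriterPoolSketch and Writer are hypothetical names, and the
writer only tracks an offset instead of writing a real file.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: at most maxOpenFiles mob writers per region.
final class MobWriterPoolSketch {

    /** Stand-in for an open mob file; used by one thread at a time. */
    static final class Writer {
        final String path;
        private long offset = 0;

        Writer(String path) {
            this.path = path;
        }

        long append(int valueLength) {
            long start = offset;
            offset += valueLength;
            return start;
        }
    }

    private final BlockingQueue<Writer> idle;

    MobWriterPoolSketch(int maxOpenFiles) {
        idle = new ArrayBlockingQueue<>(maxOpenFiles);
        for (int i = 0; i < maxOpenFiles; i++) {
            idle.add(new Writer("mob-writer-" + i));
        }
    }

    /** Borrows a writer, appends, and returns "path@offset" for the reference KV. */
    String put(int valueLength) {
        Writer writer;
        try {
            writer = idle.take(); // blocks while every writer is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a mob writer", e);
        }
        try {
            return writer.path + "@" + writer.append(valueLength);
        } finally {
            idle.add(writer); // hand the writer back for the next put
        }
    }

    int idleWriters() {
        return idle.size();
    }
}
```

A borrowed writer belongs to a single thread, so the append itself needs no
lock; the only contention left is on the queue.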
Before the MemStore flushes (we do this in the preFlush of the coprocessor),
we roll the mob writers and reset the KV offset to 0 for the new writers. This
will block the prePut.
Customers usually require that expired mob files be cleaned by TTL, so this is
very important; it is also a more efficient way to clean the mob files than
the sweep tool (mob files are hardly ever updated, but have a fixed lifetime).
We need a way to rename the mob files before the MemStore flushes in the store
flusher, and to save these mob files by date.
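Saving the mob files by date makes TTL cleaning a matter of comparing dates.
The sketch below assumes one directory per day named in ISO yyyy-MM-dd form;
that layout, and the names MobTtlCleanerSketch and expiredDirs, are
assumptions for illustration and not part of the proposal.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: picks the date directories whose mob files have expired.
final class MobTtlCleanerSketch {

    /**
     * Returns the directories dated strictly before (today - ttlDays).
     * A directory dated exactly on the cutoff is kept, because data written
     * late on that day may still be within its TTL.
     */
    static List<String> expiredDirs(List<String> dateDirs, LocalDate today, int ttlDays) {
        LocalDate cutoff = today.minusDays(ttlDays);
        List<String> expired = new ArrayList<>();
        for (String dir : dateDirs) {
            if (LocalDate.parse(dir).isBefore(cutoff)) {
                expired.add(dir);
            }
        }
        return expired;
    }
}
```

Whole directories can then be deleted without reading any mob file contents.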
The following situation can happen: the MemStore flush fails while the mob
file renaming succeeds. When the WALEdits are replayed, the connection between
the edits and the mob files is lost. To avoid this, we need to add a
rename-transaction znode to zk: each renaming transaction has a znode that
contains several child znodes (they are the mapping from nameBeforeRename to
nameAfterRename). The txn znode will be deleted after every successful
MemStore flush, and all the txns for each store are exclusive to each other.
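The bookkeeping for one rename transaction can be sketched as follows. An
in-memory map stands in for the txn znode and its children, so no ZooKeeper
calls are shown; RenameTxnSketch and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the map plays the role of the txn znode's children.
final class RenameTxnSketch {
    // nameBeforeRename -> nameAfterRename
    private final Map<String, String> pending = new LinkedHashMap<>();

    /** Recorded before the mob file is actually renamed. */
    void recordRename(String nameBeforeRename, String nameAfterRename) {
        pending.put(nameBeforeRename, nameAfterRename);
    }

    /** During WALEdit replay, maps the path stored in an edit to the current file name. */
    String resolve(String pathInEdit) {
        return pending.getOrDefault(pathInEdit, pathInEdit);
    }

    /** After a successful MemStore flush the transaction is no longer needed. */
    void flushSucceeded() {
        pending.clear();
    }

    boolean hasPending() {
        return !pending.isEmpty();
    }
}
```

If the flush fails after the rename, the recorded mapping still lets a replayed
edit find the renamed mob file.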
How about this?
> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: New Feature
> Components: regionserver, Scanners
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Attachments: HBase LOB Design.pdf
>
>
> It's quite useful to save massive binary data like images and documents
> in Apache HBase. Unfortunately, directly saving the binary LOB (large object)
> to HBase leads to worse performance because of frequent splits and
> compactions. In this design, the LOB data are stored in a more efficient way,
> which keeps high write/read performance and guarantees data consistency in
> Apache HBase.
--
This message was sent by Atlassian JIRA
(v6.2#6252)