[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037864#comment-14037864 ]
Jonathan Hsieh commented on HBASE-11339:
----------------------------------------

Thanks for following up with good questions! You haven't called it out directly, but your questions are leading towards trouble spots in a loblog design. One has to do with atomicity and the other with reading recent values. I think the latter effectively disqualifies the loblog idea. Here's a writeup.

bq. In this way, we save the Lob files as SequenceFiles, and save the offset and file name back into the Put before putting the KV into the MemStore, right?

Essentially yes. They aren't necessarily sequence files -- they would be synced to complete writing the lob, just like the current hlog files do with edits.

bq. 1. If so, we don't use the MemStore to save the Lob data, right? Then how to read the Lob data that are not sync yet(which are still in the writer buffer)?

If the loblog write and the locator write into the hlog both succeed, we'd use the same design/mechanism you currently have to read lobs that aren't present in the memstore because they were flushed. The difference is that the loblogs are still being written. In HDFS you can read files that are currently being written; however, you aren't guaranteed to read up to the most recent end of the file, since we have no built-in tail in hdfs yet.

Hm.. so we have a problem getting the latest data. For the lob log design to be correct, it would need work on hdfs to provide guarantees or a tail operation. While not out of the question, that would be a ways out from now, and it disqualifies the lob log for the short term.

bq. 2. We need add a preSync and preAppend to the HLog so that we could sync the Lob files before the HLogs are sync.

Can you explain why you need presync and preappend? I think this is getting at a problem where we are essentially trying to sync writes to two logs atomically. Could we just not issue the locator put until the lob has been synced? (A lob that is just lying around won't hurt anything, but a bad locator would.)
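The ordering suggested here (sync the lob first, publish the locator only afterwards) can be sketched roughly as follows. This is a toy in-memory model, not HBase code: `LobLog`, `Locator`, `appendAndSync`, and the map standing in for the hlog/memstore are all hypothetical names, and a plain lock stands in for whatever queuing a real implementation would use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LobThenLocator {
    /** Hypothetical pointer stored via the normal write path instead of the lob bytes. */
    static final class Locator {
        final String file;
        final long offset;
        final int length;

        Locator(String file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Toy stand-in for a loblog: an append-only buffer with a "synced" high-water mark. */
    static final class LobLog {
        private final String name;
        private final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        private long syncedLength = 0;

        LobLog(String name) {
            this.name = name;
        }

        /** Append the lob and "sync" before returning; the lock serializes writer threads. */
        synchronized Locator appendAndSync(byte[] lob) {
            long offset = buf.size();
            buf.write(lob, 0, lob.length);
            syncedLength = buf.size(); // assumption: stands in for a real durable sync
            return new Locator(name, offset, lob.length);
        }

        /** Readers may only see bytes below the synced mark. */
        synchronized byte[] read(Locator loc) {
            if (loc.offset + loc.length > syncedLength) {
                throw new IllegalStateException("locator points past synced data");
            }
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, (int) loc.offset, (int) (loc.offset + loc.length));
        }
    }

    /** Stand-in for the hlog + memstore that hold locators. */
    static final Map<String, Locator> locators = new ConcurrentHashMap<>();

    static void put(LobLog log, String row, byte[] lob) {
        Locator loc = log.appendAndSync(lob); // step 1: the lob is synced
        locators.put(row, loc);               // step 2: only now publish the locator
    }

    public static void main(String[] args) {
        LobLog log = new LobLog("loblog-0");
        put(log, "row1", "hello".getBytes(StandardCharsets.UTF_8));
        put(log, "row2", "world!".getBytes(StandardCharsets.UTF_8));
        byte[] value = log.read(locators.get("row2"));
        System.out.println(new String(value, StandardCharsets.UTF_8)); // prints: world!
    }
}
```

If the process dies between step 1 and step 2, the result is an orphaned lob with no locator, which is harmless; a locator is never visible before the bytes it points at.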
Both the lob and the locator would have the same ts/mvcc/seqno.

In the PDF's design, this shouldn't be a problem because it would use the normal write path for atomicity guarantees. Currently hbase guarantees atomicity across CFs at flush time, and by having all cf:c's added to the hlog and memstore atomically.

bq. In order to get the correct offset, we have to synchronize the prePut in the coprocessor, or we could use different Lob files for each thread?

Why not just write+sync the lob and then write the locator put? For lobs we'd use the same mechanism to sync (one loblog for all threads, queued using the disruptor work).

> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save the medium binary data like images, documents
> into Apache HBase. Unfortunately directly saving the binary MOB(medium
> object) to HBase leads to a worse performance since the frequent split and
> compaction.
>   In this design, the MOB data are stored in a more efficient way, which
> keeps a high write/read performance and guarantees the data consistency in
> Apache HBase.

--
This message was sent by Atlassian JIRA
(v6.2#6252)