[ https://issues.apache.org/jira/browse/HBASE-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15743390#comment-15743390 ]
Enis Soztutar commented on HBASE-16820:
---------------------------------------
Hey Nick, sorry, the problem has already been fixed via an addendum to
HBASE-16721 for 1.1. See my comment on
https://issues.apache.org/jira/browse/HBASE-16721?focusedCommentId=15570346&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15570346.
This issue is for a longer-term fix and should not be a blocker for the release.
> BulkLoad mvcc visibility only works accidentally
> -------------------------------------------------
>
> Key: HBASE-16820
> URL: https://issues.apache.org/jira/browse/HBASE-16820
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.1.8
> Reporter: Enis Soztutar
> Assignee: Enis Soztutar
> Priority: Blocker
> Fix For: 1.1.8
>
> Attachments: HBASE-16820-branch-1.1-v0.patch
>
>
> [~sergey.soldatov] has been debugging an issue with a 1.1 code base where the
> commit for HBASE-16721 broke bulk load visibility. After a bulk load, the
> bulk-loaded files are not visible because mvcc is not advanced to the
> sequence id assigned to the bulk load.
> Debugging further, we noticed that the bulk load behavior is wrong, but it
> works "accidentally" in all code bases (except 1.1 after HBASE-16721, where
> it is broken). Let me explain:
> - A BL request can optionally ask for a flush beforehand (this should be the
> default), which causes the flush to happen with some sequenceId. The flush
> sequence id is one past the sequence ids of all the cells. This flush
> sequence id is returned as the result of the flush operation.
> - BL then uses this particular sequenceId to mark the files, but does not
> itself obtain a new sequence id or advance the mvcc number.
> - BL completes WITHOUT making sure that the sequence id is visible.
> - BL itself does, however, write entries to the WAL for the BL event. In 1.2
> code bases these go through the full mvcc + seqId path, which ensures that
> earlier sequence ids (including the flush sequence id) are visible via mvcc.
> The problem with 1.1 is that WAL entries only get sequence ids and do not
> touch mvcc. With the patch for HBASE-16721, the flushedSequenceId is no longer
> used in mvcc as the highest read point (although all the data is still
> visible).
> BL relying on the flush sequence id is wrong for two reasons:
> - BL files are loaded with the flush sequence id from the memstore, so this
> single sequence id is used for two different things: it ends up being the
> sequence id of the flushed file as well as of the BL'ed files.
> - BL should obtain a new sequence id of its own and make sure that sequence
> id is visible before returning the results.
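The proposed direction can be sketched in the same toy style. Again this is a
hypothetical single-threaded model, not the attached patch or HBase's API:
`mvcc_begin` and `mvcc_complete_and_wait` are illustrative names only, and the
read point moves straight to a completed id because nothing else is in
flight. After the optional flush, the bulk load takes a fresh id, marks its
files with it, and waits for that id to become visible before returning, so
the outcome no longer depends on which WAL path is used.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRegion:
    """Toy model of the proposed bulk load fix (hypothetical, not HBase's API)."""
    wal_uses_mvcc: bool            # True: 1.2-style WAL path; False: 1.1-style
    next_seq: int = 1
    read_point: int = 0            # readers see data with seq id <= read_point
    files: Dict[str, int] = field(default_factory=dict)

    def mvcc_begin(self) -> int:
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def mvcc_complete_and_wait(self, seq: int) -> None:
        # Single writer: completing `seq` moves the read point up to it
        # (never backwards).
        self.read_point = max(self.read_point, seq)

    def put(self) -> int:
        seq = self.mvcc_begin()
        self.mvcc_complete_and_wait(seq)
        return seq

    def flush(self) -> int:
        seq = self.mvcc_begin()
        self.files["flushed"] = seq
        return seq

    def append_wal_marker(self) -> None:
        seq = self.mvcc_begin()
        if self.wal_uses_mvcc:              # 1.1-style markers bypass mvcc
            self.mvcc_complete_and_wait(seq)

    def bulk_load_fixed(self, name: str) -> int:
        self.flush()                        # optional pre-flush keeps its own id
        seq = self.mvcc_begin()             # a new id just for the bulk load
        self.files[name] = seq
        self.append_wal_marker()
        self.mvcc_complete_and_wait(seq)    # visible before returning
        return seq

    def is_visible(self, name: str) -> bool:
        return self.files[name] <= self.read_point


def run(wal_uses_mvcc: bool) -> ToyRegion:
    region = ToyRegion(wal_uses_mvcc=wal_uses_mvcc)
    for _ in range(5):                      # five ordinary writes: ids 1..5
        region.put()
    region.bulk_load_fixed("bulk")
    return region


if __name__ == "__main__":
    for label, flag in (("1.2-style", True), ("1.1-style", False)):
        r = run(flag)
        print(label, r.files, "read_point =", r.read_point,
              "bulk visible:", r.is_visible("bulk"))
```

In this model both paths report the bulk-loaded file as visible, and it
carries id 7 while the flushed file keeps id 6, so the two no longer share a
sequence id.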
> [~ndimiduk] FYI.
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)