[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations

stack (JIRA) Mon, 06 Apr 2015 17:29:32 -0700

    [ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482283#comment-14482283 ]


stack commented on HBASE-13389:
-------------------------------

bq. So with all this I don't see any reason to keep these for more than a few
hours.

It's not log rolling, as per Enis. It is when the memstore is flushed. By
default, memstores are flushed at least once an hour:

 public static final int DEFAULT_CACHE_FLUSH_INTERVAL = 3600000;

So if an old edit comes in during distributed log replay, an edit that has
already been flushed to an hfile, we need to be able to put it in the
appropriate slot (as you say). This can happen if we replay more edits than
needed because the Master does not have the last flush sequenceid for a
region. If HFiles have all their seqids, it is easy. But if mvcc has been
purged from hfiles (optimization) and we get an edit that falls into the hfile
time range, we are going to be confused. Somehow the optimization purging mvcc
should not run until we are sure old WALs with seqids older than those in
hfiles for all regions have been let go.
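
To make the condition concrete, here is a rough sketch of the kind of guard I
mean. It is illustration only: MvccPurgeGuard, safeToPurgeMvcc and the two maps
are made-up names, not existing HBase classes or APIs, and it assumes we can
somehow learn, per region, the oldest seqid still held in an unreleased WAL
and the largest seqid already in hfiles.

```java
import java.util.Map;

// Hypothetical sketch; none of these names exist in HBase.
final class MvccPurgeGuard {
    private MvccPurgeGuard() {}

    /**
     * @param oldestLiveWalSeqIdByRegion per region, the smallest sequence id
     *        still present in a WAL that has not been let go (assumed input)
     * @param maxHFileSeqIdByRegion per region, the largest sequence id already
     *        flushed to hfiles (assumed input)
     * @return true only if no live WAL can replay an edit that falls inside
     *         the seqid range already covered by hfiles, for every region
     */
    static boolean safeToPurgeMvcc(Map<String, Long> oldestLiveWalSeqIdByRegion,
                                   Map<String, Long> maxHFileSeqIdByRegion) {
        for (Map.Entry<String, Long> e : maxHFileSeqIdByRegion.entrySet()) {
            Long oldestWal = oldestLiveWalSeqIdByRegion.get(e.getKey());
            if (oldestWal == null) {
                // No live WAL edits for this region, so nothing old can be replayed.
                continue;
            }
            if (oldestWal <= e.getValue()) {
                // A live WAL still holds an edit at or below what the hfiles
                // contain; a replay could hand us an edit we can no longer place.
                return false;
            }
        }
        return true;
    }
}
```

The check is per region and fails closed: a single region with an old WAL
still around blocks the purge.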

For replication, yeah, needs a few days.  The root of the lag may take a few 
days to fix.

On the put -> delete -> put, you are not against changing the sort order so
that seqid prevails over type, are you, [~lhofhansl]? Would be a good change
for 2.0.
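
A toy sketch of the difference, for the case of three edits at the same
timestamp. This is not the real cell comparator: SeqIdSortSketch, Edit and the
"first edit in sort order wins" read model are simplifications made up for
illustration, assuming deletes currently sort ahead of puts at the same
timestamp and seqid only breaks ties.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model; not HBase code.
final class SeqIdSortSketch {
    private SeqIdSortSketch() {}

    // Stand-in for one version of a cell (row/family/qualifier assumed equal).
    static final class Edit {
        final long timestamp;
        final boolean isDelete;
        final long seqId;

        Edit(long timestamp, boolean isDelete, long seqId) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.seqId = seqId;
        }
    }

    // Assumed current order: newest timestamp, then deletes before puts, then highest seqid.
    static int typeThenSeqId(Edit a, Edit b) {
        int c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        c = Boolean.compare(b.isDelete, a.isDelete);
        if (c != 0) return c;
        return Long.compare(b.seqId, a.seqId);
    }

    // Proposed order: newest timestamp, then highest seqid, then type as the last tie-breaker.
    static int seqIdThenType(Edit a, Edit b) {
        int c = Long.compare(b.timestamp, a.timestamp);
        if (c != 0) return c;
        c = Long.compare(b.seqId, a.seqId);
        if (c != 0) return c;
        return Boolean.compare(b.isDelete, a.isDelete);
    }

    // Toy read model: the first edit in sort order decides whether a value is visible.
    static boolean latestPutVisible(List<Edit> edits, Comparator<Edit> order) {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(order);
        return !sorted.isEmpty() && !sorted.get(0).isDelete;
    }
}
```

With put (seqid 1), delete (seqid 2), put (seqid 3) all at one timestamp, the
type-first order puts the delete in front and the last put is masked; the
seqid-first order puts the last put in front.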

> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>         Attachments: 13389.txt
>
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the
> mvcc/sequenceid slot in a key. Now Cells near-always have an associated
> mvcc/sequenceid, where previously it was rare or the mvcc was kept up at the
> file level. This is sort of how it should be, many of us would argue, but as
> a side-effect of this change, read-time optimizations that helped speed
> scans were undone.
> In this issue, let's see if we can get the optimizations back -- or just
> remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself,
> Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against 
> the actual smallestReadpoint, and hence we're always performing all the 
> checks, tests, and comparisons that these jiras removed in addition to 
> actually storing the data - which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change: 
> https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
