[ 
https://issues.apache.org/jira/browse/HBASE-17471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836827#comment-15836827
 ] 

stack commented on HBASE-17471:
-------------------------------

[~allan163], what [~carp84] said regards tests passing even though there are 
hanging mvcc transactions; they are covering up dodgy behavior. Will this 
suggestion of yours help: 'if (we != null && mvcc != null && !inMemstore) {...}'? 
You should get it into the next iteration of the patch.
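To make the intent of that guard concrete, here is a minimal, self-contained sketch (hypothetical names, not the real MultiVersionConcurrencyControl/FSWALEntry classes): a pre-assigned mvcc write entry for a non-memstore edit has no later cell-stamping step that would complete it, so the append path has to complete it itself or the transaction hangs and the read point never advances:

```java
import java.util.ArrayDeque;

// Simplified stand-in for HBase's mvcc (illustrative only): every begun
// write entry must eventually be completed, or the read point stalls.
class SimpleMvcc {
  static final class WriteEntry {
    final long writeNumber;
    boolean completed;
    WriteEntry(long n) { writeNumber = n; }
  }

  private long writePoint = 0;
  private long readPoint = 0;
  private final ArrayDeque<WriteEntry> pending = new ArrayDeque<>();

  synchronized WriteEntry begin() {
    WriteEntry we = new WriteEntry(++writePoint);
    pending.add(we);
    return we;
  }

  synchronized void complete(WriteEntry we) {
    we.completed = true;
    // Advance the read point over the leading run of completed entries.
    while (!pending.isEmpty() && pending.peek().completed) {
      readPoint = pending.poll().writeNumber;
    }
  }

  synchronized long getReadPoint() { return readPoint; }
}

class MvccGuardDemo {
  // Mirrors the suggested guard: an edit that never goes to the memstore
  // has no stamping step that would complete its entry later, so complete
  // it right here instead of leaving the transaction hanging.
  static long append(SimpleMvcc mvcc, boolean inMemstore) {
    SimpleMvcc.WriteEntry we = mvcc.begin();
    if (we != null && mvcc != null && !inMemstore) {
      mvcc.complete(we);
    }
    return we.writeNumber;
  }
}
```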

Also agree w/ [~carp84]'s assessment of the patch and the perf numbers. Patch is 
great. Nice cleanup.

On adding the region name to the IOE: that will be of no value... Maybe add the 
walKey and the current state of the sequence id/mvcc at the time of the 
exception; that'd be of some use. Otherwise, I see you are just porting the 
code that was there already...

I tried backport to branch-1 but gave up after a few minutes. It is messy.

On the patch:

It is great that the problematic latch is gone (my fault it was there in the 
first place).

nit: You want to do more cleanup here?

{code}
960           FSWALEntry entry = new FSWALEntry(txid, key, edits, hri, inMemstore);
961           entry.stampRegionSequenceId(we);
{code}

Stamping the sequenceid into the Cells could be done in the constructor rather 
than in stampRegionSequenceId.

Ideal would be a WALKey that took sequenceid in its constructor. That'd be 
sweet. Can do in follow-up issue.
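A sketch of what that follow-up might look like (hypothetical class names, not the real WALKey/FSWALEntry API): taking the pre-assigned sequence id in the constructor removes the separate stampRegionSequenceId(we) step, so no caller can ever observe an entry whose key is missing its sequence id:

```java
// Illustrative shapes only, not the real HBase classes.
final class SimpleWALKey {
  private final long sequenceId;
  SimpleWALKey(long sequenceId) { this.sequenceId = sequenceId; }
  long getSequenceId() { return sequenceId; }
}

final class SimpleWALEntry {
  private final SimpleWALKey key;
  private final long txid;

  // Stamping happens at construction: there is no window between creating
  // the entry and stamping it during which it is half-initialized.
  SimpleWALEntry(long txid, long preAssignedSeqId) {
    this.txid = txid;
    this.key = new SimpleWALKey(preAssignedSeqId);
  }

  SimpleWALKey getKey() { return key; }
  long getTxid() { return txid; }
}
```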

I ran my little rig running WALPE on a small HDFS cluster with 1-100 threads. 
Here are the numbers:


||Threads||w/o patch (time/throughput)||with patch (time/throughput)||
|1|127.5s/7840ops/s|130.6s/7653ops/s|
|3|135.8s/22099ops/s|137s/21740ops/s|
|5|140.3s/35635ops/s|147s/33961ops/s|
|10|152.5s/65544ops/s|154s/64886ops/s|
|25|194.2s/128718ops/s|179s/139289ops/s|
|50|271.6s/184059ops/s|256s/195024ops/s|
|100|388.9s/257111ops/s|405s/246383ops/s|

So, about the same.

Let me run ITBLL with chaos on this little cluster for a while with the patch 
in place. That'll go some way toward verifying that all is working as expected. 
I'll be back.

> Region Seqid will be out of order in WAL if using mvccPreAssign
> ---------------------------------------------------------------
>
>                 Key: HBASE-17471
>                 URL: https://issues.apache.org/jira/browse/HBASE-17471
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Allan Yang
>            Assignee: Allan Yang
>            Priority: Critical
>         Attachments: HBASE-17471-duo.patch, HBASE-17471-duo-v1.patch, 
> HBASE-17471-duo-v2.patch, HBASE-17471.patch, HBASE-17471.tmp, 
> HBASE-17471.v2.patch, HBASE-17471.v3.patch, HBASE-17471.v4.patch, 
> HBASE-17471.v5.patch
>
>
>  mvccPreAssign was brought in by HBASE-16698, which truly improved write 
> performance, especially in the ASYNC_WAL scenario. But mvccPreAssign 
> was only used in {{doMiniBatchMutate}}, not in the Increment/Append path. If 
> Increment/Append and batch put are used against the same region in parallel, 
> then the seqids of that region may not be monotonically increasing in the WAL, 
> since one write path acquires the mvcc/seqid before the append, and the other 
> acquires it in the append/sync consume thread.
> The out-of-order situation can easily be reproduced by a simple UT, which is 
> attached to this issue. I modified the code to assert on the disorder: 
> {code}
>     if(this.highestSequenceIds.containsKey(encodedRegionName)) {
>       assert highestSequenceIds.get(encodedRegionName) < sequenceid;
>     }
> {code}
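The race described above can be condensed into a tiny single-threaded simulation (hypothetical names; the real paths are {{doMiniBatchMutate}} vs. the append/sync consume thread): the batch-put path takes its sequence id up front, the increment's append is consumed first with a later id, and the WAL ends up non-monotonic, which is exactly what the assert catches:

```java
import java.util.ArrayList;
import java.util.List;

// Hands out strictly increasing sequence ids, like the region's counter.
class SeqIdSource {
  private long next = 0;
  long next() { return ++next; }
}

class WalOrderDemo {
  // Same check the assert performs: every appended seqid must exceed the
  // highest one seen so far for the region.
  static boolean isMonotonic(List<Long> appendedSeqIds) {
    long highest = 0;
    for (long id : appendedSeqIds) {
      if (id <= highest) return false;
      highest = id;
    }
    return true;
  }

  static List<Long> simulateRace() {
    SeqIdSource seq = new SeqIdSource();
    List<Long> wal = new ArrayList<>();
    long putId = seq.next();  // batch-put path pre-assigns its seqid early
    long incId = seq.next();  // increment's seqid assigned at append time
    wal.add(incId);           // increment's append is consumed first -> 2
    wal.add(putId);           // the pre-assigned put lands after -> 1
    return wal;               // WAL order is [2, 1]: out of order
  }
}
```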
> I'd like to say that if we allow disorder in WALs, then this is not an issue. 
> But as far as I know, if {{highestSequenceIds}} is not properly set, some 
> WALs may not be archived to oldWALs correctly.
> What I haven't figured out yet is whether disorder in the WAL can cause data 
> loss when recovering from a disaster. If so, then it is a big problem that 
> needs to be fixed.
> I have fixed this problem in our custom 1.1.x branch; my solution is to use 
> mvccPreAssign everywhere, making it un-configurable, since mvccPreAssign 
> is indeed a better way than assigning the seqid in the ringbuffer thread while 
> keeping handlers waiting for it.
> If anyone thinks it is doable, then I will port it to branch-1 and the master 
> branch and upload it. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
