[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same

stack (JIRA) Wed, 26 Oct 2016 14:10:01 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609678#comment-15609678
 ]


stack commented on HBASE-16890:
-------------------------------

I think your ringbuffer hack 'proves' that the synchronization is the 
throughput killer; i.e. you did the sloppy test I was talking of doing above. 
Good. 

I suggest that we now take a step back. There are three Qs, txid accounting, 
and a netty eventloop that can't be blocked when running the asyncwal. There is 
also a means of carrying unsync'd appends from one WAL to another if a WAL goes 
unresponsive, a feature we'd like to keep. (Another 'feature' of asyncwal that 
we don't have in FSHLog is some size accounting: after N bytes of appends, sync 
regardless. We should keep this too; it is something current FSHLog can't do 
because it intentionally lets appends and syncs run independently of each other.) 
[~Apache9] has undone a bunch of moving parts. Let's be parsimonious about what 
we put back. Can you post your 'hack'? I'd like to see it.
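
For reference, the size accounting mentioned above ("after N bytes of appends, 
sync regardless") can be sketched roughly as below. This is a minimal 
illustration only, not the asyncwal code: the class name, the threshold 
parameter and the counters are all hypothetical, and it assumes a single 
thread (such as an event loop) drives both append and sync.

```java
// Hypothetical sketch of size-triggered syncing; not HBase's actual WAL code.
// Assumes single-threaded use, e.g. everything runs on one event loop thread.
final class SizeTriggeredSync {
    private final long batchSizeBytes; // the "N" above; the value is an assumption
    private long unsyncedBytes = 0;
    private int syncCount = 0;

    SizeTriggeredSync(long batchSizeBytes) {
        this.batchSizeBytes = batchSizeBytes;
    }

    /**
     * Records an append of the given length. Once N unsynced bytes have
     * accumulated, syncs regardless of whether any caller asked for one.
     * Returns true if this append triggered a sync.
     */
    boolean append(int lengthBytes) {
        unsyncedBytes += lengthBytes;
        if (unsyncedBytes >= batchSizeBytes) {
            sync();
            return true;
        }
        return false;
    }

    /** Placeholder for the real flush to the filesystem; resets the accounting. */
    void sync() {
        unsyncedBytes = 0;
        syncCount++;
    }

    long getUnsyncedBytes() {
        return unsyncedBytes;
    }

    int getSyncCount() {
        return syncCount;
    }
}
```

This coupling only works because append and sync share one piece of state, 
which is what letting the two run independently gives up.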

On txid, it's the ringbuffer txid in FSHLog, so syncs get their own, but in 
asyncwal the txid is run independently. Any sense of how many syncs you were 
aggregating in your tests? asyncwal seems to aggregate about half of what the 
ringbuffer was doing, for whatever reason. More aggregating is better, but not 
so much as to 'hold' tens of handlers from doing anything else (a problem that 
comes of Handler threads handling a request start to finish rather than 
following a SEDA model, which is a different issue).
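
To make the aggregation question concrete, here is a rough sketch of sync 
aggregation with a cap on batch size. It is an illustration under stated 
assumptions, not how FSHLog or asyncwal do it: the class, the maxBatch cap and 
the use of CompletableFuture are all hypothetical. One physical sync completes 
every waiting request in the batch, and the cap bounds how many callers are 
held behind a single sync.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of sync aggregation; not HBase's actual WAL code.
final class SyncAggregator {
    /** One caller waiting for its txid to become durable. */
    private static final class Pending {
        final long txid;
        final CompletableFuture<Long> done = new CompletableFuture<>();

        Pending(long txid) {
            this.txid = txid;
        }
    }

    private final ArrayDeque<Pending> waiting = new ArrayDeque<>();
    private final int maxBatch; // cap on requests per sync; an assumption

    SyncAggregator(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    /** Queues a sync request; the future completes once a sync covers it. */
    synchronized CompletableFuture<Long> requestSync(long txid) {
        Pending p = new Pending(txid);
        waiting.add(p);
        return p.done;
    }

    /**
     * Performs one (simulated) physical sync on behalf of up to maxBatch
     * waiting requests. Returns how many requests were aggregated.
     */
    int syncOnce() {
        List<Pending> batch = new ArrayList<>();
        synchronized (this) {
            while (!waiting.isEmpty() && batch.size() < maxBatch) {
                batch.add(waiting.poll());
            }
        }
        if (batch.isEmpty()) {
            return 0;
        }
        long highest = Long.MIN_VALUE;
        for (Pending p : batch) {
            highest = Math.max(highest, p.txid);
        }
        // A real WAL would flush to the filesystem here, outside the lock.
        for (Pending p : batch) {
            p.done.complete(highest); // report the highest txid now durable
        }
        return batch.size();
    }
}
```

The value returned by syncOnce() is the number being asked about above: how 
many syncs each physical sync aggregates.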




> Analyze the performance of AsyncWAL and fix the same
> ----------------------------------------------------
>
>                 Key: HBASE-16890
>                 URL: https://issues.apache.org/jira/browse/HBASE-16890
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 2.0.0
>
>         Attachments: Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 
> 2016-10-25 at 7.39.07 PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, 
> async.svg, classic.svg, contention.png, contention_defaultWAL.png
>
>
> Tests reveal that AsyncWAL under load in single node cluster performs slower 
> than the Default WAL. This task is to analyze and see if we could fix it.
> See some discussions in the tail of JIRA HBASE-15536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
