[ 
https://issues.apache.org/jira/browse/HBASE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748222#comment-13748222
 ] 

Alex Newman commented on HBASE-9286:
------------------------------------

Just printing out what is sent to the server (metric name, value, timestamp):
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2321 1377059826
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 4092 1377059836
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 1695 1377059846
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 7575 1377059856
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 17576 1377059866
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 27575 1377059876
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 37575 1377059886
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2899 1377059896
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 2853 1377059906
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 4006 1377059916
metric:Platform.HBase.hbase.posix4e-Satellite-S55-A.ageOfLastShippedOp 429 1377059926

I suspended the replication server at around 1377059856 and unsuspended it at 
around 1377059896.
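
As a rough sanity check on the numbers above, the sketch below compares how much the reported age grew between consecutive samples with how much wall-clock time passed. The class name StallWindowCheck and the drift helper are made up for illustration; the sample values are copied from the output above, with the timestamps read as Unix seconds and the metric values assumed to be milliseconds.

```java
public class StallWindowCheck {

    // {timestamp in Unix seconds, reported ageOfLastShippedOp (assumed ms)},
    // copied from the metric output in this comment.
    static final long[][] SAMPLES = {
        {1377059826L, 2321}, {1377059836L, 4092}, {1377059846L, 1695},
        {1377059856L, 7575}, {1377059866L, 17576}, {1377059876L, 27575},
        {1377059886L, 37575}, {1377059896L, 2899}, {1377059906L, 2853},
        {1377059916L, 4006}, {1377059926L, 429}
    };

    /**
     * Growth of the reported age between sample i-1 and sample i, minus the
     * wall-clock time elapsed between them, in milliseconds. A value near
     * zero means the age advanced in step with the clock.
     */
    static long drift(long[][] samples, int i) {
        long elapsedMs = (samples[i][0] - samples[i - 1][0]) * 1000L;
        long growthMs = samples[i][1] - samples[i - 1][1];
        return growthMs - elapsedMs;
    }

    public static void main(String[] args) {
        for (int i = 1; i < SAMPLES.length; i++) {
            System.out.println(SAMPLES[i][0] + " drift(ms)=" + drift(SAMPLES, i));
        }
    }
}
```

For the three samples taken while the server was suspended (1377059866 through 1377059886) the drift is within a millisecond of zero, so the reported age was rising in step with wall-clock time during the stall.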
                
> ageOfLastShippedOp replication metric doesn't update if the slave 
> regionserver is stalled
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-9286
>                 URL: https://issues.apache.org/jira/browse/HBASE-9286
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Alex Newman
>            Assignee: Alex Newman
>         Attachments: 
> 0001-HBASE-9286.-ageOfLastShippedOp-replication-metric-do.patch
>
>
> In replicationmanager
>         HRegionInterface rrs = getRS();
>         rrs.replicateLogEntries(Arrays.copyOf(this.entriesArray, currentNbEntries));
> ....
>         this.metrics.setAgeOfLastShippedOp(
>             this.entriesArray[currentNbEntries-1].getKey().getWriteTime());
>         break;
> which makes sense, but is wrong. The problem is that rrs.replicateLogEntries 
> will block for a very long time if the slave server is suspended or 
> unavailable but not down.
> However, this is easy to fix. We just need to call refreshAgeOfLastShippedOp() 
> on a regular basis, in a different thread. I've attached a patch which fixes 
> this for cdh4. I can make one for trunk and the like as well if you need me 
> to, but it's a small change.
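
To illustrate the idea in the description (refresh the age from a separate thread so it keeps moving while the shipping call is blocked), here is a minimal standalone sketch. It is not the attached patch and does not use HBase classes: the Metrics class below is a hypothetical stand-in that only borrows the method names setAgeOfLastShippedOp and refreshAgeOfLastShippedOp from the description, and the refresh period is an arbitrary value chosen for the demo.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AgeRefreshSketch {

    /** Hypothetical stand-in for the replication metrics object (not the HBase class). */
    static final class Metrics {
        private volatile long lastShippedWriteTime; // 0 means nothing shipped yet
        private volatile long ageOfLastShippedOp;

        /** Called by the shipping thread after a batch has been replicated. */
        void setAgeOfLastShippedOp(long writeTimeMillis) {
            lastShippedWriteTime = writeTimeMillis;
            ageOfLastShippedOp = System.currentTimeMillis() - writeTimeMillis;
        }

        /** Recomputes the age from the last shipped write time; callable from any thread. */
        void refreshAgeOfLastShippedOp() {
            long writeTime = lastShippedWriteTime;
            if (writeTime > 0) {
                ageOfLastShippedOp = System.currentTimeMillis() - writeTime;
            }
        }

        long getAgeOfLastShippedOp() {
            return ageOfLastShippedOp;
        }
    }

    /** Starts a daemon thread that refreshes the metric every periodMillis. */
    static ScheduledExecutorService startRefresher(Metrics metrics, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "age-refresher");
            t.setDaemon(true); // do not keep the JVM alive just for metrics
            return t;
        });
        scheduler.scheduleAtFixedRate(metrics::refreshAgeOfLastShippedOp,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        Metrics metrics = new Metrics();
        metrics.setAgeOfLastShippedOp(System.currentTimeMillis());
        ScheduledExecutorService refresher = startRefresher(metrics, 50);

        Thread.sleep(300); // stands in for a shipping call blocked on a stalled slave
        System.out.println("age while blocked (ms): " + metrics.getAgeOfLastShippedOp());

        refresher.shutdownNow();
    }
}
```

The two volatile fields are written separately, so a reader can briefly see a slightly stale age; for a monitoring value that is probably acceptable, but a real change would need to fit whatever locking the surrounding code already uses.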

