I was also confusing ageOfLastShippedEdit with ageOfLastAppliedEdit. I had fixed ageOfLastShippedEdit on the source in 0.94. Looks like 0.94 is doing the right thing with ageOfLastAppliedEdit, but 0.98+ is not.
-- Lars

________________________________
From: lars hofhansl <[email protected]>
To: Demai Ni <[email protected]>; "[email protected]" <[email protected]>
Sent: Wednesday, July 30, 2014 10:30 AM
Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

> When 'the time an edit entered the system' doesn't change (in the case of no sink op arriving for a period of time), the age will keep growing since the current time keeps moving forward, which gives the false impression that an edit has been sitting in the queue for a very long time, doesn't it?

I think until the item is shipped it should be counted as waiting. I.e. the time this reports is the time between an edit entering the system and the time it finally gets shipped to the replication sink.

refreshAgeOfLastAppliedOp() should only be called when something is actually being shipped, not periodically (I just fixed that in 0.94, HBASE-11143).

But I see you're looking at 0.98. There we do indeed call refreshAgeOfLastAppliedOp() every time we call getStats(), which would increase that metric even when there is nothing to ship.
That looks like a bug.

-- Lars

________________________________
From: Demai Ni <[email protected]>
To: "[email protected]" <[email protected]>; lars hofhansl <[email protected]>
Sent: Wednesday, July 30, 2014 8:49 AM
Subject: Re: meaning for AgeOfLastAppliedOp in Replication MetricsSink

Lars,

thanks for your input.

> This metric indicates the time an edit sat in the "replication queue" before it got replicated.

Yeah, I am with you on this.

> With that definition it is doing the right thing: Reporting current time - the time an edit entered the system (its WAL time)

When 'the time an edit entered the system' doesn't change (in the case of no sink op arriving for a period of time), the age will keep growing since the current time keeps moving forward, which gives the false impression that an edit has been sitting in the queue for a very long time, doesn't it?
On Tue, Jul 29, 2014 at 10:54 PM, lars hofhansl <[email protected]> wrote:

> This metric indicates the time an edit sat in the "replication queue" before it got replicated.
> With that definition it is doing the right thing: Reporting current time - the
> time an edit entered the system (its WAL time)
>
> -- Lars
>
> ________________________________
> From: Demai Ni <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Tuesday, July 29, 2014 3:48 PM
> Subject: meaning for AgeOfLastAppliedOp in Replication MetricsSink
>
> hi,
>
> A quick question to clarify this AgeOfLastAppliedOp in MetricsSink.java. I
> assume it is used as an indicator of how long it takes for a sink op to be
> applied; but instead, it seems to show how long it has been since the last
> sink op was applied.
>
>   /**
>    * Set the age of the last applied operation
>    *
>    * @param timestamp The timestamp of the last operation applied.
>    * @return the age that was set
>    */
>   public long setAgeOfLastAppliedOp(long timestamp) {
>     lastTimestampForAge = timestamp;
>     long age = System.currentTimeMillis() - lastTimestampForAge;
>     rms.setGauge(SINK_AGE_OF_LAST_APPLIED_OP, age);
>     return age;
>   }
>
> In the following scenario:
> 1) at 7:00am a sink op is applied, and SINK_AGE_OF_LAST_APPLIED_OP is
>    set to, for example, 100ms;
> 2) then NO new sink op occurs;
> 3) refreshAgeOfLastAppliedOp() is invoked at 8:00am. Instead of
>    returning the 100ms, AgeOfLastAppliedOp becomes 1 hour + 100ms, which
>    doesn't make sense, right?
>
> Should we put in a check for (lastTimestampForAge != timestamp) before
> refreshing the age?
>
> Thanks
>
> Demai
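For reference, the 7:00am/8:00am scenario from the original question can be reproduced with a small self-contained sketch. This is not HBase code: the class SinkAgeSketch, the injectable clock, the ageGauge field (standing in for the rms.setGauge call) and the refreshIfNewOp() method are all hypothetical names introduced here. It only mirrors the arithmetic of the quoted setAgeOfLastAppliedOp() and compares an unconditional refresh with the (lastTimestampForAge != timestamp) check suggested in the question. Whether that check or "refresh only when something is shipped" is the right fix is exactly what the thread is discussing.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the sink metrics class; not the real HBase MetricsSink. */
class SinkAgeSketch {
    private final LongSupplier clock;   // injectable clock so the scenario is reproducible
    private long lastTimestampForAge;   // WAL time of the last applied op
    private long ageGauge;              // stands in for the SINK_AGE_OF_LAST_APPLIED_OP gauge

    SinkAgeSketch(LongSupplier clock) {
        this.clock = clock;
    }

    /** Same arithmetic as the quoted setAgeOfLastAppliedOp(). */
    long setAgeOfLastAppliedOp(long timestamp) {
        lastTimestampForAge = timestamp;
        ageGauge = clock.getAsLong() - lastTimestampForAge;
        return ageGauge;
    }

    /** Unconditional refresh: recomputes the age against the old timestamp. */
    long refreshUnconditionally() {
        return setAgeOfLastAppliedOp(lastTimestampForAge);
    }

    /** Guarded variant (assumption): keeps the gauge when no new op was applied. */
    long refreshIfNewOp(long timestamp) {
        if (timestamp == lastTimestampForAge) {
            return ageGauge;
        }
        return setAgeOfLastAppliedOp(timestamp);
    }

    /** Returns {age at 7:00, unconditional refresh at 8:00, guarded refresh at 8:00}. */
    static long[] runScenario() {
        final long hour = 3_600_000L;
        final long sevenAm = 7 * hour;
        final long editTime = sevenAm - 100;   // edit entered the WAL 100ms before 7:00

        long[] now = {sevenAm};
        SinkAgeSketch naive = new SinkAgeSketch(() -> now[0]);
        SinkAgeSketch guarded = new SinkAgeSketch(() -> now[0]);

        long ageAtApply = naive.setAgeOfLastAppliedOp(editTime);
        guarded.setAgeOfLastAppliedOp(editTime);

        now[0] = sevenAm + hour;               // 8:00am, no new sink op in between
        long naiveAge = naive.refreshUnconditionally();
        long guardedAge = guarded.refreshIfNewOp(editTime);

        return new long[] {ageAtApply, naiveAge, guardedAge};
    }

    public static void main(String[] args) {
        long[] r = runScenario();
        System.out.println("age at 7:00am:            " + r[0] + " ms");
        System.out.println("unconditional at 8:00am:  " + r[1] + " ms");
        System.out.println("guarded at 8:00am:        " + r[2] + " ms");
    }
}
```

With the unconditional refresh the reported age is 1 hour + 100ms even though nothing new was applied, while the guarded variant keeps reporting 100ms. The guard only compares timestamps, so it would also skip a genuinely new op that happens to carry the same timestamp as the previous one.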
