iedemam has posted comments on this change. ( 
https://gerrit.osmocom.org/c/osmo-bsc/+/23234 )

Change subject: stats: add BTS uptime counter
......................................................................


Patch Set 5:

(1 comment)

Hi,

Thanks again for taking a look. My reply to your comment may be long, but 
hopefully it is clear.

-Michael

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c
File src/osmo-bsc/bts.c:

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c@586 
PS4, Line 586:  int downtime_seconds = BTS_DOWNTIME_SAMPLE_INTERVAL - 
uptime_seconds;
> I'm still not getting it, I'm sorry. Or maybe I'm getting it but I still find 
> it really strange. […]
Let's back up a bit. Currently there is no way to determine a BTS's uptime 
other than by polling it via the VTY, and if the BSC restarts for some reason, 
all uptime tracking is lost. I wanted the uptime to be available via the statsd 
interface so that each BTS's uptime during any given period can be known 
without the risk of losing state in a restart.

So, originally I wrote this to run every second and count uptime. Are we up? 
Good, increment the uptime counter. Every X seconds, when statsd runs, the 
accumulated value is exported, so every statsd interval would contain between 
0 and X seconds of uptime. I can sum these intervals over, for example, an 
hour, and the difference between that sum and 3600 will be my downtime. 
Straightforward, I thought.
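
To make the arithmetic of that original approach concrete, here is a minimal 
sketch. It is not code from the patch: the helper name and the clamping to 
zero are assumptions made only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not part of osmo-bsc): given the uptime seconds
 * reported in each statsd export within a window, return the implied
 * downtime for that window. */
static unsigned int downtime_from_uptime_samples(const unsigned int *samples,
						 size_t num_samples,
						 unsigned int window_seconds)
{
	unsigned int uptime = 0;
	size_t i;

	for (i = 0; i < num_samples; i++)
		uptime += samples[i];

	/* Assumption: never report negative downtime if the samples
	 * slightly overshoot the window. */
	if (uptime > window_seconds)
		return 0;

	return window_seconds - uptime;
}
```

For an hour, window_seconds would be 3600 and the samples would be the 
per-export uptime values collected by whatever consumes the statsd data.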

This approach was rejected. It ran too often, abused the counter interface, and 
counted uptime instead of downtime. OK, so now I've changed it to use a 
stat_item, to run only every INTERVAL seconds, and to count downtime.

The BTS_DOWNTIME_SAMPLE_INTERVAL value now represents the maximum amount of 
downtime that we would be willing to lose if a restart occurs, because that 
downtime has not yet been pushed into the statsd system. When the periodic 
timer fires to calculate downtime, we check how many seconds of uptime have 
elapsed and subtract that from the interval to determine the downtime.
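
A rough sketch of that per-interval calculation follows. The subtraction is the 
same one as in the line under review; the interval value of 10 seconds, the 
function name, and the clamping of out-of-range input are assumptions for 
illustration and may not match the patch.

```c
/* Assumed value for illustration only; the patch defines the real one. */
#define BTS_DOWNTIME_SAMPLE_INTERVAL 10 /* seconds */

/* Hypothetical helper: downtime within one sample interval, given how many
 * seconds of uptime elapsed during that interval. */
static int bts_downtime_in_interval(int uptime_seconds)
{
	/* Assumption: clamp so the result always stays within
	 * 0..BTS_DOWNTIME_SAMPLE_INTERVAL, even if the timer fires late. */
	if (uptime_seconds < 0)
		uptime_seconds = 0;
	if (uptime_seconds > BTS_DOWNTIME_SAMPLE_INTERVAL)
		uptime_seconds = BTS_DOWNTIME_SAMPLE_INTERVAL;

	return BTS_DOWNTIME_SAMPLE_INTERVAL - uptime_seconds;
}
```

A BTS that was up for the whole interval yields 0, and one that was down for 
the whole interval yields the full interval.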

Downtime is added to the stat_item. When the statsd system exports these values 
every X seconds, we have between 0 and X seconds of downtime in that period. 
Sum up all these periods and you can see total downtime for each BTS during any 
given timeframe.
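
The accumulate-then-export behaviour can be modelled roughly as below. This is 
a toy model only: the struct and both functions are hypothetical and are not 
the libosmocore stat_item API.

```c
/* Toy accumulator standing in for the exported value (hypothetical). */
struct downtime_accum {
	unsigned int pending_seconds;
};

/* Called once per sample interval with that interval's downtime. */
static void downtime_accum_add(struct downtime_accum *acc,
			       unsigned int downtime_seconds)
{
	acc->pending_seconds += downtime_seconds;
}

/* Called once per export period: hand over what has accumulated and
 * start again from zero. */
static unsigned int downtime_accum_export(struct downtime_accum *acc)
{
	unsigned int exported = acc->pending_seconds;

	acc->pending_seconds = 0;
	return exported;
}
```

Summing the exported values over any timeframe then gives the total downtime 
for that BTS in that timeframe.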

Suggestions welcome. I've tried my best to address concerns but am running out 
of ideas.



--
To view, visit https://gerrit.osmocom.org/c/osmo-bsc/+/23234

Gerrit-Project: osmo-bsc
Gerrit-Branch: master
Gerrit-Change-Id: Ib17674bbe95e828cebff12de9e0b30f06447ef6c
Gerrit-Change-Number: 23234
Gerrit-PatchSet: 5
Gerrit-Owner: iedemam <[email protected]>
Gerrit-Assignee: daniel <[email protected]>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: daniel <[email protected]>
Gerrit-Reviewer: laforge <[email protected]>
Gerrit-Reviewer: pespin <[email protected]>
Gerrit-CC: dexter <[email protected]>
Gerrit-Comment-Date: Fri, 30 Apr 2021 14:43:17 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: iedemam <[email protected]>
Comment-In-Reply-To: pespin <[email protected]>
Gerrit-MessageType: comment
