pespin has posted comments on this change. ( https://gerrit.osmocom.org/c/osmo-bsc/+/23234 )

Change subject: stats: add BTS uptime counter
......................................................................


Patch Set 5:

(1 comment)

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c
File src/osmo-bsc/bts.c:

https://gerrit.osmocom.org/c/osmo-bsc/+/23234/4/src/osmo-bsc/bts.c@586
PS4, Line 586:  int downtime_seconds = BTS_DOWNTIME_SAMPLE_INTERVAL - uptime_seconds;
> Let's back up a bit maybe. […]
"Downtime is added to the stat_item. When the statsd system exports these 
values every X seconds, we have between 0 and X seconds of downtime in that 
period. Sum up all these periods and you can see total downtime for each BTS 
during any given timeframe."

Ok, so that's the intention of the new version of the patch. But from what I
understand reading the code, it looks as if the first time it's indeed going to
set the stat to something between 0 and BTS_DOWNTIME_SAMPLE_INTERVAL, but the next
time this function is called, the item will be set to 0 again. So in your
Grafana (or whatever you use) you'll only be able to see the first
BTS_DOWNTIME_SAMPLE_INTERVAL of downtime, AFAIU.

Now, a proposal from my side would be:
What about having two stats, "become_up" and "become_down" (feel free to rename
them; I named them like this so you get the idea), which count the transitions
down->up and up->down respectively?
That means that at any point in time you can know whether the BTS is up or down
(if become_up > become_down, then it's up; otherwise it's down). You can also track
uptime/downtime periods by checking the timestamp at which each stat changed
value. You can then easily see when events happened, in a plot like Grafana
or using Python scripts.

1- BSC starts: become_up=0, become_down=0
... a few seconds pass ...
2- BTS connects: become_up=1, become_down=0
... a few hours pass ...
3- BTS disconnects: become_up=1, become_down=1
... instantaneously or even a few hours later ...
4- BTS connects: become_up=2, become_down=1


What do you think? Does this proposal fulfill your needs? AFAIU it does
fulfill "currently there is no way determine a BTS uptime other than by
polling it via the VTY".
Do I understand correctly that your goal is to put those stats in some
persistent/temporary database to be able to plot them and find out what's going on
over time? Then my proposal would work, AFAIU.
The only inexact value would be the time at which each event happened: it
would be off by up to the interval at which you configured osmo-bsc to push
stats updates, which is usually pretty low, on the order of seconds. Even
then, no events are lost; only the timing is a few seconds inaccurate.



--
To view, visit https://gerrit.osmocom.org/c/osmo-bsc/+/23234
To unsubscribe, or for help writing mail filters, visit 
https://gerrit.osmocom.org/settings

Gerrit-Project: osmo-bsc
Gerrit-Branch: master
Gerrit-Change-Id: Ib17674bbe95e828cebff12de9e0b30f06447ef6c
Gerrit-Change-Number: 23234
Gerrit-PatchSet: 5
Gerrit-Owner: iedemam <[email protected]>
Gerrit-Assignee: daniel <[email protected]>
Gerrit-Reviewer: Jenkins Builder
Gerrit-Reviewer: daniel <[email protected]>
Gerrit-Reviewer: laforge <[email protected]>
Gerrit-Reviewer: pespin <[email protected]>
Gerrit-CC: dexter <[email protected]>
Gerrit-Comment-Date: Fri, 30 Apr 2021 17:41:33 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
Comment-In-Reply-To: iedemam <[email protected]>
Comment-In-Reply-To: pespin <[email protected]>
Gerrit-MessageType: comment
