Hi,
alarm-list holds a list of raised and cleared alarms (but not those that have
never been raised).
alarm-inventory holds a list of all possible alarm _types_.
If alarm-inventory were to hold a list of all possible alarms, that would
indeed resolve my concern. (It would be equivalent to our all-alarms list)
As an example, suppose a server supports a few kinds of resources - interfaces,
line cards, and timing cards - and the alarm types timing-signal-lost (which
applies to timing cards) and timing-unsynchronized (which applies to TDM
interfaces experiencing slips). An operator who is unaware of the distinction
might configure the management application to send an email if a
(timing-unsynchronized, TimingCard1) alarm is raised. If the server exported a
list of all possible alarms, the management application would be able to
prevent the operator from selecting this combination.
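
To make the example concrete, here is a rough Python sketch of the check a management application could perform if the server exported all possible alarms. The names (ALARM_TYPES, POSSIBLE_ALARMS, can_select, TdmInterface1) are made up for illustration and do not come from the draft or from any real device.

```python
# Hypothetical data; the names below are illustrative only and are not
# taken from the draft or from any real device.

# What alarm-inventory offers today: alarm types only.
ALARM_TYPES = {"timing-signal-lost", "timing-unsynchronized"}

# What an exported list of all possible alarms would add:
# every valid (alarm type, resource) pair on this server.
POSSIBLE_ALARMS = {
    ("timing-signal-lost", "TimingCard1"),
    ("timing-unsynchronized", "TdmInterface1"),
}


def can_select(alarm_type: str, resource: str) -> bool:
    """Return True if the device says this alarm can occur on this resource."""
    return (alarm_type, resource) in POSSIBLE_ALARMS


# With only ALARM_TYPES the application cannot reject the bad combination,
# because the alarm type itself is valid.
print("timing-unsynchronized" in ALARM_TYPES)
# With the pair list it can tell the two cases apart.
print(can_select("timing-unsynchronized", "TimingCard1"))
print(can_select("timing-signal-lost", "TimingCard1"))
```

A set of pairs is the simplest possible representation; a real list would presumably also carry the description and severity for each entry.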
>> * has-clear doesn't need to be a union of only one type
> ?
The type of the leaf "/alarms/alarm-inventory/alarm-type/has-clear" is defined
as "type union {type boolean;}"
This is redundant as it could simply be "type boolean;"
> I think there is huge value in this module design: as a client you see the
> alarm history per alarm, not in a separate log.
> As a user you select the interface of a device and you see the current alarm
> state as well as the history. This is important for troubleshooting.
Agreed.
> But things other than entities can have alarms. AND instance-identifiers are
> not “free-form”. It is a strange limitation to limit alarms to the
> entity-model.
I agree; however, it has the nice effects that it is possible to enumerate all
resources, and also that the resources and their associated alarms form a
hierarchy which can be visually displayed - an operator can expand a tree to
get progressively more detail on the device status.
The top level of the tree shows "device has problems"; the next level shows
"line card 3 has problems"; the next level shows "Interface 3/2 has problems";
the next level shows "Interface 3/2 has no media".
However, I think the concepts of root cause resources and impacted resources
are more useful than this hierarchy.
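
The tree view described above can be sketched roughly as follows. This is only an illustration under my own assumptions: the Entity class, has_problems and render are hypothetical names, and the containment tree is only loosely modelled on ietf-entity.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A resource in a containment tree (hypothetical, loosely after ietf-entity)."""
    name: str
    raised_alarms: list[str] = field(default_factory=list)
    children: list["Entity"] = field(default_factory=list)

    def has_problems(self) -> bool:
        # An entity has problems if it, or anything below it, has a raised alarm.
        return bool(self.raised_alarms) or any(c.has_problems() for c in self.children)


def render(entity: Entity, depth: int = 0) -> list[str]:
    """Return one line per entity or alarm, expanding only branches with problems."""
    if not entity.has_problems():
        return [f"{'  ' * depth}{entity.name}: ok"]
    lines = [f"{'  ' * depth}{entity.name}: has problems"]
    lines += [f"{'  ' * (depth + 1)}alarm: {a}" for a in entity.raised_alarms]
    for child in entity.children:
        lines += render(child, depth + 1)
    return lines


# Example tree matching the levels described above.
port = Entity("Interface 3/2", raised_alarms=["no media"])
card = Entity("line card 3", children=[Entity("Interface 3/1"), port])
device = Entity("device", children=[Entity("line card 1"), card])

print("\n".join(render(device)))
```

The roll-up only works because every alarm hangs off a node in a single tree, which is exactly the restriction being questioned.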
> I do not understand when you say "initially populates the alarm list with all
> possible alarms".
If the link is up on EthernetPort1, then in this model (link-down,
EthernetPort1) is a "possible alarm" and not an "actual alarm", while in our
model it is just an alarm, whose status is "cleared".
I'm not sure about the rest of the networking world, but this is the model
Aviat devices currently use. If Aviat is the exception here, we can certainly
adjust our terminology when implementing this module.
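
As a rough illustration of the model I am describing (not our actual implementation; the Alarm class, the raised flag and EthernetPort2 are invented for this sketch), every possible alarm is an entry from the start, and the raised list is only a filtered view of it:

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alarm_type: str
    resource: str
    raised: bool = False  # False corresponds to status "cleared"


# In the all-alarms model, every possible alarm exists from the moment
# its resource does, whether or not it has ever been raised.
all_alarms = [
    Alarm("link-down", "EthernetPort1"),               # link is up: cleared
    Alarm("link-down", "EthernetPort2", raised=True),  # hypothetical second port
]


def raised_alarms(alarms: list[Alarm]) -> list[Alarm]:
    """The raised-alarms list is just the raised subset of all-alarms."""
    return [a for a in alarms if a.raised]


print([a.resource for a in raised_alarms(all_alarms)])
```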
> OK, I can work on improved descriptions. The idea is the following.
> As an operator you would like to have *one* alarm although there are several
> symptoms, rather than having 15 alarms per symptom.
> This module gives you the freedom to choose whether the alarming object is
> the faulty resource or the impacted resource.
> Both options are available.
> If you raise an alarm on an interface, you might say that a VPN is “impacted”.
> If you raise an alarm on a VPN due to some probing, you can point the operator
> to the corresponding interface, which might have a bad config.
I like this idea, but I'm not sure how the device would determine which alarms
to raise.
It sounds like an "interface is down" alarm would be raised if an interface
goes down, *unless* that also causes a VPN to go down?
Alex
________________________________________
From: stefan vallin <[email protected]>
Sent: Friday, 7 October 2016 8:32 p.m.
To: Alex Campbell
Cc: Martin Bjorklund; [email protected]
Subject: Re: [netmod] New Version Notification for
draft-vallin-netmod-alarm-module-00.txt
Hi!
Thanks for your comments!
Several of your comments suggest that you might not have understood the
difference between “alarm-list” and “alarm-inventory”.
Please read up on those a bit before reading my comments.
See comments inline.
> On 06 Oct 2016, at 00:22, Alex Campbell <[email protected]> wrote:
>
> Hi,
>
> The main issue I have with this draft is that there's no way for the
> operator to get a list of all possible alarms on the device, without
> device-specific semantic knowledge.
> They can get a list of all alarm *types*, but there's no information that
> says, for example, that a link-loss alarm can't be raised for a local CPU
> resource (or indeed, that the local CPU resource even exists).
> This is exacerbated by the ability of operators to delete alarm entries;
> even if a device initially populates the alarm list with all possible alarms,
> the operator can still delete some of them, in which case the list no longer
> reflects all possible alarms.
I do not really get your comment here.
All possible alarms are published via the alarm inventory. Nothing can be
deleted in the inventory.
A manager reads this table to understand the possible alarms.
I do not understand when you say "initially populates the alarm list with all
possible alarms".
1) the read-only “alarm-inventory” publishes all possible alarms, including a
description of the possible alarm
2) the “alarm-list” shows actual alarms, which are all of an alarm-type listed
in the alarm inventory
>
>
> We have an internally developed (but not yet published) YANG model for alarms
> which is very similar to this draft model, but with the following key
> differences:
> * It does not track past status changes (they are stored in a separate event
> log); it is only concerned with current state.
I think there is huge value in this module design: as a client you see the
alarm history per alarm, not in a separate log.
As a user you select the interface of a device and you see the current alarm
state as well as the history. This is important for troubleshooting.
> * Alarms are associated with entities (from ietf-entity.yang) rather than
> arbitrary strings or instance-identifiers.
But things other than entities can have alarms. AND instance-identifiers are not
“free-form”. It is a strange limitation to limit alarms to the entity-model.
> * It contains a list of all-alarms, regardless of whether they have ever been
> raised. This includes static information (description, severity) as well as
> current state information.
See above; this is the alarm inventory.
> * It contains a separate list of raised-alarms, which mostly duplicates the
> information from all-alarms, but only contains an entry for an alarm if it is
> raised. This allows a subtree filter to retrieve only information about
> raised alarms, but it may be redundant.
See above; this is the alarm list.
> * Instead of shelved alarms, we have a simple boolean "disabled" setting for
> each alarm; raised disabled alarms still appear as raised in the all-alarms
> list (with another indication they're disabled), but do not appear in the
> raised-alarms list.
This can be done with shelving.
>
> With this model (and because ietf-entity.yang defines a hierarchy of
> entities) it is easy to display a hierarchical view of all entities and their
> associated alarms.
>
> In our model, entries in the all-alarms list can only be added when resources
> are added to the system, and can only be deleted when resources are removed
> from the system.
Alarm-inventory shall reflect the possible alarms; I will add your use case to
the description.
>
>
>
> Other comments:
> * is-cleared feels like a double negation (false means "this alarm is not not
> raised"); I would like to see it changed to is-raised
I think you are again confusing alarm-inventory and alarm-list. When the alarm
appears for the first time it is not cleared by the resource.
> * I would like to see a YANG feature for past status changes, or perhaps this
> part moved to a separate module augmenting ietf-alarms.
It is the “status-change” list, it shows all status changes
> * has-clear doesn't need to be a union of only one type
?
> * The meanings of "impacted resource" and "root cause resource" are unclear.
OK, I can work on improved descriptions. The idea is the following.
As an operator you would like to have *one* alarm although there are several
symptoms, rather than having 15 alarms per symptom.
This module gives you the freedom to choose whether the alarming object is the
faulty resource or the impacted resource.
Both options are available.
If you raise an alarm on an interface, you might say that a VPN is “impacted”.
If you raise an alarm on a VPN due to some probing, you can point the operator
to the corresponding interface, which might have a bad config.
> * "This list is used to shelf alarms" should be "... to shelve alarms"
> * "Shelv alarms for ..." should be "Shelve alarms for ..." (multiple
> occurrences)
Thanks, will fix
br Stefan
>
> Alex
>
> ________________________________________
> From: netmod <[email protected]> on behalf of Martin Bjorklund
> <[email protected]>
> Sent: Thursday, 6 October 2016 1:26 a.m.
> To: [email protected]
> Subject: [netmod] New Version Notification for
> draft-vallin-netmod-alarm-module-00.txt
>
> Hi,
>
> We have posted a new version of the alarm module. The previous
> document was called draft-vallin-alarm-yang-module-00, this new
> version is called draft-vallin-netmod-alarm-module (hence it is also a
> -00).
>
> This updated version incorporates comments on the previous document,
> and adds support for alarm shelving.
>
> It would be good to know if people in this WG are interested in this
> work.
>
>
> /martin and stefan
>
>
>
>
> A new version of I-D, draft-vallin-netmod-alarm-module-00.txt
> has been successfully submitted by Martin Bjorklund and posted to the
> IETF repository.
>
> Name: draft-vallin-netmod-alarm-module
> Revision: 00
> Title: YANG Alarm Module
> Document date: 2016-10-05
> Group: Individual Submission
> Pages: 58
> URL:
> https://www.ietf.org/internet-drafts/draft-vallin-netmod-alarm-module-00.txt
> Status:
> https://datatracker.ietf.org/doc/draft-vallin-netmod-alarm-module/
> Htmlized:
> https://tools.ietf.org/html/draft-vallin-netmod-alarm-module-00
>
>
> Abstract:
> This document defines a YANG module for alarm management. It
> includes functions for alarm list management, alarm shelving and
> notifications to inform management systems. There are also RPCs to
> manage the operator state of an alarm and administrative alarm
> procedures. The module carefully maps to relevant alarm standards.
>
> _______________________________________________
> netmod mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/netmod