I wrote:
>If this is happening with iburst *off*, it becomes more difficult to
>understand how the rate limit is being triggered.  I think maybe we
>should start by focusing on something else: why is hpoll not
>recovering after a KOD?
>
>I'm thinking this sounds like some KOD-recovery logic got lost during
>the refactor.

Trying to trace how things go bad.  It looks to me like this piece of
logic in ntp_proto.c, down around line 592, raises minpoll while
processing a KOD:

        if(is_kod(pkt)) {
                if(!memcmp(pkt->refid, "RATE", REFIDLEN)) {
                        peer->selbroken++;
                        report_event(PEVNT_RATE, peer, NULL);
                        if (peer->minpoll < 10) { peer->minpoll = 10; }
                        peer->burst = peer->retry = 0;
                        peer->throttle = (NTP_SHIFT + 1) * (1 << peer->minpoll);
                        poll_update(peer, 10);
                }
                return;
        }

Then poll_update sets hpoll to 10.  Achim seems to be reporting that
it stays stuck there.  Now I look at this:

void
poll_update(
        struct peer *peer,      /* peer structure pointer */
        uint8_t mpoll
        )
{
        unsigned long   next, utemp;
        uint8_t hpoll;

        /*
         * This routine figures out when the next poll should be sent.
         * That turns out to be wickedly complicated. One problem is
         * that sometimes the time for the next poll is in the past when
         * the poll interval is reduced. We watch out for races here
         * between the receive process and the poll process.
         *
         * Clamp the poll interval between minpoll and maxpoll.
         */
        hpoll = max(min(peer->maxpoll, mpoll), peer->minpoll);

        peer->hpoll = hpoll;

This means that hpoll can never be set lower than minpoll, so there
will never be any recovery from the KOD rate limit, no matter what
values poll_update() is called with, unless minpoll is lowered.

But this never happens.

ntp_peer.c:721: peer->minpoll = min(minpoll, NTP_MAXPOLL);
ntp_peer.c:724:         peer->minpoll = peer->maxpoll;
ntp_proto.c:596:                        if (peer->minpoll < 10) { peer->minpoll = 10; }
refclock_jjy.c:2788:            peer->minpoll = 8 ;
refclock_oncore.c:621:  peer->minpoll = 4;
refclock_trimble.c:469: peer->minpoll = TRMB_MINPOLL;

The ntp_peer.c hits are during new-peer initialization. The refclock hits
are irrelevant; we're troubleshooting the code path for NTP peers.  My
deduction is that ntp_proto.c:596 is probably wrong: it's disabling
the normal poll interval hysteresis (which I admit I only vaguely
understand).

But the problem may be deeper than that.  The corresponding code in
Classic is this:

        /*
         * Check to see if this is a RATE Kiss Code
         * Currently this kiss code will accept whatever poll
         * rate that the server sends
         */
        peer->ppoll = max(peer->minpoll, pkt->ppoll);
        if (kissCode == RATEKISS) {
                peer->selbroken++;      /* Increment the KoD count */
                report_event(PEVNT_RATE, peer, NULL);
                if (pkt->ppoll > peer->minpoll)
                        peer->minpoll = peer->ppoll;
                peer->burst = peer->retry = 0;
                peer->throttle = (NTP_SHIFT + 1) * (1 << peer->minpoll);
                poll_update(peer, pkt->ppoll);
                return;                         /* kiss-o'-death */
        }

I see that our line 596 is a replacement for allowing the KOD packet
to set the poll rate.  That makes all kinds of sense, as a spoofed KOD
packet with a maliciously high poll interval is an obvious DoS
vector. (See, Daniel? I are learning to think like an InfoSec
paranoid.)

Unfortunately for this neat theory, the corresponding grep hits in
Classic are:

ntp_peer.c:857:         peer->minpoll = NTP_MINDPOLL;
ntp_peer.c:859:         peer->minpoll = min(minpoll, NTP_MAXPOLL);
ntp_peer.c:865:         peer->minpoll = peer->maxpoll;
ntp_proto.c:1589:                       peer->minpoll = peer->ppoll;

Again, the ntp_peer.c hits are during newpeer initialization.  That
is, I can't find any way that minpoll recovers after a KOD in
Classic, either.

What am I missing here?
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

Rifles, muskets, long-bows and hand-grenades are inherently democratic
weapons.  A complex weapon makes the strong stronger, while a simple
weapon -- so long as there is no answer to it -- gives claws to the
weak.
        -- George Orwell, "You and the Atom Bomb", 1945