Thanks Mark for the clarification.

I just hate adding new knobs and exceptions in scramble mode. If the knob is 
already there, then it’s already there.

There are already too many knobs in DNS and we all know that.

--
Ondřej Surý <[email protected]> (He/Him)

> On 19. 7. 2023, at 0:43, Mark Andrews <[email protected]> wrote:
> 
> Except BIND does exactly this.  It retries and if all the servers for the 
> zone fail the <name,type> is flagged as bad for 10 minutes and any validation 
> that depends on that lookup fails with DNS_R_BROKENCHAIN which results in 
> SERVFAIL rather than a retry.  This was how we dealt with the so-called 
> “rollover and die” issue.
> 
>                } else if (result == DNS_R_BROKENCHAIN) {
>                        isc_result_t tresult;
>                        isc_time_t expire;
>                        isc_interval_t i;
> 
>                        isc_interval_set(&i, DNS_RESOLVER_BADCACHETTL(fctx), 0);
>                        tresult = isc_time_nowplusinterval(&expire, &i);
>                        if (negative &&
>                            (fctx->type == dns_rdatatype_dnskey ||
>                             fctx->type == dns_rdatatype_ds) &&
>                            tresult == ISC_R_SUCCESS)
>                        {
>                                dns_resolver_addbadcache(res, fctx->name,
>                                                         fctx->type, &expire);
>                        }
>                        done = true;
>                        goto cleanup_fetchctx;
>                } else {
>                        fctx_try(fctx, true, true);
>                        goto cleanup_fetchctx;
>                }
> 
> The world doesn’t fall over with limited retries.  We had zero reports of 
> resolution failures due to this incident.  This also allows a validator 
> behind a validator to work reliably by having the validator that talks 
> directly to the authoritative servers filter out the garbage responses.  
> Always sending CD=1 is STUPID.
> 
>> On 19 Jul 2023, at 04:54, Ondřej Surý <[email protected]> wrote:
>> 
>> With my implementor’s hat on, I think this is the wrong approach. It (again) 
>> adds complexity to the resolvers, and yet again it is based (mostly) on an 
>> isolated incident. I really don’t want yet another “serve-stale” in the 
>> resolvers. I have yet to see any evidence that serve-stale has helped 
>> anything since the original incident, but now every resolver has to have it 
>> because people want it.
>> 
>> And operationally, it will just paper over the issue, which might then go 
>> unnoticed for a longer period of time rather than being fixed right away.
>> 
>> Ondrej
>> --
>> Ondřej Surý <[email protected]> (He/Him)
>> 
>>>> On 18. 7. 2023, at 20:38, Gavin McCullagh <[email protected]> wrote:
>>> 
>>> I'd like to reach out to NLNet about changing Unbound to do this, so I want 
>>> to make sure people have a chance to disagree.  Feel free to voice your 
>>> disagreement (and reasons) here if you do.
>> 
>> 
>> _______________________________________________
>> dns-operations mailing list
>> [email protected]
>> https://lists.dns-oarc.net/mailman/listinfo/dns-operations
> 
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742              INTERNET: [email protected]
> 

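For readers who want the bad-cache idea from Mark's quoted message in isolation, here is a minimal sketch. It is not BIND code: the table layout, the names badcache_add and badcache_is_bad, the slot count, and passing "now" as a parameter are all assumptions made for illustration. The only detail taken from the thread is that a failed <name,type> is flagged as bad for 10 minutes, and that lookups depending on it fail fast during that window instead of being retried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>
#include <time.h>

#define BADCACHE_TTL   600 /* seconds; the 10 minutes mentioned in the thread */
#define BADCACHE_SLOTS 64  /* arbitrary table size for this sketch */
#define NAME_MAX_LEN   255 /* maximum length of a textual name we store */

/* One flagged <name,type> pair. expire == 0 marks an unused slot. */
struct bad_entry {
	char name[NAME_MAX_LEN + 1];
	uint16_t type;
	time_t expire;
};

static struct bad_entry badcache[BADCACHE_SLOTS];

static bool
entry_matches(const struct bad_entry *e, const char *name, uint16_t type) {
	/* DNS names compare case-insensitively. */
	return e->expire != 0 && e->type == type &&
	       strcasecmp(e->name, name) == 0;
}

/*
 * Hypothetical helper: flag <name,type> as bad until now + BADCACHE_TTL.
 * Returns false if the name is too long or the table has no free slot.
 */
bool
badcache_add(const char *name, uint16_t type, time_t now) {
	struct bad_entry *slot = NULL;

	assert(name != NULL);
	if (strlen(name) > NAME_MAX_LEN) {
		return false;
	}
	for (size_t i = 0; i < BADCACHE_SLOTS; i++) {
		struct bad_entry *e = &badcache[i];
		if (entry_matches(e, name, type)) {
			slot = e; /* refresh an existing entry */
			break;
		}
		if (slot == NULL && (e->expire == 0 || e->expire <= now)) {
			slot = e; /* remember the first unused or expired slot */
		}
	}
	if (slot == NULL) {
		return false;
	}
	strcpy(slot->name, name);
	slot->type = type;
	slot->expire = now + BADCACHE_TTL;
	return true;
}

/*
 * Hypothetical helper: true while <name,type> is still flagged. A caller
 * would answer SERVFAIL immediately in that case rather than retrying.
 */
bool
badcache_is_bad(const char *name, uint16_t type, time_t now) {
	assert(name != NULL);
	for (size_t i = 0; i < BADCACHE_SLOTS; i++) {
		const struct bad_entry *e = &badcache[i];
		if (entry_matches(e, name, type) && e->expire > now) {
			return true;
		}
	}
	return false;
}
```

A resolver built along these lines would call badcache_add() for a DNSKEY (type 48) or DS (type 43) lookup once every server for the zone has failed, and check badcache_is_bad() before starting a new fetch. Entries are keyed per name and type, so other lookups at the same name are unaffected.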
