On Mon, Dec 17, 2018 at 1:33 PM Warren Kumari <[email protected]> wrote:
>
> On Fri, Dec 14, 2018 at 4:05 PM Christopher Wood
> <[email protected]> wrote:
>>
>> On Dec 14, 2018, 12:29 PM -0800, Daniel Kahn Gillmor
>> <[email protected]>, wrote:
>>
>> On Fri 2018-12-14 11:47:58 -0800, Christopher Wood wrote:
>>
>> On Dec 14, 2018, 10:47 AM -0800, Wes Hardaker <[email protected]>, wrote:
>>
>> [And, no, we shouldn't go down the road of "privacy requires you disable
>> the cache"]
>>
>> Would you mind elaborating on this comment? As you observe, caches are
>> harmful to privacy. Refusal to disable the cache in any (?)
>> circumstance therefore seems dismissive of user privacy. Perhaps you
>> mean turning it off for every query is not a viable path forward?
>>
>> I hope Wes will answer this question on his own, but i wanted to note
>> that privacy is not only harmed by caches. it can also be helped by
>> caches.
>>
>> A query for any name will typically radiate *less* information into the
>> world if it's answered from a cache, simply because the resolver in
>> question doesn't create additional traffic.
>>
>> In particular, if the cache is already well-populated, and queries are
>> padded appropriately, and the name is relatively likely to be in-cache,
>> then the only parties that know what was looked up are the client and
>> the resolver itself. No authoritative servers or network observers have
>> any additional information to distinguish the query from any other
>> cache-resolved query handled by the resolver.
>>
>> So i don't think caching itself offers a clear benefit or harm for
>> privacy. One advantage of a resolver is that it effectively acts as a
>> mixing/semi-anonymizing agent on behalf of its users. Assuming that the
>> resolver itself is not compromised, it can buffer its users from the
>> authoritative servers.
>>
>> Yes, of course, thanks for clarifying the other piece of this puzzle! This
>> is indeed a benefit.
>> However, I am not convinced this yields a greater net
>> benefit than disabling caching. (I am not aware of any such study or
>> analysis on this problem.) That said, all of this depends entirely upon the
>> threat model, which can vary greatly.
>
> If you disable the cache, and can see that there is an (encrypted) input
> query and then immediately an (encrypted) output query to 208.80.154.238
> (ns0.wikimedia.org) you know with very high likelihood what the input query
> was for.

Agreed, though I think leaking the origin through the address is an issue
regardless of whether the cache is shared or not.

> If you have a shared cache, there is a much higher likelihood that the input
> query gets answered from cache (especially for higher popularity names) and
> so there is no output query to correlate with. Techniques which refresh the
> cache before the TTL has expired (à la HAMMER) further thwart correlation
> attacks.

I agree in principle, yet it seems TTL-based stub cache refresh mechanisms
could be implemented regardless of whether or not there's a shared resolver
cache. (Please correct me if I misunderstood your point!)

In my opinion, the tradeoffs between enabling and disabling caching are not
well studied. (Thanks to Wes for sharing a pointer to his paper, which
scratches the surface of this interesting problem.) We need more work before
we understand these tradeoffs and choose the "right" answer.

Best,
Chris

_______________________________________________
dns-privacy mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dns-privacy
