Hi Marek,


If you would prefer to continue this on the mailing list, that is fine with me,
as I am only sharing my own thoughts about DNS privacy. I am still not
convinced that we need to address DNS privacy in this way, for the reasons I
explained before on the mailing list and explain again here, with more examples
and in different categories.

First of all, please do not think your approach is bad; it is already used for
several purposes. But in my opinion it does not work for DNS, because, in
general, DNS does not carry anything meaningful (please see below).



I think that before we explore any solution space, we need to understand the
problem exactly. In other words, we need to answer two questions: what do we
want to hide, and who is the target group we want to hide this data from? Then
we can consider whether your solution helps or not.



First, let's use a different categorization to clarify what we want to hide and
whom we want to hide it from. This helps to identify the exact problem before
we think about the solution space.



In my opinion, there are three groups of people (Target Users, or "TU" for
short) who might be interested in our data:

1-    A powerful surveillance actor:

This person has the power to impose new rules on a country, and all Internet
traffic passes through his devices or through companies that MUST report the
information to him; otherwise those companies will be banned.

2-    A passive surveillance actor:

This can be a government, a colleague of yours, or anyone who does not want
others to know about his activities. In other words, he does not want to be
recognized, because otherwise he would be in trouble.

3-    An attacker:

This person wants to obtain data in whatever way possible. If necessary, he
will also carry out active attacks. At the same time he tries to keep himself
anonymous, but he is willing to take risks to access users' data.



Now let's see what information one might obtain from a DNS server, either a
recursive resolver or a TLD server.



1-    Domain names, labels, and IP addresses on DNS servers

This data might not be meaningful by itself, except that it may point TU to
certain traffic that carries confidential data. It might also help TU find
behavior patterns. Of course, gathering this data from a DNS server requires
extra effort and time, because he needs to crawl each website to learn its
content. He also does not know what a user searched for on that website, so he
cannot obtain precise information.

2-    The source (the end user's IP address) or the sender of the data (the
first point of contact that the end user's device connects to, such as a
recursive resolver)

Since most users are behind NAT, this might not give much information about an
individual; it might only be the general IP address of a network. If the target
is the messages exchanged between a recursive resolver and authoritative name
servers, only the source IP address of that recursive resolver is exposed (I
guess this is what you are concerned about).
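To make the point about source addresses concrete, here is a minimal Python
sketch under a deliberately simplified model (an assumption): each hop sees
only the address of the node that forwarded the query to it. All addresses are
hypothetical, taken from the RFC 5737 documentation ranges, and the "relayed"
path is a plain relay chain standing in roughly for the onion routing you
described (no layered encryption is modelled).

```python
from typing import List, Tuple


def observed_sources(path: List[str]) -> List[Tuple[str, str]]:
    """Return (observer, source address it sees) for every hop after the first.

    Simplified model (assumption): a node only learns the address of the node
    immediately before it on the path.
    """
    return [(path[i], path[i - 1]) for i in range(1, len(path))]


# Hypothetical addresses from the RFC 5737 documentation ranges.
NAT_GATEWAY = "192.0.2.1"    # public address shared by every host behind the NAT
RESOLVER = "198.51.100.53"   # the network's recursive resolver
AUTH = "203.0.113.10"        # an authoritative server (root, TLD, ...)

# Direct resolution: stub (behind NAT) -> recursive -> authoritative.
direct = [NAT_GATEWAY, RESOLVER, AUTH]

# Relayed resolution: the query is bounced through two other (hypothetical)
# resolvers before it reaches the authoritative server.
relayed = [NAT_GATEWAY, RESOLVER, "198.51.100.7", "198.51.100.9", AUTH]

if __name__ == "__main__":
    for name, path in (("direct", direct), ("relayed", relayed)):
        for observer, source in observed_sources(path):
            print(f"{name}: {observer} sees queries from {source}")
```

In this model the authoritative server never sees an individual host: in the
direct case it sees the resolver's address, and in the relayed case only the
last relay's address.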



Now, what might TU's purposes be in analyzing the traffic or trying to access
users' data (the data gathering process)?

1-    To learn users' behavior and analyze users (this is, of course, very
general: TU is interested in all traffic in order to find bindings and matches
among traffic flows).

TU group 2 might be interested in this.

2-    To gather information about a particular individual, likely to use it
against him.

TU groups 1, 2 and 3 might be interested in this.

3-    To gather information about a particular individual only out of
curiosity, not for any malicious activity.

TU group 2 might be interested in this.



Now, what might our purposes be in hiding DNS traffic? The main goals of DNS
query protection are:

1-    To hide who the data originator (end user) is, hiding its identity (such
as its IP address) behind several other nodes (as Tor does)

Since the originator's IP address is not bound to any user, this does not
really help.

2-    To prevent analysis of a user's behavior and comparison of one user's
behavior with other users' behavior to find patterns.

This is, of course, not possible by focusing only on DNS traffic, either from
stub to recursive or from recursive to authoritative. Why? Because:

-     The source of this data is not an exact user (it is either a public
recursive server or the general NAT IP address of a network).

-     A domain name, as explained before, does not give any important
information about users, their behavior, etc., and you need much more effort to
crawl each domain to find out what content the user was looking for. This is,
of course, limited to websites that you can crawl, not servers that a user
connects to only to upload some files, etc. (so this data is not precise enough
for user behavior analysis). You can only gather this information by analyzing,
and having access to, the subsequent traffic flows. In that case, TU focuses
only on those subsequent flows and does not bother with DNS traffic analysis.



3-    To prevent anyone from seeing what I browse or search for in my
enterprise network, on public Wi-Fi, or in any other private network



TU group 1: The content of the data is what matters to him. He does not care
what the DNS server has. He cares about the identity of users and wants to know
who browses what. In this case, users are forced to use his devices and all
traffic passes through them; whether users like it or not, they have no power
to prevent him from accessing their data. So he can easily switch to the role
of an active attacker if a user uses any encryption to hide the traffic. He
already knows the real identity of the user by analyzing the subsequent flows
and can gather as much information about this user as possible.



TU group 2: For this group, too, the content of the data is important. They
might not focus on a particular user, because what they want is flow matching
to find patterns, etc. When the content of the data is what matters, what
happens on the DNS server is of no interest to them because, as I explained
earlier, DNS traffic is not meaningful and needs much more effort to make it
meaningful and to find the bindings between a certain DNS flow and the other
data flows exchanged between an end user and a service. So, in this case too,
randomizing recursive DNS servers and hiding one recursive behind another only
complicates the query and may cost extra time and effort, without any
significant benefit.



TU group 3: They are also interested in the content of the data, and they are
also willing to take the risk of becoming active attackers to access it. So DNS
information is of no interest to them. Such an attacker might not have enough
power to sniff all traffic, but he will focus on what he can sniff. He might go
to a public Wi-Fi network and sniff the traffic there and, if that is not
possible, act as an active attacker to obtain the data he is looking for.





Now there is one question:

1-    If we hide the identity of the data originator, which can be an end user
or a recursive resolver, does it help to protect users, or the data exposed to
a certain DNS server?

-     For TU group 1: No, if it falls into the category of the data gathering
process, because an IP address is not bound to a particular user when NAT or a
proxy server is used. But the content of the data exposes the user's data.

-     For TU group 2: No, for the same reason as for TU group 1.

-     For TU group 3: No, again for the same reason as for TU group 1.

The other reason is that, even when the identity belongs to a particular node
(for instance, when it uses IPv6), the content of the flow (data) would expose
information about the user. This is, of course, not the flow that goes to the
DNS server, but the flows that the user starts after querying the DNS server.
Now the question is: could you hide this data by any means of randomization?

Actually, no, because the problem is that the subsequent flows, i.e. the flows
from end users to services, are what matter, and they expose information about
the user, their behavior, etc.



So, in general, I do not think your approach works here, since there is no
clear benefit to it. There is only more complexity and possibly a performance
issue, because we would use several intermediate nodes to answer a query asked
by a user or a recursive server.

In other words, DNS traffic does not carry any interesting information for TU
groups 1, 2 or 3; only the content of the data exchanged after the DNS query is
important. Usually, when TU is located where he can sniff the DNS flows, he can
also sniff the subsequent flows that interest him (like your example of ISPs).
In that case, he skips all DNS flows. In other words, by analyzing only a
sample of a TLD's traffic or a sample of a resolver's traffic, TU cannot gather
any meaningful information, because:

1-   Not all of those queries happen because a user wants to connect to a
website; a query can also happen because the user wants to connect to his file
server, which has a domain name. So TU needs much more effort to find out the
reason for a query to a TLD, and this is not of interest to him when he has
easier ways to gather this information.

2-  A website might contain many different pieces of information while the user
is interested in only one of them, so TU cannot gather precise information
about a user by crawling the website himself.

3-  This IP address is a general IP address and is not bound to any user.

4-  DNS domain names on their own are not meaningful; he needs to crawl to find
the data, or he can simply skip all DNS traffic and focus only on the data
flows passing through different servers.



I hope I have explained my point clearly here.

I am sorry for the long message.

Best,

Hosnieh



> What I wrote about the onion routing has nothing to do with
> protecting the query content, but rather hiding the fact "who's asking".
> Maybe I didn't articulate it clearly enough. Suppose there are three
> traffic observers: the authoritative name server, forwarders, and a bad
> guy between the resolver and the auth.
>
> What happens when a user asks a query? All know who asked, or from
> which network the query originated (because the resolver serves that
> subnet).
> What happens when a query is onion routed first? The bad guy may see only
> the part of the traffic that he is able to sniff (parts of the query may
> have different exit points). Authoritative name servers see the "exit
> point" resolver, but can't trace the query back to the first. Forwarders
> see only the previous resolver.
>
> So what have we gained? Neither auth/resolver/bad guy knows _who_ asked
> first, unless the query originator reveals himself voluntarily with EDNS
> Client Subnet or something similar.
> Is that worth the extra cost of bouncing queries? It might be, I
> don't know.
>
> So, to the "DNS queries carry no important information when the
> surveillance actor has access to the whole traffic flow": that might
> be true, but it's only one scenario (an ISP sniffing on its users).
> For example, a root server doesn't see any of your other traffic, but it
> may still know that you look up an embarrassing website every Monday
> afternoon.
> What I'm most worried about is TLDs, root servers and resolver
> operators collecting traffic. This might be used for a good cause
> (passive DNS, law enforcement), but also for a bad cause.
>
> We could continue on-list, if you're okay with it, because other people
> might be interested.
>
> Best,
> Marek
>
> On 27 August 2014 10:52, Hosnieh Rafiee <[email protected]> wrote:
> > Offlist
> >
> >
> > Dear Marek,
> >
> > The problem is not how to protect DNS queries. The problem is
> > that DNS queries carry no important information when the surveillance
> > actor has access to the whole traffic flow. So he does not need to
> > sniff any DNS traffic at all. So why do we need to complicate our own
> > process and our own computation when we do not benefit from it?
> > I hope you understand my point.
> > Best,
> > Hosnieh
_______________________________________________
dns-privacy mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dns-privacy
