Hi Hosnieh,

It's a little long-winded. So, you're questioning the point of DNS privacy on the grounds that it doesn't give the interested observer anything useful? Sure, there's NAT, a MITM can eavesdrop on your other traffic, and so on. But it is entirely realistic for TU2/3 to gather data on the authoritative servers and resolvers and use it as forensic evidence, correlating it with other traffic, geolocation, traffic distribution, WHOIS records, etc., with a certain degree of confidence. People have started businesses with this sort of information.

Saying that DNS traffic does not carry anything interesting is a bold claim; it might be true, or it might not. Either way, I think that giving away less information is always better, as long as the cost isn't too high, and I'm happy to discuss technical questions and ideas.

Best,
Marek

On 27 August 2014 13:54, Hosnieh Rafiee <[email protected]> wrote:
> Hi Marek,
>
> If you would prefer to continue on the mailing list, that is fine with me, as I am only sharing my own thoughts about DNS privacy. Again, I am still not convinced that DNS privacy is worth pursuing, for the reasons I explained earlier on the mailing list and explain here again with more examples and in different categories.
>
> First of all, please do not think your approach is bad; it is already used for several purposes, but in my opinion it does not work for DNS. This is because, in general, nothing meaningful is carried by DNS (please see below).
>
> I think that before we explore any solution space, we need to understand the problem exactly. In other words, we need to answer two questions: what do we want to hide, and who is the target group we want to hide it from? Then we can consider whether your solution would help or not.
>
> First, let's use a different categorization to clarify what we want to hide and whom we want to hide it from. This helps to identify the exact problem before thinking about the solution space.
>
> In my opinion, there are 3 groups of people (Target Users, or "TU" for short) who might be interested in our data:
>
> 1- A powerful surveillance actor:
>
> This actor has the power to impose new rules on a country, and all Internet traffic flows pass through his devices or through companies that MUST report the information to him; otherwise, those companies will be banned!
>
> 2- A passive surveillance actor:
>
> This can be a government, a colleague of yours, or anyone who does not want others to know about his activities.
> In other words, he doesn't want to be recognized; otherwise, he would be in trouble.
>
> 3- An attacker:
>
> This person wants to obtain data in whatever way possible. If necessary, he will also perform active attacks. At the same time, he tries to stay anonymous, but he is willing to take risks to access users' data.
>
> Now let's see what information one might obtain from a DNS server, either a recursive resolver or a TLD server.
>
> 1- Domain names, labels, and IP addresses on DNS servers
>
> This data might not be meaningful by itself, although it might point TU to certain traffic that carries confidential data. It might also allow TU to find behavior patterns. Of course, making use of this data from a DNS server takes extra effort and time, because he needs to crawl each website to learn its content. Also, he doesn't know what a user searched for on that website, so he cannot obtain precise information.
>
> 2- Source (the end user's IP address) or sender of the data (the first point of contact that the end user's device connects to, such as a recursive resolver)
>
> Since most users are behind NAT, this might not give much information about an individual; it might only be the general IP address of a network. If the target is the messages exchanged between recursive and authoritative name servers, only the source IP address of the recursive resolver is exposed (I guess this is what you are concerned about).
>
> Now, what might TU's purposes be in analyzing the traffic or trying to access users' data (the DATA gathering process)?
>
> 1- To find users' behavior and analyze users (this is of course very general; TU is interested in all traffic in order to find bindings and matches among traffic flows)
>
> TU group 2 might be interested in this.
>
> 2- To gather information about a particular individual, likely in order to use this data against him.
>
> This is what TU groups 1, 2, and 3 might be interested in.
>
> 3- To gather information about a particular individual only out of curiosity, not for any malicious purpose.
>
> TU group 2 might be interested in this.
>
> Now, what might our purposes be in hiding DNS traffic? What do we mainly want to achieve by protecting DNS queries? (Our purpose)
>
> 1- To hide who the data originator (end user) is, and to hide its identity (such as its IP address) behind several other nodes (as is done in Tor)
>
> Since the originator's IP address is not bound to any particular user, this does not really help.
>
> 2- To prevent analysis of a user's behavior, and comparison of one user's behavior with other users' behavior, to find a pattern.
>
> This kind of analysis is, of course, not possible by looking only at DNS traffic, either from stub to recursive or from recursive to authoritative. Why? Because…
>
> - The source of this data is not an exact user (it is either a public recursive server or the general NAT IP address of a network).
>
> - A domain name, as explained before, doesn't give any important information about users, their behavior, etc., and you need to make much more effort, crawling each domain, to find out what content the user was looking for. This is, of course, limited to websites that you can crawl, not servers that a user connects to only to upload some files, etc. (so this data is not precise enough for users' behavior analysis). You can only gather this information by having access to, and analyzing, the subsequent traffic flows. In that case, TU focuses only on the subsequent traffic flows and does not bother with DNS traffic analysis.
>
> 3- To prevent anyone from seeing what I browse or search for on my enterprise network, public Wi-Fi, or any other private network
>
> TU group 1: The content of the data is what matters to him. He doesn't care what the DNS server has. He cares about the identity of users and wants to know who browses what.
> In this case, the user is forced to use his devices, and all traffic passes through his devices; whether the user likes it or not, the user has no power to prevent him from accessing the user's data. So, he can easily switch to the role of an active attacker if the user uses any encryption to hide the traffic. He already knows the real identity of the user from analyzing subsequent flows and can gather as much information about this user as he wants.
>
> TU group 2: For this group, the content of the data is also what matters. They might not focus on a particular user, because what they want is flow matching to find patterns, etc. When the content of the data is what matters, what happens on the DNS server is of no interest to them because, as I explained earlier, DNS traffic is not meaningful and takes much more effort to make meaningful, i.e., to find the bindings between a certain DNS flow and other data flows exchanged between an end user and a service. So, in this case too, randomizing recursive DNS servers and hiding one recursive resolver behind another only complicates the query and might cost extra time and effort, without any significant benefit.
>
> TU group 3: They are also interested in the content of the data, and they are willing to take the risk of becoming active attackers in order to access it. So, DNS information is of no interest to them. Such an attacker might not have enough power to sniff all traffic, so he focuses on what he can sniff. He might go to a public Wi-Fi network and sniff the traffic, or, if that is not possible, play the role of an active attacker to obtain the data he is looking for.
>
> Now there is one question:
>
> 1- If we hide the identity of the data originator, which can be an end user or a recursive resolver, does it help to protect users or the data exposed to a certain DNS server?
>
> - For TU group 1: No, if it falls into the category of the data gathering process, because an IP address is not bound to a particular user when NAT or a proxy server is used. But the content of this data exposes the user's data.
>
> - For TU group 2: No, for the same reason as for TU group 1.
>
> - For TU group 3: No, again for the same reason as for TU group 1.
>
> The other reason is that, even if that identity belongs to a particular node (for instance, when it uses IPv6), the content of the flow (data) would expose information about the user. This is, of course, not the flow that goes to the DNS server but the flows that the user starts after this DNS query. Now the question is: could you hide that data by any means of randomization?
>
> Actually, no, because the subsequent flows, i.e., the flows from end users to services, are what matter, and they expose information about the user, their behavior, etc.
>
> So, in general, I do not think your approach works here, since there is no clear benefit to it. There is only more complexity, and possibly a performance issue, because we would use several intermediate nodes to answer a query asked by a user or a recursive server.
>
> In other words, DNS traffic does not carry any interesting information for TU 1, 2, or 3; only the content of the data exchanged after the DNS query is important. Usually, when TU is located where it can sniff the DNS flows, it can also sniff the subsequent flows that are of interest to it (like your example of ISPs). In that case, it skips all DNS flows. In other words, by analyzing only a sample of a TLD's traffic or a resolver's traffic, TU cannot gather any meaningful information, because…
>
> 1- Not all those queries happen because a user wants to connect to a website; a query can also happen because the user wants to connect to his file server that has a domain name, so TU needs to make much more effort to find out the reason for this query to the TLD, etc.
> So this is not of interest to TU when he has other, easier ways to gather this information.
>
> 2- A website might contain many different pieces of information, and the user is interested in only one of them, so TU cannot gather precise information about a user by crawling the website himself.
>
> 3- The IP address is a general IP address and isn't bound to any user.
>
> 4- DNS domain names by themselves are not meaningful; TU needs to crawl to find the data, or he can simply skip all DNS traffic and focus only on the data flows passing through different servers.
>
> I hope I have explained my point clearly here.
>
> I am sorry for the long message…
>
> Best,
>
> Hosnieh
>
> >> What I wrote about the onion routing has nothing to do with protecting the query content, but rather with hiding the fact of "who's asking". Maybe I didn't articulate it clearly enough. Suppose there are three traffic observers: the authoritative name server, forwarders, and a bad guy between the resolver and the auth.
> >>
> >> What happens when a user asks a query? All of them know who asked, or from which network the query originated (because the resolver serves that subnet).
> >> What happens when a query is onion-routed first? The bad guy may see only the part of the traffic that he is able to sniff (parts of the query may have different exit points). Authoritative name servers see the "exit point" resolver, but can't trace the query back to the first one. Forwarders see only the previous resolver.
> >>
> >> So what have we gained? None of the auth, the resolvers, or the bad guy knows _who_ asked first, unless the query originator voluntarily reveals himself with EDNS Client Subnet or something similar.
> >> Is that worth the extra cost of bouncing queries? It might be; I don't know.
> >>
> >> So, as to "DNS queries carry no important information when the surveillance actor has access to the whole traffic flow".
> >> That might be true, but it's only one scenario (an ISP sniffing on its users).
> >> For example, a root server doesn't see any of your other traffic, but it may still know that you look up an embarrassing website every Monday afternoon.
> >> What I'm most worried about is TLDs, root servers, and resolver operators collecting traffic. This might be used for a good cause (passive DNS, law enforcement), but also for a bad one.
> >>
> >> We could continue on-list, if you're okay with it, because other people might be interested.
> >>
> >> Best,
> >> Marek
> >>
> >> On 27 August 2014 10:52, Hosnieh Rafiee <[email protected]> wrote:
> >> > Off-list
> >> >
> >> > Dear Marek,
> >> >
> >> > The problem is not how to protect DNS queries. The problem is that DNS queries carry no important information when the surveillance actor has access to the whole traffic flow, so he does not need to sniff any DNS traffic at all. So why should we complicate our own processes and computation when we do not benefit from it?
> >> > I hope you understand my point.
> >> > Best,
> >> > Hosnieh

_______________________________________________
dns-privacy mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dns-privacy
