On 21/11/17 02:12, dawuud wrote:
> Katzenpost is delivery by pull as well. Clients retrieve messages
> from their Provider.
>
> This paper:
>
> [DUMMYINTERSECTION] Berthold, O., Langos, H.,
> "Dummy Traffic Against Long Term Intersection Attacks",
> In the Proceedings of PETS 2002,
> <https://www.freehaven.net/anonbib/cache/langos02.pdf>.
>
> mentions that there are some constraints needed for the intersection
> attack to work:
>
> a. the adversary is able to watch all user lines
> b. the adversary is able to watch all Provider/server lines (see
>    figure 1 in the paper)
> c. the adversary knows how long it takes for a message to traverse
>    the mix network
> d. messages belonging to a certain session have to have the same
>    property that allows the adversary to link them together
>
> So in Katzenpost I don't think constraint "c" is met, since we aren't
> using synchronized batch mixing but rather the Poisson mix strategy,
> where clients select the mix delay for each hop in the route. The
> delays should be different for every sent message, even retransmits,
> although client route hop delays are sampled from an exponential
> distribution. The attacker can of course compute the probability of
> a certain route delay having been chosen by a client.
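Before getting to the main point: if every hop's delay is drawn from
the same exponential distribution, that last probability is indeed
easy to compute - the total route delay is a sum of independent
exponentials, i.e. Erlang distributed. A quick sketch in Python (the
rate and hop count below are made-up illustration values, not
Katzenpost's actual parameters):

    import math

    def route_delay_pdf(t, hops=3, rate=1.0):
        """Density of the total route delay: the sum of `hops` i.i.d.
        Exp(rate) per-hop delays is Erlang(hops, rate) distributed."""
        if t <= 0:
            return 0.0
        return (rate ** hops * t ** (hops - 1) * math.exp(-rate * t)
                / math.factorial(hops - 1))

    # E.g. how plausible is a 2.5 s end-to-end delay over 3 hops with
    # a mean per-hop delay of 1 s?
    print(route_delay_pdf(2.5))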
Yes, I think the attack would still be possible given a probability
distribution over delays instead of a fixed delay.

My starting point here is the statistical disclosure attack
(https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf),
which says that if I'm watching the inputs and outputs of a
round-based anonymity system, and I see Alice sending messages into
the system, and for each round where Alice sends a message I see all
the messages exiting the system, then starting from a uniform
probability distribution over all possible recipients I can refine
that distribution with every observation, to eventually discover
Alice's communication partners. The longer I watch the system, the
closer my distribution gets to Alice's real distribution.

I think we can generalise that by replacing the rounds with a
probability distribution over delays. So for each message Alice sends
into the system, instead of having a set of output messages that
definitely belong to the same round, I assign every future output
message a probability of being Alice's message based on the delay.
But the rest of the attack remains the same: I use the probabilities
assigned to output messages to update the probability distribution
over possible recipients, and over time that distribution approaches
Alice's real distribution.

Of course the Loopix authors know this, hence all the dummy traffic
to prevent an attacker from knowing which users are actually sending
and receiving messages. I think the big question is whether that
provides a useful amount of protection at a reasonable level of
overhead.

> I also cannot think of a way that "d" is met... unless of course
> there is collusion with the destination Provider.

If I understand right, condition "d" just means there must be some
set of messages you're interested in, and you must know which
messages they are. In the statistical disclosure paper, the session
is all messages sent by Alice and the goal is to identify the set of
recipients. In the Berthold and Langos paper it's the other way round
- the session is all messages sent by some unknown sender and the
goal is to identify the sender. But either way, there's some set of
interesting messages and the goal is to separate them from the
uninteresting messages.

Just as we can generalise from rounds to a probability distribution
over delays, I wonder if we can generalise from a fixed set of
interesting messages to a probability of being interesting. So in a
system that uses dummy traffic, when I see Alice sending a message
into the system, I assign it a probability of being a real message.
Then I assign every future output message a probability of being the
same message based on the delay. And I update the recipient's
probability of being one of Alice's communication partners based on
the product of those two probabilities.

In theory the attack still works, but my distribution converges more
slowly. On the other hand, the system is also more expensive to use.
So it comes down to a practical question: how long can the system
conceal sender-recipient relationships, and at what cost in dummy
traffic?

I'd love to see some back-of-an-envelope calculations of this for
Katzenpost. If Alice sends and receives n dummy bytes for each real
byte, what does that do to the attacker's probability distribution
over possible communication partners? How much longer does it take
the attacker's distribution to converge?

Cheers,
Michael
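P.S. To make the back-of-an-envelope question concrete, here's a toy
simulation sketch in Python. Everything in it - the delay rate, hop
count, population size, dummy ratio and Alice's partner distribution
- is invented for illustration, not taken from Katzenpost:

    import math, random

    random.seed(1)
    RATE, HOPS = 1.0, 3           # per-hop delay rate; route length
    USERS = 50                    # number of possible recipients
    PARTNERS = {3: 0.7, 17: 0.3}  # Alice's real recipient distribution
    DUMMY_RATIO = 2               # dummy messages per real message ("n")
    P_REAL = 1.0 / (1 + DUMMY_RATIO)  # chance a send of Alice's is real

    def delay_pdf(t):
        # Erlang(HOPS, RATE) density of the total route delay.
        if t <= 0:
            return 0.0
        return (RATE ** HOPS * t ** (HOPS - 1) * math.exp(-RATE * t)
                / math.factorial(HOPS - 1))

    def route_delay():
        return sum(random.expovariate(RATE) for _ in range(HOPS))

    # Generate observations: Alice's send times, plus (arrival time,
    # recipient) pairs leaving the network. Background users also
    # send, so Alice's outputs are mixed in with everyone else's.
    sends, outputs = [], []
    t = 0.0
    for _ in range(500):
        t += random.expovariate(1.0)
        sends.append(t)                # Alice sends (real or dummy)
        if random.random() < P_REAL:   # real: delivered to a partner
            r = random.choices(list(PARTNERS),
                               weights=list(PARTNERS.values()))[0]
            outputs.append((t + route_delay(), r))
        # One background message to a uniformly random recipient.
        outputs.append((t + random.expovariate(1.0) + route_delay(),
                        random.randrange(USERS)))

    # The attack: for each of Alice's sends, weight every later
    # output by the product of (a) the chance the send was real and
    # (b) the delay density, and accumulate a score per recipient.
    score = [0.0] * USERS
    for s in sends:
        for arrival, r in outputs:
            score[r] += P_REAL * delay_pdf(arrival - s)

    total = sum(score)
    for u in sorted(range(USERS), key=lambda u: -score[u])[:5]:
        print(u, round(score[u] / total, 3))

Rerunning this while varying DUMMY_RATIO (and subtracting the uniform
background, as the statistical disclosure paper does) would give a
rough feel for how much extra observation time each unit of dummy
traffic buys the users.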