I would like to hear Ellison Anne's opinion on this, but I think such a change has no security implications.
While it would be a definite change in how the algorithm operates, from a mathematical-security perspective this is no different than picking an r value that is simply enormous (recall that r is the number of responses that can be returned per query request period per search term). The value of r is a constrainer, not a constraint, in this system (i.e. we are forced to choose delta and b such that delta/b divides r, but we've got no restrictions on our choice of r).

On Thu, Sep 22, 2016 at 11:23 AM, James Carr <[email protected]> wrote:

> Hello all,
>
> I've been working through the Wideskies paper and Walter's slides. I
> think I have an idea that would improve usability by a lot, but I want to
> run it past you guys to see if it would work.
>
> Here's my understanding of how the Wideskies Responder works:
>
> - It maintains the values Y_0, ..., Y_(r-1), which hold the encrypted data
> returned by the algorithm. From a Comp Sci perspective, I see the
> collection of these values functioning as, basically, a big encrypted byte
> buffer holding our results. It also maintains a list of counters c_0, ...,
> c_t, one for each possible value of the hash function H.
>
> - When a pair (T, D) comes in, we hash T to get H(T) and break D into
> 8-bit chunks. We then combine each of those chunks into our Y_i values
> using E_H(T), our encrypted value for the hash of T, which is chosen in
> such a way that a) if we aren't interested in T, adding this chunk to Y_i
> does not change the value it decrypts to (we essentially multiply by an
> encrypted 1); or b) if we are interested in T, E_H(T) has been chosen in
> such a way that adding this chunk to Y_i will not overwrite any byte chunks
> for *other* interesting terms that have also been written to Y_i.
>
> - We then increment the counter c_H(T) so that future items that hash to
> the same value write to a different Y_i, so that we do not overwrite the
> value we just wrote.
> - If *any* of these counters c_H(T) would exceed r due to some new pair
> (T, D), then we can no longer ensure that we won't overwrite data we
> already saw when we add the new D to Y_0, ..., Y_(r-1). From a Comp Sci
> perspective, this looks like "our buffer is full," since we don't have room
> for new data. Therefore, we have to terminate the algorithm and return our
> list of Y's as they stand, without processing all of the data. This is true
> even if most of the terms that hashed to H(T) were ones we weren't
> interested in; in other words, if ANY of the terms are extremely common,
> even if we aren't interested in them, we run the risk of being unable to
> process all of the data.
>
> Please let me know if I'm off the mark on any of that :-)
>
> So my idea is as follows: in programming world, when we are processing a
> large amount of data and we fill up our current write buffer, we just
> output that buffer, make a new one, and continue. Why wouldn't that work
> here? In other words, when we encounter a pair (T, D) that would cause one
> of the counters c_H(T) to exceed r, why couldn't we just push the current
> Y_0, ..., Y_(r-1) entries onto a list, create a new set of entries Y'_0,
> ..., Y'_(r-1), reset all the counters to 0, and continue processing (T, D)
> as normal? Then the algorithm could return all of the buffers Y, Y', Y'',
> etc. that it created instead of just Y, and we could ensure that we
> process all of the data.
>
> As far as I can tell, this gives the responder no additional
> information about our query terms that it didn't already have. So I don't
> think it would weaken the privacy of the algorithm at all, but please
> correct me if I'm missing something. On the other hand, I think it would
> make PIRK much easier to use, since we don't run the risk of a query
> failing if we don't understand enough about the data it's running over to
> be able to tune r appropriately.
> I can also imagine this leading to some performance improvements, since
> we could let PIRK choose r automatically so that a result set could fit
> nicely within a hard disk page, network frame, etc.
>
> What do you guys think?
>
> Regards,
> -Ryan Carr
>
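To make the proposed buffer rotation concrete, here is a rough sketch of the bookkeeping only. This is not Pirk's actual API: the class and method names are hypothetical, and plain integer addition stands in for the homomorphic (Paillier) combination of chunks into the Y values. The point is just the control flow: when the counter for some hash value would exceed r, push the current buffer onto a list and start a fresh one instead of terminating.

```python
# Hypothetical sketch (not Pirk's real Responder): per-hash counters
# plus "rotate the buffer when full". Plain ints stand in for
# Paillier ciphertexts, and += stands in for the homomorphic combine.

class RotatingResponder:
    def __init__(self, r, hash_fn):
        self.r = r                    # responses per query period per term
        self.hash_fn = hash_fn        # stand-in for the hash H
        self.finished = []            # completed buffers Y, Y', Y'', ...
        self._new_buffer()

    def _new_buffer(self):
        self.Y = [0] * self.r         # stand-in for encrypted Y_0..Y_(r-1)
        self.counters = {}            # c_H(T) for each hash value seen

    def add(self, term, chunk):
        h = self.hash_fn(term)
        c = self.counters.get(h, 0)
        if c >= self.r:
            # Buffer is "full" for this hash value: instead of
            # terminating, archive the current buffer and reset.
            self.finished.append(self.Y)
            self._new_buffer()
            c = 0
        self.Y[c] += chunk            # stand-in for homomorphic combine
        self.counters[h] = c + 1

    def results(self):
        # Return every buffer created, not just the last one.
        return self.finished + [self.Y]
```

For example, with r = 2 and three chunks all hashing to the same value, the third chunk triggers a rotation and the responder returns two buffers rather than failing; the querier just decrypts each buffer in turn.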
