Re: Idea for handling result matrix overflows

```I would like to hear Ellison Anne's opinion on this, but I think such a
change has no security implications.```
```
While it would be a definite change in how the algorithm operates, from a
mathematical-security perspective this is no different than picking an r
value that it simply enormous (recall that r is the number of responses
that can be returned per query request period per search term). The value
of r is a constrainer, not a constraint, in this system (i.e. we are forced
to choose delta and b such that delta/b divides r, but we've got no
restrictions on our choice of r).

On Thu, Sep 22, 2016 at 11:23 AM, James Carr <jryancarrp...@gmail.com>
wrote:

> Hello all,
>
>   I've been working through the Wideskies paper and Walter's slides. I
> think I have an idea that would improve usability by a lot, but I want to
> run it past you guys to see if it would work.
>
> Here's my understanding of how the Wideskies Responder works:
> - It maintains the values Y_0, ..., Y_(r-1) which hold the encrypted data
> returned by the algorithm. From a Comp Sci perspective, I see the
> collection of these values functioning as, basically, a big encrypted byte
> buffer holding our results. It also maintains a list of counters c_0, ...,
> c_t, one for each possible value of hash function H.
> - When a pair (T, D) comes in, we hash T to get H(T), and break D into
> 8-bit chunks. We then combine each of those chunks into our Y_i values
> using E_H(T), our encrypted value for the hash of T, which is chosen in
> such a way that a) if we aren't interested in T, adding this chunk to Y_i
> does not change the value it decrypts to (we essentially multiply by
> encrypted 1); or b) if we are interested in T, E_H(T) has been chosen in
> such a way that adding this chunk to Y_i will not overwrite any byte chunks
> for *other* interesting terms that have also been written to Y_i.
> - We then increment the counter c_T so that future items that hash to the
> same value write to different Y_i, so that we do not overwrite the value we
> just wrote.
> - If *any *of these counters c_H(T) would exceed r due to some new pair
> (T,D), then we can no longer ensure that we won't overwrite data we already
> saw when we add the new D to Y_0, ..., Y_(r-1). From a Comp Sci
> perspective, this looks like "our buffer is full," since we don't have room
> for new data. Therefore, we have to terminate the algorithm and return our
> list of Y's as they stand, without processing all of the data. This is true
> even if most of the terms that hashed to H(T) were ones we weren't
> interested in; in other words, if ANY of the terms are extremely common,
> even if we aren't interested in them, we run the risk of being unable to
> process all of the data.
>
> Please let me know if I'm off the mark on any of that :-)
>
> So my idea is as follows: In programming world, when we are processing a
> large amount of data and we fill up our current write buffer, we just
> output that buffer, make a new one, and continue. Why wouldn't that work
> here? In other words, when we encounter a term (T, D) that would cause one
> of the counters c_H(T) to exceed r, why couldn't we just push the current
> Y_0, ..., Y_(r-1) entries onto a list, create a new set of entries Y'_0,
> ..., Y'_(r-1), reset all the counters to 0, and continue processing (T,D)
> as normal? Then the algorithm could return all of the buffers Y, Y', Y'',
> etc. that it created instead of just Y, and we can ensure that we can
> process all of the data.
>
> As far as I could tell, this requires giving the responder no additional
> information about our query terms that they didn't already have. So I don't
> think it would weaken the privacy of the algorithm at all, but please
> correct me if I'm missing something. On the other hand I think it would
> make PIRK much easier to use, since we don't run the risk of a query
> failing if we don't understand enough about the data it's running over to
> be able to tune r appropriately. I can also imagine this leading to some
> performance improvements, since we could let PIRK choose r automatically so
> that a result set could fit nicely within a hard disk page, network frame,
> etc.
>
> What do you guys think?
>
> Regards,
> -Ryan Carr
>
```