On Tue, Aug 16, 2016 at 7:17 AM, Tim Ellison <[email protected]> wrote:
> On 15/08/16 20:32, Ellison Anne Williams wrote:
> > Answers inline below
> >
> > On Mon, Aug 15, 2016 at 12:33 PM, Tim Ellison <[email protected]> wrote:
> >
> >> A couple of questions about the data partitioners...
> >>
> >> (1) Why do they split into, and reconstruct from, BigIntegers?
> >>
> >> The types always get decomposed into primitives, and therefore the
> >> parts are always BigInteger values of bytes [1], and they are used as
> >> intValues [2] when using the exponent table.
> >>
> >> Why not use byte[]/int[], and take the hit to convert to BigInteger only
> >> if going to modPowAbstraction?
> >>
> > The partitions always end up as exponents in a modular exponentiation
> > (that's their sole use). For modPow to function, the exponent must be a
> > BigInteger. Thus, they are converted to BigIntegers immediately upon
> > extraction.
>
> Understood, I'm thinking of the case where the exponents are maintained
> in the look-up table. The in-memory size of an Integer is much smaller
> than a BigInteger, so using BigIntegers to represent small values means
> we have less space in the cache. It also makes the code a bit tidier as
> we can use autoboxing/unboxing of Integer.
>
> I may have a play and see what it looks like.
>
> >> (2) Has anyone tried to define the partitioning API as a set of type
> >> safe methods?
> >>
> >> I'm not sure if the current scheme defines the objects as type Object
> >> simply to allow for the data to be parsed Strings, but it does have the
> >> unfortunate effect of (a) requiring Java's primitives to be boxed, and
> >> (b) allowing for runtime errors for mismatched types, e.g. I can write
> >> nonsense like:
> >>
> >> new PrimitiveTypePartitioner().toPartitions(123,
> >> PrimitiveTypePartitioner.CHAR)
> >>
> >> [1]
> >> https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/schema/data/partitioner/PrimitiveTypePartitioner.java#L302
> >> [2]
> >> https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/responder/wideskies/standalone/Responder.java#L197
> >>
> > So, right now in the code, the objects that are being partitioned are
> > pretty straightforward - primitive types, IPs, dates. However, you could
> > easily have a scenario where you are performing PIR over more complex
> > objects - say PIR over a graph, and what's being partitioned would be
> > vertices/edges/components/etc corresponding to more custom and complex
> > objects.... hence the choice of generic Object type for the DataPartitioner
> > interface.
>
> But complex types would still be types, so their partitioners would deal
> with the new types directly (i.e. you would not want to allow a user to
> pass an Edge to the IP partitioner). The interface would be specified
> using generics.

I agree with Tim that generics should still work for complex types like
subgraphs. +1 to change this to use generics.

> Regards,
> Tim
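
To make the generics suggestion a bit more concrete, here is a rough sketch of
what a type-safe partitioner could look like. The names TypedPartitioner and
IntPartitioner are made up for illustration and are not the existing Pirk
DataPartitioner API; the byte-sized, most-significant-byte-first split is
likewise just an assumption for the example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not the actual Pirk DataPartitioner interface.
// The type parameter T ties each partitioner to the one type it understands.
interface TypedPartitioner<T> {
    List<BigInteger> toPartitions(T value);

    T fromPartitions(List<BigInteger> parts);
}

// Illustrative partitioner for int values: four byte-sized parts,
// most significant byte first (an assumed layout for this example).
final class IntPartitioner implements TypedPartitioner<Integer> {
    @Override
    public List<BigInteger> toPartitions(Integer value) {
        List<BigInteger> parts = new ArrayList<>(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            // Mask to an unsigned byte so every part is in the range 0..255.
            parts.add(BigInteger.valueOf((value >>> shift) & 0xFF));
        }
        return parts;
    }

    @Override
    public Integer fromPartitions(List<BigInteger> parts) {
        int result = 0;
        for (BigInteger part : parts) {
            result = (result << 8) | part.intValue();
        }
        return result;
    }
}
```

With this shape, a mismatched call such as new IntPartitioner().toPartitions("123")
is rejected at compile time rather than failing at run time, and a partitioner
for a custom type (an Edge, say) would simply implement TypedPartitioner<Edge>.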
