Re: Data partitioner questions

Suneel Marthi Tue, 16 Aug 2016 05:15:07 -0700

On Tue, Aug 16, 2016 at 7:17 AM, Tim Ellison <[email protected]> wrote:


> On 15/08/16 20:32, Ellison Anne Williams wrote:
> > Answers inline below
> >
> > On Mon, Aug 15, 2016 at 12:33 PM, Tim Ellison <[email protected]> wrote:
> >
> >> A couple of questions about the data partitioners...
> >>
> >> (1) Why do they split into, and reconstruct from, BigIntegers?
> >>
> >>   The types always get decomposed into primitives, and therefore the
> >> parts are always BigInteger values of bytes [1], and they are used as
> >> intValues [2] when using the exponent table.
> >>
> >> Why not use byte[]/int[], and take the hit to convert to BigInteger
> >> only if going to modPowAbstraction?
> >>
> > The partitions always end up as exponents in a modular exponentiation
> > (that's their sole use). For modPow to function, the exponent must be a
> > BigInteger. Thus, they are converted to BigIntegers immediately upon
> > extraction.
>
> Understood, I'm thinking of the case where the exponents are maintained
> in the look-up table.  The in-memory size of an Integer is much smaller
> than that of a BigInteger, so using BigIntegers to represent small values
> means we have less space in the cache.  It also makes the code a bit
> tidier as we can use autoboxing/unboxing of Integer.
>
> I may have a play and see what it looks like.
>
> >> (2) Has anyone tried to define the partitioning API as a set of type
> >> safe methods?
> >>
> >> I'm not sure if the current scheme defines the objects as type Object
> >> simply to allow for the data to be parsed Strings, but it does have the
> >> unfortunate effect of (a) requiring Java's primitives to be boxed, and
> >> (b) allowing for runtime errors for mismatched types, e.g. I can write
> >> nonsense like:
> >>
> >> new PrimitiveTypePartitioner().toPartitions(123,
> >> PrimitiveTypePartitioner.CHAR)
> >>
> >> [1]
> >> https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/schema/data/partitioner/PrimitiveTypePartitioner.java#L302
> >> [2]
> >> https://github.com/apache/incubator-pirk/blob/master/src/main/java/org/apache/pirk/responder/wideskies/standalone/Responder.java#L197
> >>
> >>
> > So, right now in the code, the objects that are being partitioned are
> > pretty straightforward - primitive types, IPs, dates. However, you could
> > easily have a scenario where you are performing PIR over more complex
> > objects - say PIR over a graph, and what's being partitioned would be
> > vertices/edges/components/etc corresponding to more custom and complex
> > objects.... hence the choice of generic Object type for the
> > DataPartitioner interface.
>
> But complex types would still be types, so their partitioners would deal
> with the new types directly (i.e. you would not want to allow a user to
> pass an Edge to the IP partitioner).  The interface would be specified
> using generics.
>

I agree with Tim that Generics should still work for complex types like
Subgraphs.
+1 to change this to use Generics.
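
To make the suggestion concrete, here is a rough sketch of what a
generics-based partitioner could look like. TypedPartitioner and
IntPartitioner are hypothetical names made up for illustration, not the
existing Pirk DataPartitioner API, and the 8-bit, big-endian partition
layout is an assumption of this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic interface; NOT the actual Pirk DataPartitioner API.
interface TypedPartitioner<T> {
    // Split a value into 8-bit partitions, most significant byte first.
    List<BigInteger> toPartitions(T value);

    // Rebuild the value from partitions produced by toPartitions.
    T fromPartitions(List<BigInteger> parts);
}

// Hypothetical partitioner for 32-bit ints (assumes 8-bit partitions).
final class IntPartitioner implements TypedPartitioner<Integer> {
    @Override
    public List<BigInteger> toPartitions(Integer value) {
        List<BigInteger> parts = new ArrayList<>(4);
        for (int shift = 24; shift >= 0; shift -= 8) {
            // Mask to a single unsigned byte before wrapping in BigInteger.
            parts.add(BigInteger.valueOf((value >>> shift) & 0xFF));
        }
        return parts;
    }

    @Override
    public Integer fromPartitions(List<BigInteger> parts) {
        int result = 0;
        for (BigInteger part : parts) {
            // Each part is in the range 0..255, so intValue() is lossless.
            result = (result << 8) | part.intValue();
        }
        return result;
    }
}
```

With an interface along these lines, a call such as
new IntPartitioner().toPartitions("123") is rejected by the compiler
instead of failing at runtime, and a partitioner for a more complex type
(say, a graph edge) would simply implement the interface with that type
as its parameter.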

>
> Regards,
> Tim
>
>
