Re: Data partitioner questions

Tim Ellison Tue, 16 Aug 2016 04:18:18 -0700

On 15/08/16 20:32, Ellison Anne Williams wrote:
> Answers inline below
> 
> On Mon, Aug 15, 2016 at 12:33 PM, Tim Ellison <[email protected]> wrote:
> 
>> A couple of questions about the data partitioners...
>>
>> (1) Why do they split into, and reconstruct from, BigIntegers?
>>
>>   The types always get decomposed into primitives, and therefore the
>> parts are always BigInteger values of bytes [1], and they are used as
>> intValues [2] when using the exponent table.
>>
>> Why not use byte[]/int[]? and take the hit to convert to BigInteger only
>> if going to modPowAbstraction?
>>
> The partitions always end up as exponents in a modular exponentiation
> (that's their sole use). For modPow to function, the exponent must be a
> BigInteger. Thus, they are converted to BigIntegers immediately upon
> extraction.


Understood. I'm thinking of the case where the exponents are maintained
in the look-up table.  The in-memory size of an Integer is much smaller
than that of a BigInteger, so using BigIntegers to represent small values
means fewer entries fit in the cache.  Using Integer also makes the code
a bit tidier, as we can rely on autoboxing/unboxing.

I may have a play and see what it looks like.
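
As a rough illustration of the idea, here is a minimal sketch of a
look-up table keyed by Integer, where a BigInteger exponent is created
only on the fallback path that actually calls modPow. The class and
method names (IntKeyedExpTable, lookup) are made up for this example and
are not taken from the Pirk code base.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and layout are assumptions, not Pirk's API.
class IntKeyedExpTable {
    // Keys are small partition values, so a boxed Integer is enough.
    private final Map<Integer, BigInteger> table = new HashMap<>();
    private final BigInteger base;
    private final BigInteger modulus;

    IntKeyedExpTable(BigInteger base, BigInteger modulus, int maxExponent) {
        this.base = base;
        this.modulus = modulus;
        // Precompute base^e mod modulus for every exponent 0..maxExponent.
        for (int e = 0; e <= maxExponent; e++) {
            table.put(e, base.modPow(BigInteger.valueOf(e), modulus));
        }
    }

    // The int argument is autoboxed for the map lookup; a BigInteger
    // exponent is only built when the value is not in the table.
    BigInteger lookup(int exponent) {
        BigInteger cached = table.get(exponent);
        if (cached != null) {
            return cached;
        }
        return base.modPow(BigInteger.valueOf(exponent), modulus);
    }
}
```

With 8-bit partitions the table would hold 256 entries per base, and the
caller passes a plain int rather than calling intValue() on a
BigInteger.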

>> (2) Has anyone tried to define the partitioning API as a set of type
>> safe methods?
>>
>> I'm not sure if the current scheme defines the objects as type Object
>> simply to allow for the data to be parsed Strings, but it does have the
>> unfortunate effect of (a) requiring Java's primitives to be boxed, and
>> (b) allowing for runtime errors for mismatched types, e.g. I can write
>> nonsense like:
>>
>> new PrimitiveTypePartitioner().toPartitions(123,
>> PrimitiveTypePartitioner.CHAR)
>>
>> [1]
>> https://github.com/apache/incubator-pirk/blob/master/
>> src/main/java/org/apache/pirk/schema/data/partitioner/
>> PrimitiveTypePartitioner.java#L302
>> [2]
>> https://github.com/apache/incubator-pirk/blob/master/
>> src/main/java/org/apache/pirk/responder/wideskies/
>> standalone/Responder.java#L197
>>
>>
> So, right now in the code, the objects that are being partitioned are
> pretty straightforward - primitive types, IPs, dates. However, you could
> easily have a scenario where you are performing PIR over more complex
> objects - say PIR over a graph, and what's being partitioned would be
> vertices/edges/components/etc corresponding to more custom and complex
> objects.... hence the choice of generic Object type for the DataPartitioner
> interface.

But complex types would still be types, so their partitioners would deal
with the new types directly (i.e. you would not want to allow a user to
pass an Edge to the IP partitioner).  The interface would be specified
using generics.
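
Something along these lines is what I have in mind. This is only a
sketch: TypedPartitioner and CharPartitioner are hypothetical names, and
the 8-bit, high-byte-first split is an assumption made for the example
rather than a description of what PrimitiveTypePartitioner does today.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical generic interface: T is the type being partitioned.
interface TypedPartitioner<T> {
    List<BigInteger> toPartitions(T value);

    T fromPartitions(List<BigInteger> parts);
}

// Partitioner for char values only; other types are rejected by the compiler.
final class CharPartitioner implements TypedPartitioner<Character> {
    @Override
    public List<BigInteger> toPartitions(Character value) {
        char c = value;
        List<BigInteger> parts = new ArrayList<>(2);
        // Assumed layout: a 16-bit char becomes two 8-bit parts, high byte first.
        parts.add(BigInteger.valueOf((c >> 8) & 0xFF));
        parts.add(BigInteger.valueOf(c & 0xFF));
        return parts;
    }

    @Override
    public Character fromPartitions(List<BigInteger> parts) {
        int high = parts.get(0).intValue();
        int low = parts.get(1).intValue();
        return (char) ((high << 8) | low);
    }
}

// new CharPartitioner().toPartitions(123) does not compile, because an
// int argument cannot be converted to Character in a method call.
```

A partitioner for a custom type such as Edge would implement
TypedPartitioner<Edge>, so passing an Edge to an IP partitioner would
fail at compile time rather than at runtime.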

Regards,
Tim
