Comments inline.

On Wed, Jul 27, 2016 at 12:14 PM, Tim Ellison <[email protected]> wrote:

> On 27/07/16 16:04, Jacob Wilder wrote:
> > Hello everyone,
> > I've tried a few things while working on this but would appreciate some
> > suggestions on how to move forward. Or perhaps decide not to.
> >
> > I've pushed a very limited version of this up here:
> >
> https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner
> >
> > In this limited version we move knowledge of the size of each type (and
> > appropriate number of partitions) into the enum making it easier to reuse
> > that logic without reproducing entire getNumPartitions and getBits
> methods
> > in each DataPartitioner implementation. I think this is worth keeping.
>
> Cool.  It also gives some further typing rather than dealing with
> strings everywhere.
>
> > Basically, I made PrimitiveTypeEnum which holds each of the primitive
> > types, can provide their bit size, and their number of partitions.
> Callers
> > to PrimitiveTypePartitioner still provide types as strings which
> > PrimitiveTypePartitioner converts into PrimitiveTypeEnums.
> >
> > In my original attempt I took this further and started to change
> > interface DataPartitioner into DataPartitioner<T extends TypeEnum> and
> > PrimitiveTypePartitioner into
> PrimitiveTypePartitioner<PrimitiveTypeEnum>,
> > etc.
> >
> > I managed to push the string-to-enum conversion most of the way up to
> > LoadDataSchemas. That's when I started to wrestle with just how much more
> > complex this was becoming.
>
> You're right that ideally the loader should be doing the string
> conversion (i.e. move the safelyConvertTypeToEnum into the loader).
> BTW if you do go there you should merge in the new schema and schema
> loader code to avoid rework pain.
>
> I like the idea of having the DataPartitioner<T extends TypeEnum> too,
> but without trying it I'll take your word that it gets hairy.  If you
> succeed you'd be able to change things like
>   toPartitions(Object obj, String typeAsString)
> to drop the string arg, right?
>
> > While it's possible to have the instantiation of the DataPartitioner (as
> > requested by the XML file) provide a factory method that converts
> > string-represented-type information into the correct TypeEnum, I'd
> > appreciate some feedback on if we consider this to truly be valuable.
>
> I'm still trying to get my head around what it means for non-primitive
> partitioners... in particular what the onus will be on people wanting to
> add a new partitioner type.
>  - implement a new TypeEnum
>  - update the factory/whatever for the schema loader
>  - anything else?
>
>
Folks wanting to implement a non-primitive type partitioner just implement
the DataPartitioner interface -- if TypeEnum is added as a part of the
interface, then folks would have to implement it too.

The DataPartitioner dictates how to parse a specific data element and,
currently, the DataPartitioner for each element for a data schema is stored
within the corresponding DataSchema object. There is room for improvement
here - for example, a DataPartitionerRegistry could be created (similar to
the DataSchemaRegistry) to avoid duplicate instantiations across multiple
DataSchemas within the DataSchemaRegistry. The downside is that it will
introduce some additional complication/overhead to the code.


> Regards,
> Tim
>
>
>

Reply via email to