Comments inline. On Wed, Jul 27, 2016 at 12:14 PM, Tim Ellison <[email protected]> wrote:
> On 27/07/16 16:04, Jacob Wilder wrote: > > Hello everyone, > > I've tried a few things while working on this but would appreciate some > > suggestions on how to move forward. Or perhaps decide not to. > > > > I've pushed a very limited version of this up here: > > > https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner > > > > In this limited version we move knowledge of the size of each type (and > > appropriate number of partitions) into the enum making it easier to reuse > > that logic without reproducing entire getNumPartitions and getBits > methods > > in each DataPartitioner implementation. I think this is worth keeping. > > Cool. It also gives some further typing rather than dealing with > strings everywhere. > > > Basically, I made PrimitiveTypeEnum which holds each of the primitive > > types, can provide their bit size, and their number of partitions. > Callers > > to PrimitiveTypePartitioner still provide types as strings which > > PrimitiveTypePartitioner converts into PrimitiveTypeEnums. > > > > In my original attempt I took this further and started to change > > interface DataPartitioner into DataPartitioner<T extends TypeEnum> and > > PrimitiveTypePartitioner into > PrimitiveTypePartitioner<PrimitiveTypeEnum>, > > etc. > > > > I managed to push the string-to-enum conversion most of the way up to > > LoadDataSchemas. That's when I started to wrestle with just how much more > > complex this was becoming. > > You're right that ideally the loader should be doing the string > conversion (i.e. move the safelyConvertTypeToEnum into the loader). > BTW if you do go there you should merge in the new schema and schema > loader code to avoid rework pain. > > I like the idea of having the DataPartitioner<T extends TypeEnum> too, > but without trying it I'll take your word that it gets hairy. If you > succeed you'd be able to change things like > toPartitions(Object obj, String typeAsString) > to drop the string arg, right? > > > While it's possible to have the instantiation of the DataPartitioner (as > > requested by the XML file) provide a factory method that converts > > string-represented-type information into the correct TypeEnum, I'd > > appreciate some feedback on if we consider this to truly be valuable. > > I'm still trying to get my head around what it means for non-primitive > partitioners... in particular what the onus will be on people wanting to > add a new partitioner type. > - implement a new TypeEnum > - update the factory/whatever for the schema loader > - anything else? > > Folks wanting to implement a non-primitive type partitioner just implement the DataPartitioner interface -- if TypeEnum is added as a part of the interface, then folks would have to implement it too. The DataPartitioner dictates how to parse a specific data element and, currently, the DataPartitioner for each element for a data schema is stored within the corresponding DataSchema object. There is room for improvement here - for example, a DataPartitionerRegistry could be created (similar to the DataSchemaRegistry) to avoid duplicate instantiations across multiple DataSchemas within the DataSchemaRegistry. The downside is that it will introduce some additional complication/overhead to the code. > Regards, > Tim > > >
