On 27/07/16 16:04, Jacob Wilder wrote: > Hello everyone, > I've tried a few things while working on this but would appreciate some > suggestions on how to move forward. Or perhaps decide not to. > > I've pushed a very limited version of this up here: > https://github.com/jacobwil/incubator-pirk/compare/master...jacobwil:EnumForTypesInDataPartitioner > > In this limited version we move knowledge of the size of each type (and > appropriate number of partitions) into the enum making it easier to reuse > that logic without reproducing entire getNumPartitions and getBits methods > in each DataPartitioner implementation. I think this is worth keeping.
Cool. It also gives some further typing rather than dealing with strings everywhere. > Basically, I made PrimitiveTypeEnum which holds each of the primitive > types, can provide their bit size, and their number of partitions. Callers > to PrimitiveTypePartitioner still provide types as strings which > PrimitiveTypePartitioner converts into PrimitiveTypeEnums. > > In my original attempt I took this further and started to change > interface DataPartitioner into DataPartitioner<T extends TypeEnum> and > PrimitiveTypePartitioner into PrimitiveTypePartitioner<PrimitiveTypeEnum>, > etc. > > I managed to push the string-to-enum conversion most of the way up to > LoadDataSchemas. That's when I started to wrestle with just how much more > complex this was becoming. You're right that ideally the loader should be doing the string conversion (i.e. move the safelyConvertTypeToEnum into the loader). BTW if you do go there you should merge in the new schema and schema loader code to avoid rework pain. I like the idea of having the DataPartitioner<T extends TypeEnum> too, but without trying it I'll take your word that it gets hairy. If you succeed you'd be able to change things like toPartitions(Object obj, String typeAsString) to drop the string arg, right? > While it's possible to have the instantiation of the DataPartitioner (as > requested by the XML file) provide a factory method that converts > string-represented-type information into the correct TypeEnum, I'd > appreciate some feedback on if we consider this to truly be valuable. I'm still trying to get my head around what it means for non-primitive partitioners... in particular what the onus will be on people wanting to add a new partitioner type. - implement a new TypeEnum - update the factory/whatever for the schema loader - anything else? Regards, Tim
