On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <[email protected]> wrote:
> Hi, > > in our last discussion on refactoring we agreed (I think) to clean > up CombineFn and move implementations to .fn.CombineFns. While playing > with the API, I noticed that it's harder than necessary to get your > generics in line. Example: > > PGroupedTable<String, Integer> table = /* ... */; > table.combineValues(CombineFn.<String>SUM_INTS()); > > You need to specify the String type or you get a compilation error. > This leaks an implementation detail that isn't obvious to users; you > need to know how CombineFn works to know that you have to specify the > key type. And it's just plain ugly. > > How about using Aggregator more prominently by adding a > combineValues(Aggregator<V> a) method to PGroupedTable? We could > simplify the code and make it easier to understand: > > PGroupedTable<String, Integer> table = /* ... */; > table.combineValues(Aggregators.SUM_INTS()); > > Writing custom aggregations would be easier, too, as you don't need to > decorate your aggregator with AggregatorCombineFn yourself anymore. We > could go even further and remove combineValues(CombineFn), because it > doesn't add much value anymore. If you need maximum flexibility (that > is, your aggregation depends on the table's key somehow), you can > still use parallelDo() with CombineFn directly. I don't think that's > a common use case though. > > Here's what I propose: > > 1) Leave CombineFn and Aggregator in base > 2) Remove all static factories (SUM_INTS etc.) from CombineFn > 3) Create an Aggregators class in .fn with static factories > for everything removed from CombineFn > 4) Make the implementation classes private, they just take up > lots of space in javaodcs > 5) Bonus: Remove combineValues(CombineFn) > +1. > > One thing I don't understand: Is there really a need for > AggregatorFactory? Am I overlooking something? It's not like we have > to create Aggregator instances for each key -- that's what > Aggregator.reset() is for. > Yeah, the reason for AggregatorFactory (which I think is evident from the tests) is when you want to do aggregations over tuples, e.g., PTable<K, Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both values of the Pair instance. > Regards, > Matthias >
