On Fri, Nov 2, 2012 at 3:13 AM, Matthias Friedrich <[email protected]> wrote:

> Hi,
>
> in our last discussion on refactoring we agreed (I think) to clean
> up CombineFn and move implementations to .fn.CombineFns. While playing
> with the API, I noticed that it's harder than necessary to get your
> generics in line. Example:
>
>   PGroupedTable<String, Integer> table = /* ... */;
>   table.combineValues(CombineFn.<String>SUM_INTS());
>
> You need to specify the String type or you get a compilation error.
> This leaks an implementation detail that isn't obvious to users; you
> need to know how CombineFn works to know that you have to specify the
> key type. And it's just plain ugly.
>
> How about using Aggregator more prominently by adding a
> combineValues(Aggregator<V> a) method to PGroupedTable? We could
> simplify the code and make it easier to understand:
>
>   PGroupedTable<String, Integer> table = /* ... */;
>   table.combineValues(Aggregators.SUM_INTS());
>
> Writing custom aggregations would be easier, too, as you don't need to
> decorate your aggregator with AggregatorCombineFn yourself anymore. We
> could go even further and remove combineValues(CombineFn), because it
> doesn't add much value anymore. If you need maximum flexibility (that
> is, your aggregation depends on the table's key somehow), you can
> still use parallelDo() with CombineFn directly. I don't think that's
> a common use case though.
>
> Here's what I propose:
>
>   1) Leave CombineFn and Aggregator in base
>   2) Remove all static factories (SUM_INTS etc.) from CombineFn
>   3) Create an Aggregators class in .fn with static factories
>      for everything removed from CombineFn
>   4) Make the implementation classes private, they just take up
>      lots of space in javaodcs
>   5) Bonus: Remove combineValues(CombineFn)
>

+1.


>
> One thing I don't understand: Is there really a need for
> AggregatorFactory? Am I overlooking something? It's not like we have
> to create Aggregator instances for each key -- that's what
> Aggregator.reset() is for.
>

Yeah, the reason for AggregatorFactory (which I think is evident from the
tests) is when you want to do aggregations over tuples, e.g., PTable<K,
Pair<Double, Double>> can't reuse the same SUM_DOUBLE aggregator for both
values of the Pair instance.


> Regards,
>   Matthias
>

Reply via email to