Seems like a good idea. The current aggregation support seems pretty limited because it is non-distributed. The DRM and RDD allow for easy construction of closures for processing blocks (like drm.mapBlock), but an API for plugging in closures or functions for aggregations/accumulators might be a nice piece of syntactic sugar.
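To make the idea concrete, here is a rough sketch of what a closure-based aggregation might look like. It uses plain Scala collections rather than Mahout types, and every name in it (AggregationSketch, aggregateRows, aggregateColumns) is made up for illustration, not an existing API. The point is only that the caller supplies a mapper and a combiner, and the iteration order is left to the implementation.

```scala
// Illustrative sketch only: plain Scala stand-ins, NOT the Mahout API.
object AggregationSketch {
  // A tiny in-core "matrix": one Array[Double] per row (hypothetical type).
  type Rows = Seq[Array[Double]]

  // Aggregate each row: map every element, then fold with the combiner.
  // `zero` should be the identity of `combiner` (e.g. 0.0 for addition).
  def aggregateRows(m: Rows, zero: Double)(mapper: Double => Double)(
      combiner: (Double, Double) => Double): Seq[Double] =
    m.map(row => row.foldLeft(zero)((acc, x) => combiner(acc, mapper(x))))

  // Aggregate each column with the same mapper/combiner pair.
  def aggregateColumns(m: Rows, zero: Double)(mapper: Double => Double)(
      combiner: (Double, Double) => Double): Seq[Double] = {
    val nCols = if (m.isEmpty) 0 else m.head.length
    Seq.tabulate(nCols) { j =>
      m.foldLeft(zero)((acc, row) => combiner(acc, mapper(row(j))))
    }
  }

  // Example data and two simple aggregations expressed as closures.
  val m: Rows = Seq(Array(1.0, 0.0, -2.0), Array(0.0, 3.0, 4.0))

  // Row sums: identity mapper, addition as combiner.
  val rowSums: Seq[Double] = aggregateRows(m, 0.0)(identity)(_ + _)

  // Non-zero counts per column: indicator mapper, addition as combiner.
  val colNonZero: Seq[Double] =
    aggregateColumns(m, 0.0)(x => if (x != 0.0) 1.0 else 0.0)(_ + _)
}
```

As long as the combiner is associative and commutative, a distributed version could evaluate the same closures per block and merge the partial results in any order.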
I’ve only seen the Scala aggregation code used in tests, to compare the results of small in-core matrix ops against distributed ones. There are separate Matrix methods (some of which use aggregations) and DRM methods: the former are non-distributed, the latter distributed. DrmLike currently seems to support only row-wise mapBlock, but Dmitriy may know better.

On Jun 14, 2014, at 6:45 PM, Ted Dunning <[email protected]> wrote:

In math-scala/src/main/scala/org/apache/mahout/math/scalabindings/MatrixOps.scala:

> @@ -188,8 +188,8 @@ object MatrixOps {
>      def apply(f: Vector): Double = f.sum
>    }
>
> -  private def vectorCountFunc = new VectorFunction {
> -    def apply(f: Vector): Double = f.aggregate(Functions.PLUS, Functions.greater(0))
> +  private def vectorCountNonZeroElementsFunc = new VectorFunction {
> +    def apply(f: Vector): Double = f.aggregate(Functions.PLUS, Functions.notEqual(0))

The issue I have is with the rowAggregation and columnAggregation API. It enforces row-by-row evaluation. A map-reduce API could evaluate in many different orders, could iterate by rows or by columns for either aggregation, and wouldn't require a custom VectorFunction for simple aggregations.
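
On the change in the quoted diff itself: the two mappers only disagree when a vector has negative entries. Below is a small sketch of that difference, using plain Scala stand-ins rather than the real Functions class. It assumes greater(0) maps an element to 1 when it is strictly positive and notEqual(0) maps it to 1 when it is non-zero; the names plus, greater, notEqual and aggregate here are local illustrations, not the Mahout implementations.

```scala
// Illustrative sketch only: local stand-ins for the functions named in the diff.
object CountSketch {
  // Combiner: plain addition (stand-in for Functions.PLUS).
  val plus: (Double, Double) => Double = _ + _

  // Mapper: 1.0 when the element is strictly greater than b, else 0.0.
  def greater(b: Double): Double => Double = a => if (a > b) 1.0 else 0.0

  // Mapper: 1.0 when the element differs from b, else 0.0.
  def notEqual(b: Double): Double => Double = a => if (a != b) 1.0 else 0.0

  // Map every element, then reduce with the combiner (0.0 for an empty vector).
  def aggregate(v: Seq[Double],
                combiner: (Double, Double) => Double,
                mapper: Double => Double): Double =
    v.map(mapper).reduceOption(combiner).getOrElse(0.0)

  // One positive-only count and one non-zero count over the same data.
  val v: Seq[Double] = Seq(1.5, 0.0, -2.0, 3.0)
  val positives: Double = aggregate(v, plus, greater(0))  // skips the -2.0 entry
  val nonZeros: Double = aggregate(v, plus, notEqual(0))  // counts the -2.0 entry
}
```

With a negative entry present, the greater-than-zero count undercounts the non-zero elements, which is why the new name matches the notEqual version.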
