Aggregation api

Pat Ferrel Sun, 15 Jun 2014 16:56:41 -0700

Seems like a good idea. The current use of aggregation seems pretty limited 
because it is non-distributed. The drm and rdd allow for easy construction of 
closures for processing blocks (like drm.mapBlock), but having an API to plug in 
closures or functions for aggregations/accumulators might be a nice piece of 
syntactic sugar.

I’ve only seen the Scala aggregation stuff used in tests to compare the results 
of small in-core matrix ops to distributed ones. There are separate Matrix 
methods (sometimes using aggregations) and DRM methods; the former are 
non-distributed, the latter distributed. DrmLike currently seems to support 
only row-wise mapBlock, but Dmitriy may know better.
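
To make the "plug in closures for aggregations" idea concrete, here is a minimal sketch in plain Scala. It does not use the Mahout API: Block, aggregateBlocks, seqOp and combOp are hypothetical names, and ordinary arrays stand in for the blocks a DRM would hand to mapBlock. Each block is folded on its own and the partial results are then merged, which is the shape a distributed aggregation would need.

```scala
// Hypothetical sketch: plain Scala collections stand in for DRM blocks.
object BlockAggregation {
  // A block is a contiguous group of rows.
  type Block = Array[Array[Double]]

  // Fold every block independently with seqOp, then merge the partial
  // results with combOp. `zero` is assumed to be the identity of combOp.
  def aggregateBlocks[A](blocks: Seq[Block], zero: A)(
      seqOp: (A, Block) => A,
      combOp: (A, A) => A): A =
    blocks.map(b => seqOp(zero, b)).foldLeft(zero)(combOp)

  def main(args: Array[String]): Unit = {
    val blocks: Seq[Block] = Seq(
      Array(Array(1.0, 0.0), Array(0.0, 2.0)),
      Array(Array(3.0, 4.0)))

    // Sum of all elements: per-block sums, then a sum of the sums.
    val total = aggregateBlocks(blocks, 0.0)(
      (acc, b) => acc + b.map(_.sum).sum,
      _ + _)
    println(total)
  }
}
```

Because the per-block step never looks at another block, the blocks could be processed in any order or on different machines, as long as combOp is associative.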

On Jun 14, 2014, at 6:45 PM, Ted Dunning <[email protected]> wrote:

In math-scala/src/main/scala/org/apache/mahout/math/scalabindings/MatrixOps.scala:

> @@ -188,8 +188,8 @@ object MatrixOps {
>      def apply(f: Vector): Double = f.sum
>    }
>  
> -  private def vectorCountFunc = new VectorFunction {
> -    def apply(f: Vector): Double = f.aggregate(Functions.PLUS, Functions.greater(0))
> +  private def vectorCountNonZeroElementsFunc = new VectorFunction {
> +    def apply(f: Vector): Double = f.aggregate(Functions.PLUS, Functions.notEqual(0))

The issue I have is with the rowAggregation and columnAggregation API. It 
enforces row-by-row evaluation. A map-reduce API could evaluate in many 
different orders, could iterate by rows or by columns for either aggregation, 
and wouldn't require a custom VectorFunction for simple aggregations.
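
As a rough illustration of the map-reduce style described here, the sketch below aggregates rows and columns from just an element-wise mapper and a combining function. It is plain Scala, not the Mahout API: Matrix, aggregateRows and aggregateColumns are hypothetical names, and the matrix is assumed to be non-empty and rectangular. The column aggregation deliberately walks the data row by row, to show that the iteration order need not match the direction of the aggregation.

```scala
// Hypothetical sketch: a dense array-of-rows stands in for a matrix.
object MapReduceAggregation {
  type Matrix = Array[Array[Double]]

  // One result per row: map every element, then combine within the row.
  def aggregateRows(m: Matrix, combine: (Double, Double) => Double,
                    mapper: Double => Double): Array[Double] =
    m.map(row => row.map(mapper).reduce(combine))

  // One result per column, computed while iterating over rows:
  // map each row element-wise, then merge rows position by position.
  def aggregateColumns(m: Matrix, combine: (Double, Double) => Double,
                       mapper: Double => Double): Array[Double] =
    m.map(_.map(mapper)).reduce { (a, b) =>
      a.zip(b).map { case (x, y) => combine(x, y) }
    }

  // The two counting rules from the diff above, written as plain mappers.
  val greaterThanZero: Double => Double = x => if (x > 0) 1.0 else 0.0
  val notEqualZero: Double => Double = x => if (x != 0) 1.0 else 0.0

  def main(args: Array[String]): Unit = {
    val m: Matrix = Array(
      Array(1.0, 0.0, -2.0),
      Array(0.0, 0.0, 3.0))

    // Non-zero counts include the negative entry ...
    println(aggregateRows(m, _ + _, notEqualZero).mkString(", "))
    println(aggregateColumns(m, _ + _, notEqualZero).mkString(", "))
    // ... while a greater-than-zero count skips it.
    println(aggregateRows(m, _ + _, greaterThanZero).mkString(", "))
  }
}
```

With an associative combine function, the partial results could be merged in any grouping, which is what would leave an implementation free to choose its evaluation order.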
