Re: Why does AggregateUnaryOperator need an AggregateOperator?

Matthias Boehm Tue, 20 Jul 2021 03:22:03 -0700

Thanks for asking. Let's look at a concrete example: rowSums(X), which is compiled to the instruction opcode "uark+" (unary aggregate row Kahan addition). The related aggregate operator is constructed as follows:


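// initial value 0, Kahan addition, correction kept in the last column
AggregateOperator agg = new AggregateOperator(0,
    KahanPlus.getKahanPlusFnObject(),
    CorrectionLocationType.LASTCOLUMN);
// ReduceCol: reduce across columns, i.e., one output value per row
AggregateUnaryOperator aggun = new AggregateUnaryOperator(agg,
    ReduceCol.getReduceColFnObject(), numThreads);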

The aggregate operator reflects the operation (Kahan addition) and indicates any correction columns/rows if needed; for row Kahan addition, it's one additional temporary column. The AggregateUnaryOperator then describes the direction (axis) of aggregation and additional metadata.
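To make the split between "what is aggregated" and "along which axis" concrete, here is a minimal, hypothetical sketch in plain Java. It is not the SystemDS API: `KahanAggSketch`, `Direction`, `kahanAdd` and `aggregate` are invented names. The Kahan step together with the place where its correction is stored plays the role of the AggregateOperator, while the `Direction` enum plays the role of the ReduceCol/ReduceRow function object held by the AggregateUnaryOperator. As in the rowSums example, the correction for row aggregation sits in an extra last column; for column aggregation this sketch simply puts it in an extra last row.

```java
import java.util.Arrays;

// Hypothetical sketch, not SystemDS code: aggregate function (Kahan step +
// correction location) kept separate from the reduction direction.
class KahanAggSketch {
    // Which axis is reduced: REDUCE_COL collapses columns (rowSums),
    // REDUCE_ROW collapses rows (colSums).
    enum Direction { REDUCE_COL, REDUCE_ROW }

    // One compensated (Kahan) addition; acc[0] = sum, acc[1] = correction,
    // where the correction holds the low-order bits lost so far.
    static void kahanAdd(double[] acc, double v) {
        double y = v + acc[1];       // add back previously lost bits
        double t = acc[0] + y;       // new, possibly rounded, sum
        acc[1] = y - (t - acc[0]);   // what this rounding just lost
        acc[0] = t;
    }

    // REDUCE_COL: result is rows x 2 (sum, correction in the last column).
    // REDUCE_ROW: result is 2 x cols (sum row, correction in the last row).
    static double[][] aggregate(double[][] x, Direction dir) {
        int rows = x.length, cols = x[0].length;
        if (dir == Direction.REDUCE_COL) {
            double[][] out = new double[rows][2];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    kahanAdd(out[i], x[i][j]);
            return out;
        }
        double[][] out = new double[2][cols];
        double[] acc = new double[2];
        for (int j = 0; j < cols; j++) {
            acc[0] = 0;
            acc[1] = 0;
            for (int i = 0; i < rows; i++)
                kahanAdd(acc, x[i][j]);
            out[0][j] = acc[0];
            out[1][j] = acc[1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(aggregate(x, Direction.REDUCE_COL)));
        System.out.println(Arrays.deepToString(aggregate(x, Direction.REDUCE_ROW)));
    }
}
```

The same `kahanAdd` function serves both directions; only the loop structure (the "unary" part) changes. Keeping the correction next to each partial sum is presumably what allows such partial results to be merged later without dropping the compensation, although this sketch only covers a single block.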

This design stems from a time when SystemML had a generic implementation of such default aggregations, leveraging the aggregation and indexing operators to obtain the aggregated values. However, for many years now we have had a dedicated kernel library, 'LibMatrixAgg', which implements efficient dense and sparse kernels for all operators, so these operators are mainly used to communicate the exact operation parameters down to the runtime. Some of the aggregation operators are still used internally (e.g., Kahan addition), while the reductions mostly are not, so we could condense this into a single AggregateOperator. The reason there is an AggregateUnaryOperator is that there was also an AggregateBinaryOperator, as used for matrix multiply, but again we added a 'LibMatrixMult' kernel library for performance.

For your countDistinct operations, I would recommend following the example of existing operators (like row/colSums). For the Spark operations, only the map needs the full aggregation metadata, as it determines the output indexes of blocks and thus which values aggStable aggregates for equal keys, and how.
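Why only the map step needs the direction can be illustrated without Spark. The following is a hypothetical sketch with invented names (`BlockedRowSums`, `Key`, `mapBlock`, `aggregateByKey`), plain Java collections instead of RDDs, 0-based block indexes, and plain addition instead of Kahan addition. It is analogous to a map followed by a reduce-by-key step, not a description of aggStable's internals. Blocks are keyed by (rowBlock, colBlock); for rowSums the map collapses every column block of a row block onto the same output key, and the aggregation step then only needs the binary combine function to merge values with equal keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not SystemDS or Spark code: blocked rowSums in two phases.
class BlockedRowSums {
    // Index of a matrix block (0-based in this sketch).
    record Key(int rowBlock, int colBlock) {}

    // Phase 1 ("map"): needs the direction. Reducing across columns sends
    // every column block of a row block to output key (rowBlock, 0).
    static Map.Entry<Key, double[]> mapBlock(Key in, double[][] block) {
        double[] partial = new double[block.length];
        for (int i = 0; i < block.length; i++)
            for (double v : block[i])
                partial[i] += v;
        return Map.entry(new Key(in.rowBlock(), 0), partial);
    }

    // Phase 2 ("aggregate by key"): only needs the binary combine function
    // (element-wise addition here); it merges values with equal keys.
    static Map<Key, double[]> aggregateByKey(List<Map.Entry<Key, double[]>> mapped) {
        Map<Key, double[]> out = new HashMap<>();
        for (Map.Entry<Key, double[]> e : mapped) {
            // clone so the mapped partial results are never mutated
            out.merge(e.getKey(), e.getValue().clone(), (acc, next) -> {
                for (int i = 0; i < acc.length; i++)
                    acc[i] += next[i];
                return acc;
            });
        }
        return out;
    }

    public static void main(String[] args) {
        // A 4x4 matrix with values 1..16, split into four 2x2 blocks.
        List<Map.Entry<Key, double[]>> mapped = new ArrayList<>();
        mapped.add(mapBlock(new Key(0, 0), new double[][] {{1, 2}, {5, 6}}));
        mapped.add(mapBlock(new Key(0, 1), new double[][] {{3, 4}, {7, 8}}));
        mapped.add(mapBlock(new Key(1, 0), new double[][] {{9, 10}, {13, 14}}));
        mapped.add(mapBlock(new Key(1, 1), new double[][] {{11, 12}, {15, 16}}));
        Map<Key, double[]> result = aggregateByKey(mapped);
        System.out.println(Arrays.toString(result.get(new Key(0, 0)))); // [10.0, 26.0]
        System.out.println(Arrays.toString(result.get(new Key(1, 0)))); // [42.0, 58.0]
    }
}
```

Switching to colSums would only change `mapBlock` (reduce rows and emit key (0, colBlock)); `aggregateByKey` stays the same, which mirrors the point that the final aggregation does not need the direction.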


Regards,
Matthias

On 7/20/2021 4:37 AM, Badrul Chowdhury wrote:

Hi,

I am having difficulty understanding why the AggregateUnaryOperator would
need a corresponding AggregateOperator in the constructor. I am confused by
the intent behind the following:

(Lines 69 and 73 in
/Users/badrulchowdhury/code/systemds/src/main/java/org/apache/sysds/runtime/instructions/spark/AggregateUnarySPInstruction.java)
InstructionUtils.deriveAggregateOperatorOpcode(opcode);
AggregateOperator aop = InstructionUtils.parseAggregateOperator(aopcode,
    corrLoc.toString());

If I had to guess, I would say that AggregateUnaryOperator is for a single
matrix block (unary) whereas AggregateOperator is for combining results
from multiple matrix blocks (binary). At least that is what the following
snippet from the processMatrixAggregate() method in the same file (Line 108)
seems to suggest:

JavaRDD<MatrixBlock> out2 = out.map(
    new RDDUAggFunction2(auop, mc.getBlocksize()));
MatrixBlock out3 = RDDAggregateUtils.aggStable(out2, aggop);

Would really appreciate it if somebody could confirm my suspicions or point
me in the right direction. Thanks in advance!

-Badrul
