So D's disparagement of my half-assed fix for our egregrious
mis-performance has lead me to thinking about what a real optimizer would
do.

I think we all agree that we are attempting to maximize performance for a
flow of operations given pre-existing layouts.  Transformations include
combining and rearranging operations and using sparse iterations where
applicable.  I expect that optimizing for intermediate memory use will
happen as a side effect.

As I see it, we have some properties of the matrices and vectors and
functions that are our inputs that we could use.  We also have a relatively
small set of higher order functions that include the primitives that we
already have and a few extras.

These include at least

*Vector qualities*

- isDense
- sparsity (fraction of non-zeros)
- skewedSparse (+1 if the non-zeros are near the beginning of a vector, -1
if near the end, 0 if roughly symmetric)
- size
- mergeUpdate (true if should update by collecting updates and merging back)
- randomAccessCost
- sequentialAccessCost
- randomUpdateCost

*Matrix qualities*

- rowMajor (true if adjacent elements in rows are near in memory)
- columnMajor (true if adjacent elements in columns are near in memory)
- randomOrder (true if adjacent elements are not near in memory)
- sequentialAccessCost (estimated cost of access in major order including
some estimate of caching effects)
- randomAccessCost
- mergeUpdate

*Functional properties*

(all as supported in Mahout now)

*Primitive logical operators*

(many of these are already supported in Mahout.  Some additions would be
required.)

- applyElementWise (apply a unary function to a matrix or vector)
- applyElementWise (apply a binary function to two matrixes or two vectors)
- applyElementsToRows (apply a binary function to elements of a vector and
corresponding rows of a matrix)
- applyElementsToColumns (apply a binary function to elements of a vector
and corresponding columns of a matrix)
- outerProduct (apply a binary function to all combinations of elements of
two vectors)
- applyRowWise (apply a vector function per row to get rows of a result
matrix)
- applyRowWise (apply a vector function per row to get values of a result
vector)
- applyRowWise (apply a vector function per row to get a list of results)
- applyColumnWise (apply a vector function per row to get rows of a result
matrix)
- applyColumnWise (apply a vector function per row to get values of a
result vector)
- applyColumnWise (apply a vector function per row to get a list of results)
- generalMatrixProduct (apply matrix product defined by two functions to
two matrices)
- generalMatrixProduct (apply matrix product defined by two functions to a
matrix and a vector)
- generalMatrixProduct (apply matrix product defined by two functions to a
vector and a matrix)
- transpose (of a matrix)

The physical operators will probably need to have a large number of
specializations of these, particularly the generalMatrixProduct functionals.

The optimizer should be able to estimate the run-time of various physical
plans at least down to ordering the different plans relatively correctly.

*Open questions* I have include:

- Is there a really good way to represent the physical plans such that
reduction to an execution plan is relatively easy?

- Can we have a code generator for the execution plan to avoid coding
dozens to hundreds of primitive executors?

- Is this enough to get good run time estimations?

- Is there a clever way we can gather sample run-times so that we can use
machine learning to learn the cost function?

Reply via email to