So D's disparagement of my half-assed fix for our egregrious mis-performance has lead me to thinking about what a real optimizer would do.
I think we all agree that we are attempting to maximize performance for a flow of operations given pre-existing layouts. Transformations include combining and rearranging operations and using sparse iterations where applicable. I expect that optimizing for intermediate memory use will happen as a side effect. As I see it, we have some properties of the matrices and vectors and functions that are our inputs that we could use. We also have a relatively small set of higher order functions that include the primitives that we already have and a few extras. These include at least *Vector qualities* - isDense - sparsity (fraction of non-zeros) - skewedSparse (+1 if the non-zeros are near the beginning of a vector, -1 if near the end, 0 if roughly symmetric) - size - mergeUpdate (true if should update by collecting updates and merging back) - randomAccessCost - sequentialAccessCost - randomUpdateCost *Matrix qualities* - rowMajor (true if adjacent elements in rows are near in memory) - columnMajor (true if adjacent elements in columns are near in memory) - randomOrder (true if adjacent elements are not near in memory) - sequentialAccessCost (estimated cost of access in major order including some estimate of caching effects) - randomAccessCost - mergeUpdate *Functional properties* (all as supported in Mahout now) *Primitive logical operators* (many of these are already supported in Mahout. Some additions would be required.) - applyElementWise (apply a unary function to a matrix or vector) - applyElementWise (apply a binary function to two matrixes or two vectors) - applyElementsToRows (apply a binary function to elements of a vector and corresponding rows of a matrix) - applyElementsToColumns (apply a binary function to elements of a vector and corresponding columns of a matrix) - outerProduct (apply a binary function to all combinations of elements of two vectors) - applyRowWise (apply a vector function per row to get rows of a result matrix) - applyRowWise (apply a vector function per row to get values of a result vector) - applyRowWise (apply a vector function per row to get a list of results) - applyColumnWise (apply a vector function per row to get rows of a result matrix) - applyColumnWise (apply a vector function per row to get values of a result vector) - applyColumnWise (apply a vector function per row to get a list of results) - generalMatrixProduct (apply matrix product defined by two functions to two matrices) - generalMatrixProduct (apply matrix product defined by two functions to a matrix and a vector) - generalMatrixProduct (apply matrix product defined by two functions to a vector and a matrix) - transpose (of a matrix) The physical operators will probably need to have a large number of specializations of these, particularly the generalMatrixProduct functionals. The optimizer should be able to estimate the run-time of various physical plans at least down to ordering the different plans relatively correctly. *Open questions* I have include: - Is there a really good way to represent the physical plans such that reduction to an execution plan is relatively easy? - Can we have a code generator for the execution plan to avoid coding dozens to hundreds of primitive executors? - Is this enough to get good run time estimations? - Is there a clever way we can gather sample run-times so that we can use machine learning to learn the cost function?
