I was more in favor of more pragmatic approach. Fix is as you go, rather than do all-cases-at-once.
In Teds last patch we see absolutely fine beginning of that in terms that it introduces 3 separate algorithms for various cases where one operand is always an SRM (or, symmetrically, a SCM), and another operand is one of dense, sparse, SRM/SCM. What it lacks, however, is to connect those 3 algorithms to permutations of all of the above, and their transposed view. I was thinking about a simple patch along the lines that would add directional primitives isRowWise, isColumnWise on top of what we already have (so transpose view would just flip those responses), and then build a times algorightm which is single for pretty much... everything. I am not even sure that we need cost interrogations, since once we figure proper directionality, we'd be able to either rely on vector-vector stuff to sort the cost-based algos, or, whenever we cannot operate with vectors directly, we could make certain cost assumptions based on previous methods as well. (e.g. if it is neither row-wise nor column-wise and not dense, then it is ok to assume open hash costs). so this is for times only. I think a tiny bit of thinking while implementing will prove that the elementwise operations can also make similar cost assumptions. I am just afraid of work _too big_ to reasonably succeed. And soon. One open question i had was,well, since the outlined approach avoids logical plans still for in-core matrix, it would be difficult to optimize the expressions in this context (i.e. figure out best-working _output_ structure). On Mon, Jun 9, 2014 at 3:39 PM, Ted Dunning <[email protected]> wrote: > So D's disparagement of my half-assed fix for our egregrious > mis-performance has lead me to thinking about what a real optimizer would > do. > > I think we all agree that we are attempting to maximize performance for a > flow of operations given pre-existing layouts. Transformations include > combining and rearranging operations and using sparse iterations where > applicable. I expect that optimizing for intermediate memory use will > happen as a side effect. > > As I see it, we have some properties of the matrices and vectors and > functions that are our inputs that we could use. We also have a relatively > small set of higher order functions that include the primitives that we > already have and a few extras. > > These include at least > > *Vector qualities* > > - isDense > - sparsity (fraction of non-zeros) > - skewedSparse (+1 if the non-zeros are near the beginning of a vector, -1 > if near the end, 0 if roughly symmetric) > - size > - mergeUpdate (true if should update by collecting updates and merging > back) > - randomAccessCost > - sequentialAccessCost > - randomUpdateCost > > *Matrix qualities* > > - rowMajor (true if adjacent elements in rows are near in memory) > - columnMajor (true if adjacent elements in columns are near in memory) > - randomOrder (true if adjacent elements are not near in memory) > - sequentialAccessCost (estimated cost of access in major order including > some estimate of caching effects) > - randomAccessCost > - mergeUpdate > > *Functional properties* > > (all as supported in Mahout now) > > *Primitive logical operators* > > (many of these are already supported in Mahout. Some additions would be > required.) > > - applyElementWise (apply a unary function to a matrix or vector) > - applyElementWise (apply a binary function to two matrixes or two vectors) > - applyElementsToRows (apply a binary function to elements of a vector and > corresponding rows of a matrix) > - applyElementsToColumns (apply a binary function to elements of a vector > and corresponding columns of a matrix) > - outerProduct (apply a binary function to all combinations of elements of > two vectors) > - applyRowWise (apply a vector function per row to get rows of a result > matrix) > - applyRowWise (apply a vector function per row to get values of a result > vector) > - applyRowWise (apply a vector function per row to get a list of results) > - applyColumnWise (apply a vector function per row to get rows of a result > matrix) > - applyColumnWise (apply a vector function per row to get values of a > result vector) > - applyColumnWise (apply a vector function per row to get a list of > results) > - generalMatrixProduct (apply matrix product defined by two functions to > two matrices) > - generalMatrixProduct (apply matrix product defined by two functions to a > matrix and a vector) > - generalMatrixProduct (apply matrix product defined by two functions to a > vector and a matrix) > - transpose (of a matrix) > > The physical operators will probably need to have a large number of > specializations of these, particularly the generalMatrixProduct > functionals. > > The optimizer should be able to estimate the run-time of various physical > plans at least down to ordering the different plans relatively correctly. > > *Open questions* I have include: > > - Is there a really good way to represent the physical plans such that > reduction to an execution plan is relatively easy? > > - Can we have a code generator for the execution plan to avoid coding > dozens to hundreds of primitive executors? > > - Is this enough to get good run time estimations? > > - Is there a clever way we can gather sample run-times so that we can use > machine learning to learn the cost function? >
