I was more in favor of more pragmatic approach. Fix is as you go, rather
than do all-cases-at-once.

In Teds last patch we see absolutely fine beginning of that in terms that
it introduces 3 separate algorithms for various cases where one operand is
always an SRM (or, symmetrically, a SCM), and another operand is one of
dense, sparse, SRM/SCM.

What it lacks, however, is to connect those 3 algorithms to permutations of
all of the above, and their transposed view.

I was thinking about a simple patch along the lines that would add
directional primitives isRowWise, isColumnWise on top of what we already
have (so transpose view would just flip those responses), and then build a
times algorightm which is single for pretty much... everything. I am not
even sure that we need cost interrogations, since once we figure proper
directionality, we'd be able to either rely on vector-vector stuff to sort
the cost-based algos, or, whenever we cannot operate with vectors directly,
we could make certain cost assumptions based on previous methods as well.
(e.g. if it is neither row-wise nor column-wise and not dense, then it is
ok to assume open hash costs).

so this is for times only. I think a tiny bit of thinking while
implementing will prove that the elementwise operations can also make
similar cost assumptions.

I am just afraid of work _too big_ to reasonably succeed. And soon.

One open question i had was,well, since the outlined approach avoids
logical plans still for in-core matrix, it would be difficult to optimize
the expressions in this context (i.e. figure out best-working _output_
structure).





On Mon, Jun 9, 2014 at 3:39 PM, Ted Dunning <[email protected]> wrote:

> So D's disparagement of my half-assed fix for our egregrious
> mis-performance has lead me to thinking about what a real optimizer would
> do.
>
> I think we all agree that we are attempting to maximize performance for a
> flow of operations given pre-existing layouts.  Transformations include
> combining and rearranging operations and using sparse iterations where
> applicable.  I expect that optimizing for intermediate memory use will
> happen as a side effect.
>
> As I see it, we have some properties of the matrices and vectors and
> functions that are our inputs that we could use.  We also have a relatively
> small set of higher order functions that include the primitives that we
> already have and a few extras.
>
> These include at least
>
> *Vector qualities*
>
> - isDense
> - sparsity (fraction of non-zeros)
> - skewedSparse (+1 if the non-zeros are near the beginning of a vector, -1
> if near the end, 0 if roughly symmetric)
> - size
> - mergeUpdate (true if should update by collecting updates and merging
> back)
> - randomAccessCost
> - sequentialAccessCost
> - randomUpdateCost
>
> *Matrix qualities*
>
> - rowMajor (true if adjacent elements in rows are near in memory)
> - columnMajor (true if adjacent elements in columns are near in memory)
> - randomOrder (true if adjacent elements are not near in memory)
> - sequentialAccessCost (estimated cost of access in major order including
> some estimate of caching effects)
> - randomAccessCost
> - mergeUpdate
>
> *Functional properties*
>
> (all as supported in Mahout now)
>
> *Primitive logical operators*
>
> (many of these are already supported in Mahout.  Some additions would be
> required.)
>
> - applyElementWise (apply a unary function to a matrix or vector)
> - applyElementWise (apply a binary function to two matrixes or two vectors)
> - applyElementsToRows (apply a binary function to elements of a vector and
> corresponding rows of a matrix)
> - applyElementsToColumns (apply a binary function to elements of a vector
> and corresponding columns of a matrix)
> - outerProduct (apply a binary function to all combinations of elements of
> two vectors)
> - applyRowWise (apply a vector function per row to get rows of a result
> matrix)
> - applyRowWise (apply a vector function per row to get values of a result
> vector)
> - applyRowWise (apply a vector function per row to get a list of results)
> - applyColumnWise (apply a vector function per row to get rows of a result
> matrix)
> - applyColumnWise (apply a vector function per row to get values of a
> result vector)
> - applyColumnWise (apply a vector function per row to get a list of
> results)
> - generalMatrixProduct (apply matrix product defined by two functions to
> two matrices)
> - generalMatrixProduct (apply matrix product defined by two functions to a
> matrix and a vector)
> - generalMatrixProduct (apply matrix product defined by two functions to a
> vector and a matrix)
> - transpose (of a matrix)
>
> The physical operators will probably need to have a large number of
> specializations of these, particularly the generalMatrixProduct
> functionals.
>
> The optimizer should be able to estimate the run-time of various physical
> plans at least down to ordering the different plans relatively correctly.
>
> *Open questions* I have include:
>
> - Is there a really good way to represent the physical plans such that
> reduction to an execution plan is relatively easy?
>
> - Can we have a code generator for the execution plan to avoid coding
> dozens to hundreds of primitive executors?
>
> - Is this enough to get good run time estimations?
>
> - Is there a clever way we can gather sample run-times so that we can use
> machine learning to learn the cost function?
>

Reply via email to