Also, note that parallelizeEmpty() creates nothing more than a standard int-keyed matrix with all rows indexed accordingly. That means it cannot be rbind-ed with anything that is not int-keyed (though perhaps it could be bound after an intermediate map-block that converts the keys).

On Mon, Jul 21, 2014 at 1:42 PM, Pat Ferrel <[email protected]> wrote:

> Thank you! This is what I understood and I’m doing a little dance for joy
> (in my mind). This makes sparseness all encompassing, at least for
> sequential Int keys.
>
> However Anand has found several math ops that don’t work.
>
> I’ll write up a few tests for transpose and multiply at least since these
> are used in cooccurrence. And I’ll be happy to implement something that
> changes nrow in an immutable R-like way. Anand and Ted suggested rbind
> of drmParallelizeEmpty with added row cardinality. This would really only
> change nrow of the resulting CheckPointedDrm, it would not alter the rdd.
>
>
> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> "missing" rows are only valid in context of int-keyed matrices and
> physical transposition operations. These are the only that may depend on
> it, since obviously one can't define "missing-ness" for something that is
> String-keyed.
>
> So the only thing that may fail because of "missing-ness" effect is
> probably physical transposition operator (we don't have test for such case,
> so maybe there's a bug in that case). Everything else should work.
>
> And no, i suppose it is ok to have "missing" rows even in case of
> int-keyed matrices.
>
> there's one thing that you probably should be aware in this context
> though: many algorithms don't survive empty (row-less) partitions, in
> whatever way they may come to be. Other than that, I don't feel every row
> must be present -- even if there's implied order of the rows.
>
>
> On Mon, Jul 21, 2014 at 12:22 PM, Pat Ferrel <[email protected]> wrote:
>
>> I appreciate that you can’t read all the back and forth Dmitriy hence the
>> private email. Please disregard all other code or talk in the thread for
>> the moment.
>>
>> Does a DRM need to have a row for every sequential row key from 0 to
>> nrow-1 ? Can there be missing row keys in the sequence and will they be
>> treated as {}, an all zero row? In terms of the rdd in the CheckpointedDrm
>> these “missing” rows will not have a corresponding n => {}, they will just
>> not exist in the rdd. This will happen when a row is “missing” from the DRM
>> but the true cardinality is known and passed in to the CheckpointedDrm
>> constructor.
>>
>> Will R-like operations on these matrices work correctly. Will A.t %*% A
>> and A + 1 work correctly?
>>
>> The answer is no, but _should_ they work correctly?
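
To make the question in the quoted thread concrete, here is a minimal sketch of why the two operations behave differently when int-keyed rows are absent. It is a plain-Python model, not the Mahout API: the dict-of-dicts layout and the helper names (ata, plus_scalar, plus_scalar_present_only) are assumptions made only for illustration. The point is that A.t %*% A is a sum over stored rows, so absent (all-zero) rows contribute nothing, whereas A + 1 has to produce a row of ones for every absent key and therefore needs nrow.

```python
# Plain-Python model (NOT the Mahout API): a sparse int-keyed matrix is stored
# as {row_key: {col_index: value}}; nrow and ncol are carried separately.
# All helper names here are hypothetical.

def ata(rows, ncol):
    """Compute A' A by summing outer products of the rows that are stored."""
    result = [[0.0] * ncol for _ in range(ncol)]
    for row in rows.values():
        for i, vi in row.items():
            for j, vj in row.items():
                result[i][j] += vi * vj
    return result

def plus_scalar(rows, nrow, ncol, s):
    """Compute A + s densely, treating absent row keys as all-zero rows."""
    return [[rows.get(r, {}).get(c, 0.0) + s for c in range(ncol)]
            for r in range(nrow)]

def plus_scalar_present_only(rows, ncol, s):
    """Naive per-row version: visits only stored rows, so absent rows are lost."""
    return {r: [row.get(c, 0.0) + s for c in range(ncol)]
            for r, row in rows.items()}

# A 4 x 3 matrix in which row keys 1 and 3 are absent from the storage.
nrow, ncol = 4, 3
a = {0: {0: 1.0, 2: 2.0}, 2: {1: 3.0}}

gram = ata(a, ncol)                             # unaffected by the absent rows
full = plus_scalar(a, nrow, ncol, 1.0)          # 4 rows; rows 1 and 3 are all ones
naive = plus_scalar_present_only(a, ncol, 1.0)  # only 2 rows come back
```

In this model the element-wise case is the one that goes wrong if an operator only walks the rows that exist, which is consistent with the suggestion above to carry the true row cardinality alongside the data.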
