Thank you! This is what I understood and I’m doing a little dance for joy (in 
my mind). This makes sparseness all-encompassing, at least for sequential Int 
keys. 

However, Anand has found several math ops that don’t work.

I’ll write up a few tests, for transpose and multiply at least, since these are 
used in cooccurrence. And I’ll be happy to implement something that changes 
nrow in an immutable R-like way. Anand and Ted suggested rbind of 
drmParallelizeEmpty with added row cardinality. This would really only change 
nrow of the resulting CheckpointedDrm; it would not alter the rdd.
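
To make the "change nrow without touching the rdd" idea concrete, here is a 
minimal sketch in plain Python. It is not Mahout code: SparseRowMatrix and 
with_nrow are hypothetical names, and a dict of rows stands in for the rdd.

```python
from dataclasses import dataclass, replace
from typing import Mapping


@dataclass(frozen=True)
class SparseRowMatrix:
    """Hypothetical stand-in for an int-keyed DRM: rows plus declared size."""
    rows: Mapping[int, Mapping[int, float]]  # row key -> {column -> value}
    nrow: int
    ncol: int

    def with_nrow(self, nrow: int) -> "SparseRowMatrix":
        # Refuse a cardinality that would exclude a row key actually present.
        if self.rows and nrow <= max(self.rows):
            raise ValueError("nrow must exceed the largest row key present")
        # New wrapper around the same rows object: nothing is copied or mutated.
        return replace(self, nrow=nrow)


a = SparseRowMatrix(rows={0: {1: 1.0}, 2: {0: 3.0}}, nrow=3, ncol=2)
b = a.with_nrow(5)  # rows 3 and 4 now exist only as implied empty rows
```

The original matrix keeps its own nrow, and both wrappers share the same row 
data, which is the property the rbind suggestion is after.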




On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:

"missing" rows are only valid in context of int-keyed matrices and physical 
transposition operations. These are the only things that may depend on it, since 
obviously one can't define "missing-ness" for something that is String-keyed.

So the only thing that may fail because of the "missing-ness" effect is probably 
the physical transposition operator (we don't have a test for such a case, so 
maybe there's a bug in that case). Everything else should work. 
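
As a guess at what a test for that case might check, here is a small sketch in 
plain Python. It is not the Mahout operator: transpose is a hypothetical 
helper over a dict of sparse rows, and the assumption being illustrated is 
that the transposed width must come from the declared nrow rather than from 
the row keys that happen to be present.

```python
def transpose(rows, nrow, ncol):
    """Transpose {row: {col: value}}; returns (rows_t, nrow_t, ncol_t)."""
    rows_t = {}
    for i, row in rows.items():
        for j, v in row.items():
            rows_t.setdefault(j, {})[i] = v
    # Dimensions are taken from the declared cardinality, not from keys seen.
    return rows_t, ncol, nrow


rows = {0: {0: 1.0}, 1: {1: 2.0}}  # rows 2 and 3 are "missing", nrow is 4
rows_t, nrow_t, ncol_t = transpose(rows, nrow=4, ncol=2)

# Width one would infer from the data alone, ignoring the declared nrow.
inferred_ncol = 1 + max(i for r in rows_t.values() for i in r)
```

Here the declared width of the transpose is 4 while the data alone suggests 2, 
so an implementation that infers the width from the keys would silently drop 
the trailing empty columns.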

And no, I suppose it is ok to have "missing" rows even in the case of int-keyed 
matrices. 

There's one thing that you should probably be aware of in this context, though: 
many algorithms don't survive empty (row-less) partitions, in whatever way they 
may come to be. Other than that, I don't feel every row must be present -- even 
if there's an implied order of the rows.


On Mon, Jul 21, 2014 at 12:22 PM, Pat Ferrel <[email protected]> wrote:
I appreciate that you can’t read all the back and forth, Dmitriy, hence the 
private email. Please disregard all other code or talk in the thread for the 
moment.

Does a DRM need to have a row for every sequential row key from 0 to nrow-1 ? 
Can there be missing row keys in the sequence and will they be treated as {}, 
an all zero row? In terms of the rdd in the CheckpointedDrm these “missing” 
rows will not have a corresponding n => {}, they will just not exist in the 
rdd. This will happen when a row is “missing” from the DRM but the true 
cardinality is known and passed in to the CheckpointedDrm constructor.

Will R-like operations on these matrices work correctly? Will A.t %*% A and A + 
1 work correctly?

The answer is no, but _should_ they work correctly?
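
The difference between the two operations can be sketched in plain Python. 
This is not Mahout code; all function names are hypothetical and a dict of 
sparse rows stands in for the rdd. The point is only the arithmetic: an absent 
row adds nothing to A.t %*% A, exactly as a zero row would, whereas A + 1 must 
turn an absent row into a row of ones, which an implementation that visits 
only the rows present cannot do.

```python
def to_dense(rows, nrow, ncol):
    """Expand {row: {col: value}} to a dense list of lists, zeros for gaps."""
    return [[rows.get(i, {}).get(j, 0.0) for j in range(ncol)]
            for i in range(nrow)]


def ata(rows, ncol):
    """A' A accumulated row by row; absent rows contribute nothing."""
    out = [[0.0] * ncol for _ in range(ncol)]
    for row in rows.values():
        for j, vj in row.items():
            for k, vk in row.items():
                out[j][k] += vj * vk
    return out


def plus_scalar_present_only(rows, ncol, s):
    """Naive A + s that visits only rows that exist in the collection."""
    return {i: {j: row.get(j, 0.0) + s for j in range(ncol)}
            for i, row in rows.items()}


def plus_scalar_filled(rows, nrow, ncol, s):
    """A + s that walks every key in 0..nrow-1, treating gaps as zero rows."""
    return {i: {j: rows.get(i, {}).get(j, 0.0) + s for j in range(ncol)}
            for i in range(nrow)}


rows = {0: {0: 1.0, 1: 2.0}, 2: {1: 3.0}}  # row 1 is "missing"; nrow=3, ncol=2
gram = ata(rows, ncol=2)
naive = to_dense(plus_scalar_present_only(rows, 2, 1.0), nrow=3, ncol=2)
filled = to_dense(plus_scalar_filled(rows, 3, 2, 1.0), nrow=3, ncol=2)
```

In this sketch the naive version leaves row 1 as zeros while the filled 
version gives it ones, and the Gram matrix is the same either way.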


