Sorry, but my question keeps getting lost in the back-and-forth; no one reads that much email.
Oh, it’s working. I think the conversion to dense is saving the day in #1: it uses nrow to create enough rows rather than looking in the RDD for row keys (just a guess). I think it was you who said that, dense or sparse, the math should produce the same result. One or the other is a bug, right?

I am going to add some matrix multiply tests that work on RDD-backed objects; the current tests could use some additions. I suspect that multiply and transpose will work correctly with nonexistent rows/columns, but the designers still need to come out on one side or the other. If row keys need to be sequential and unbroken, this is a big deal and new to me.

On Jul 19, 2014, at 10:09 PM, Anand Avati wrote:

On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel wrote:

> Using methods instead of symbolic ops returns different types, so methods
> work and ops don’t. If math is math, they should do the same thing, so I’d like
> to know what it is supposed to do. Can you please allow me to ask specific
> people the question?
>

I don't see how I'm getting in the way of anybody else replying. By posting to a public mailing list, you by definition allow anybody to review and comment.

> test("plus one") {
>   val a = dense(
>     (1, 1),
>     (0, 0))
>
>   val drmA1 = drmParallelize(m = a, numPartitions = 2)
>
>   // modified to return a new CheckpointedDrm so maintains immutability but still only increases the row cardinality
>   // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol, _cacheStorageLevel) Hack for now.
>   val drmABigger1 = drmA1.addToRowCardinality(1)
>
>   val drmABiggerPlusOne1 = drmABigger1.plus(1.0) // drmABigger has no row 2 in the rdd but an empty row 1
>   // drmABiggerPlusOne1 is a dense matrix
>   println(drmABiggerPlusOne1)
>
>   val drmA2 = drmParallelize(m = a, numPartitions = 2)
>   val drmABigger2 = drmA2.addToRowCardinality(1)
>   val drmABiggerPlusOne2 = drmABigger2 + 1.0
>   drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
>
>   val bp = 0
> }
>
> Method #1 works, #2 doesn’t, even when I create a new CheckpointedDrmSpark
> with a larger _nrow than is in the data (even without the addToRowCardinality).
>

In method #1, plus() is not even operating on the DRM. It is operating on an in-core Matrix, which is implicitly collect()ed because of the drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a DrmLike nor a CheckpointedDrm, but just an in-core Matrix. You will have to drmParallelize() it again in order to do any distributed operations.

> I agree that in some cases it doesn’t; can you please allow me to ask if it
> _should_?
>

I don't think it is working in any case. The implicit converter is making it feel like it is working.

> Thanks
>
> On Jul 19, 2014, at 8:56 PM, Anand Avati wrote:
>
> > On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel wrote:
> >
> > > On another thread I’ll send you code that shows A + 1 works with blank
> > > rows in A.
> > >
> >
> > I don't see how that worked for you.
> > See this:
> >
> > test("DRM addToRowCardinality - will fail") {
> >   val inCoreA = sparse(
> >     0 -> 1 :: 1 -> 2 :: Nil,
> >     0 -> 3 :: 1 -> 4 :: Nil,
> >     0 -> 2 :: 1 -> 0.0 :: Nil
> >   )
> >
> >   val inCoreBControl = sparse(
> >     0 -> 2 :: 1 -> 3 :: Nil,
> >     0 -> 4 :: 1 -> 5 :: Nil,
> >     0 -> 3 :: 1 -> 1 :: Nil,
> >     0 -> 1 :: 1 -> 1 :: Nil,
> >     0 -> 1 :: 1 -> 1 :: Nil
> >   )
> >
> >   val drmA = drmParallelize(inCoreA)
> >   drmA.addToRowCardinality(2)
> >   val drmB = (drmA + 1.0).checkpoint()
> >
> >   (drmB.collect - inCoreBControl).norm should be < 1e-3
> > }
> >
> > test("DRM addToRowCardinality - wont fail") {
> >   val inCoreA = sparse(
> >     0 -> 1 :: 1 -> 2 :: Nil,
> >     0 -> 3 :: 1 -> 4 :: Nil,
> >     0 -> 2 :: 1 -> 0.0 :: Nil
> >   )
> >
> >   val inCoreBWrong = sparse(
> >     0 -> 2 :: 1 -> 3 :: Nil,
> >     0 -> 4 :: 1 -> 5 :: Nil,
> >     0 -> 3 :: 1 -> 1 :: Nil,
> >     0 -> 0 :: 1 -> 0 :: Nil,
> >     0 -> 0 :: 1 -> 0 :: Nil
> >   )
> >
> >   val drmA = drmParallelize(inCoreA)
> >   drmA.addToRowCardinality(2)
> >   val drmB = (drmA + 1.0).checkpoint()
> >
> >   (drmB.collect - inCoreBWrong).norm should be < 1e-3
> > }
> >
> > And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
> >
> > - DRM addToRowCardinality - will fail *** FAILED ***
> >   2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> > - DRM addToRowCardinality - wont fail
> >
> > > BTW, this implies rbind will not solve the problem; it is firmly in data
> > > prep. But until I know the rules, I won’t know how to do the right thing.
> > >
> >
> > Rbind expects both A and B to have their Int row keys filled from 0 to
> > nrow-1, which is how they should ideally be.
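A minimal Scala sketch of the two behaviors discussed in this thread, using hypothetical toy types (`ToyDrm`, `InCoreMatrix`, `fillMissingRows`, `frobeniusDistance`) rather than Mahout's real classes; the toy `drm2InCore` only mimics the role of Mahout's converter of the same name. It assumes, as described above, that the elementwise `+` operator maps only over rows physically present in the RDD, while the implicit in-core conversion builds a dense matrix sized by nrow, so missing rows turn into zeros before `plus` runs.

```scala
import scala.language.implicitConversions

// Hypothetical toy types for illustration only; these are NOT Mahout's classes.

// In-core dense matrix: every row 0 until nrow physically exists.
final case class InCoreMatrix(rows: Vector[Vector[Double]]) {
  // Scalar add visits every row, including all-zero ones.
  def plus(s: Double): InCoreMatrix = InCoreMatrix(rows.map(_.map(_ + s)))
}

// Row-keyed "DRM": only rows stored in `data` exist; `nrow` is a declared
// cardinality that may exceed the set of stored keys.
final case class ToyDrm(data: Map[Int, Vector[Double]], nrow: Int, ncol: Int) {
  // Assumed operator behavior: a per-row map over stored rows only,
  // so keys missing from `data` stay missing.
  def +(s: Double): ToyDrm =
    copy(data = data.map { case (k, v) => k -> v.map(_ + s) })

  // Collect into a dense matrix sized by the declared nrow;
  // missing keys become all-zero rows.
  def collect: InCoreMatrix =
    InCoreMatrix(Vector.tabulate(nrow)(i => data.getOrElse(i, Vector.fill(ncol)(0.0))))

  // Hypothetical data-prep step: materialize a zero row for every key 0 until nrow.
  def fillMissingRows: ToyDrm =
    copy(data = (0 until nrow).map(i => i -> data.getOrElse(i, Vector.fill(ncol)(0.0))).toMap)
}

object ToyDrm {
  // Stand-in for drm2InCore: calling an in-core-only method such as `plus`
  // on a ToyDrm silently collects it first.
  implicit def drm2InCore(d: ToyDrm): InCoreMatrix = d.collect
}

// Frobenius norm of the difference between two same-shaped matrices.
def frobeniusDistance(x: InCoreMatrix, y: InCoreMatrix): Double =
  math.sqrt(x.rows.zip(y.rows).map { case (r, s) =>
    r.zip(s).map { case (p, q) => (p - q) * (p - q) }.sum
  }.sum)

// Same values as inCoreA in the test above, with nrow bumped from 3 to 5.
val drmA = ToyDrm(
  Map(0 -> Vector(1.0, 2.0), 1 -> Vector(3.0, 4.0), 2 -> Vector(2.0, 0.0)),
  nrow = 5, ncol = 2)

val viaOperator = (drmA + 1.0).collect   // rows 3 and 4 collect as zeros
val viaMethod   = drmA.plus(1.0)         // implicitly collected first; result is an InCoreMatrix
val viaFilled   = (drmA.fillMissingRows + 1.0).collect
```

Under these assumptions the operator path leaves rows 3 and 4 at zero while the method path yields ones, and the Frobenius distance between them is 2.0, which would match the `2.0 was not less than 0.001` failure if `norm` in that test is the Frobenius norm. Materializing the missing keys first makes both paths agree, the kind of data-prep fix the thread points toward.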
