Sorry, but my question keeps getting lost in the back and forth; no one reads 
that much email.

Oh, it’s working. The conversion to dense is saving the day in #1, I think, 
where it uses nrow to create enough rows rather than looking in the rdd for 
row keys. That’s just a guess.
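
If that guess is right, the collect path would be doing something morally like 
this sketch (not the actual Mahout code, just the idea; collectToDense is a 
made-up name):

  // Sketch of the guessed behavior only, not Mahout's real collect code.
  // Because the in-core matrix is allocated from nrow, any row key missing
  // from the rdd simply stays a zero row instead of blowing up.
  import org.apache.mahout.math.{DenseMatrix, Matrix, Vector}

  def collectToDense(rows: Seq[(Int, Vector)], nrow: Int, ncol: Int): Matrix = {
    val m = new DenseMatrix(nrow, ncol)                 // nrow rows allocated up front
    rows.foreach { case (key, vec) => m.assignRow(key, vec) }  // only keys present get filled
    m
  }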

I think it was you who said that, dense or sparse, the math should produce the 
same result. One or the other is a bug, right?

I am going to add some matrix multiply tests that work on rdd-backed objects; 
the current tests could use some additions. I suspect that multiply and 
transpose will work correctly with non-existent rows/columns, but I still need 
the designers to come out on one side or the other. 
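
Something along these lines, as an untested sketch that reuses the 
addToRowCardinality hack from the code below:

 test("A %*% A.t with a row missing from the rdd") {
   val a = dense(
     (1, 1),
     (0, 0))

   // addToRowCardinality is the hack discussed below: it bumps nrow without
   // adding a row to the rdd, so row 2 exists only logically
   val drmA = drmParallelize(m = a, numPartitions = 2).addToRowCardinality(1)

   // 3x2 %*% 2x3 => 3x3; the missing row should behave like a row of zeros
   val drmAAt = (drmA %*% drmA.t).checkpoint()

   val control = dense(
     (2, 0, 0),
     (0, 0, 0),
     (0, 0, 0))

   (drmAAt.collect - control).norm should be < 1e-3
 }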

If row keys need to be sequential and unbroken, this is a big deal and new to 
me.


On Jul 19, 2014, at 10:09 PM, Anand Avati <[email protected]> wrote:

On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel <[email protected]> wrote:

> Using methods instead of symbolic ops returns different types, so methods
> work and ops don’t. If math is math, they should do the same thing, so I’d like
> to know what it is supposed to do. Can you please allow me to ask specific
> people the question?
> 


I don't see how I'm getting in the way of anybody else replying. By
posting to a public mailing list, you by definition allow anybody to review
and comment.


 test("plus one"){
>    val a = dense(
>      (1, 1),
>      (0, 0))
> 
>    val drmA1 = drmParallelize(m = a, numPartitions = 2)
> 
>    // modified to return a new CheckpointedDrm so maintains immutability
> but still only increases the row cardinality
>    // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol,
> _cacheStorageLevel ) Hack for now.
>    val drmABigger1 = drmA1.addToRowCardinality(1)
> 
>    val drmABiggerPlusOne1 = drmABigger1.plus(1.0)  // drmABigger1 has no
> row 2 in the rdd, but row 1 is present as an all-zero row
>    // drmABiggerPlusOne1 is a dense matrix
>    println(drmABiggerPlusOne1)
> 
>    val drmA2 = drmParallelize(m = a, numPartitions = 2)
>    val drmABigger2 = drmA2.addToRowCardinality(1)
>    val drmABiggerPlusOne2 = drmABigger2 + 1.0
>    drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
> 
> 
>    val bp = 0
>  }
> 
> method #1 works, #2 doesn’t. Even when I create a new CheckpointedDrmSpark
> with a larger _nrow than is in the data, even without addToRowCardinality.
> 

In Method #1, plus() is not even operating on the DRM. The plus() is
operating on an in-core Matrix, which is implicitly collect()ed because of the
drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a
DrmLike nor a CheckpointedDrm; it is actually just an in-core Matrix. You
will have to drmParallelize() it again in order to do any distributed
operations.
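
Spelled out, #1 is roughly equivalent to this sketch of what the implicit
conversion expands to:

  // roughly what #1 expands to: drm2InCore collect()s the DRM, so plus()
  // runs on an in-core Matrix, not on the rdd
  val inCore: Matrix = drmABigger1.collect        // what the implicit does for you
  val inCorePlusOne = inCore.plus(1.0)            // plain in-core Matrix.plus

  // to keep going with distributed ops you would have to re-distribute it
  val drmAgain = drmParallelize(inCorePlusOne, numPartitions = 2)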

> I agree that in some cases it doesn’t. Can you please allow me to ask if it
> _should_?
> 

I don't think it is working in any case. The implicit converter is making
it feel like it is working.

Thanks




> On Jul 19, 2014, at 8:56 PM, Anand Avati <[email protected]> wrote:
> 
> 
> 
> 
> On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel <[email protected]> wrote:
> 
>> 
>> On another thread I’ll send you code that shows A + 1 works with blank
>> rows in A.
>> 
> 
> I don't see how that worked for you. See this:
> 
>  test("DRM addToRowCardinality - will fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBControl = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBControl).norm should be < 1e-3
> 
>  }
> 
>  test("DRM addToRowCardinality - wont fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBWrong = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBWrong).norm should be < 1e-3
>  }
> 
> 
> And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
> 
> - DRM addToRowCardinality - will fail *** FAILED ***
> 
>  2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> - DRM addToRowCardinality - wont fail
> 
> 
>> BTW this implies rbind will not solve the problem; it is firmly in data
>> prep. But until I know the rules I won’t know how to do the right thing.
>> 
> 
> Rbind expects both A and B to have their Int row keys filled from 0 to
> nrow-1, which is how they should be ideally.
> 
> 
> 
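> A rough sketch of what that data prep could look like, assuming drmWrap from
> the spark bindings and a hypothetical gappyRdd whose Int keys have gaps:
> 
>  // re-key a row rdd so the Int keys run 0..nrow-1 with no gaps before
>  // wrapping it as a DRM; gappyRdd is a made-up placeholder
>  import org.apache.spark.SparkContext._
>  import org.apache.mahout.sparkbindings._
> 
>  val gappyRdd: DrmRdd[Int] = ???               // rows keyed with gaps, e.g. 0, 3, 7
>  val rekeyed: DrmRdd[Int] = gappyRdd
>    .sortByKey()                                // keep a deterministic row order
>    .values
>    .zipWithIndex()                             // assign 0..n-1
>    .map { case (vec, idx) => (idx.toInt, vec) }
> 
>  val drmA = drmWrap(rekeyed)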
