Re: A + 1

Pat Ferrel Sun, 20 Jul 2014 08:17:23 -0700

Sorry, but my question keeps getting lost in the back and forth; no one reads 
that much email.


Oh, it’s working. The conversion to dense is saving the day in #1 I think. 
Where it is using nrow to create enough rows rather than looking in the rdd for 
row keys—just a guess.
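A minimal sketch of that guess, using plain Scala collections rather than Mahout types. `KeyedRows`, `plusOverExistingRows`, and `plusViaDense` are hypothetical names made up for illustration; the sketch only assumes that a distributed matrix stores rows by Int key next to a separately declared `nrow`, and it is not a claim about how Mahout implements either path.

```scala
object MissingRowsSketch {
  // Hypothetical stand-in for an rdd-backed matrix: rows keyed by Int,
  // plus a declared row count that may exceed the number of stored rows.
  final case class KeyedRows(rows: Map[Int, Vector[Double]], nrow: Int, ncol: Int)

  // Elementwise add that only visits rows physically present in the map.
  // Keys that were never stored are never touched.
  def plusOverExistingRows(m: KeyedRows, x: Double): KeyedRows =
    m.copy(rows = m.rows.map { case (k, v) => k -> v.map(_ + x) })

  // Build a dense result from nrow, filling absent keys with zero rows
  // before adding, so every one of the nrow rows receives the increment.
  def plusViaDense(m: KeyedRows, x: Double): Vector[Vector[Double]] =
    Vector.tabulate(m.nrow) { i =>
      m.rows.getOrElse(i, Vector.fill(m.ncol)(0.0)).map(_ + x)
    }

  def main(args: Array[String]): Unit = {
    // Two stored rows, but a declared cardinality of three.
    val a = KeyedRows(Map(0 -> Vector(1.0, 1.0), 1 -> Vector(0.0, 0.0)), nrow = 3, ncol = 2)
    println(plusOverExistingRows(a, 1.0).rows.keySet) // row 2 is still absent
    println(plusViaDense(a, 1.0))                     // row 2 exists and holds 1.0s
  }
}
```

Under these assumptions the two paths disagree only on the rows that are declared but not stored, which is the kind of difference described here.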

I think it was you that said dense or sparse the math should produce the same 
result. One or the other is a bug, right?

I am going to add some matrix multiply tests that work on rdd backed objects. 
The current tests could use some additions. I suspect that multiply and 
transpose will work correctly with nonexistent rows/columns but still need the 
designers to come out on one side or the other. 

If row keys need to be sequential and unbroken, this is a big deal and new to 
me.


On Jul 19, 2014, at 10:09 PM, Anand Avati wrote:

On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel wrote:

> Using methods instead of symbolic ops returns different types so methods
> work, ops don’t. If math is math, they should do the same thing so I’d like
> to know what it is supposed to do, can you please allow me to ask specific
> people the question?
> 


I don't see how I'm getting in the way of anybody else replying. By
posting to a public mailing list, you by definition allow anybody to review
and comment.


>  test("plus one") {
>    val a = dense(
>      (1, 1),
>      (0, 0))
> 
>    val drmA1 = drmParallelize(m = a, numPartitions = 2)
> 
>    // Modified to return a new CheckpointedDrm, so it maintains immutability
>    // but still only increases the row cardinality, by returning
>    // new CheckpointedDrmSpark[K](rdd, nrow + n, ncol, _cacheStorageLevel).
>    // Hack for now.
>    val drmABigger1 = drmA1.addToRowCardinality(1)
> 
>    // drmABigger has no row 2 in the rdd but an empty row 1
>    val drmABiggerPlusOne1 = drmABigger1.plus(1.0)
>    // drmABiggerPlusOne1 is a dense matrix
>    println(drmABiggerPlusOne1)
> 
>    val drmA2 = drmParallelize(m = a, numPartitions = 2)
>    val drmABigger2 = drmA2.addToRowCardinality(1)
>    val drmABiggerPlusOne2 = drmABigger2 + 1.0
>    drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
> 
> 
>    val bp = 0
>  }
> 
> Method #1 works, #2 doesn’t. Even when I create a new CheckpointedDrmSpark
> with larger _nrow than is in the data—even without the addToRowCardinality.
> 

In Method #1, plus() is not even operating on the DRM. The plus() is
operating on an in-core Matrix, which is implicitly collect()ed because of the
drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a
DrmLike nor a CheckpointedDrm, but is actually just an in-core Matrix. You
will have to drmParallelize() it again in order to do any distributed
operations.
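The mechanism described above can be sketched in plain Scala without any Mahout dependency. `InCore`, `Distributed`, and `distributed2InCore` are hypothetical stand-ins invented for this illustration (they are not Mahout classes); the only assumption is that the distributed type lacks a `plus` method while an implicit conversion to the in-core type is in scope.

```scala
import scala.language.implicitConversions

object ImplicitCollectSketch {
  // Hypothetical in-core matrix: the only type that defines plus().
  final case class InCore(values: Vector[Double]) {
    def plus(x: Double): InCore = InCore(values.map(_ + x))
  }

  // Hypothetical distributed matrix: defines the symbolic op, but no plus().
  final case class Distributed(parts: Vector[Vector[Double]]) {
    def +(x: Double): Distributed = Distributed(parts.map(_.map(_ + x)))
  }

  // Plays the role the thread attributes to drm2InCore: it silently
  // gathers all partitions into one in-core value.
  implicit def distributed2InCore(d: Distributed): InCore = InCore(d.parts.flatten)

  def main(args: Array[String]): Unit = {
    val d = Distributed(Vector(Vector(1.0, 1.0), Vector(0.0, 0.0)))

    // Distributed has no plus(), so the compiler inserts the conversion
    // and the call runs on the collected, in-core value.
    val viaMethod: InCore = d.plus(1.0)

    // The symbolic op is a member of Distributed, so no conversion happens.
    val viaOp: Distributed = d + 1.0

    println(viaMethod)
    println(viaOp)
  }
}
```

The two calls look interchangeable at the call site, yet their static result types differ, which is why the method form can appear to work while never exercising the distributed code path.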

> I agree that in some cases it doesn’t; can you please allow me to ask if it
> _should_?
> 

I don't think it is working in any case. The implicit converter is making
it feel like it is working.

Thanks




> On Jul 19, 2014, at 8:56 PM, Anand Avati wrote:
> 
> 
> 
> 
> On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel wrote:
> 
>> 
>> On another thread I’ll send you code that shows A + 1 works with blank
>> rows in A.
>> 
> 
> I don't see how that worked for you. See this:
> 
>  test("DRM addToRowCardinality - will fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBControl = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBControl).norm should be < 1e-3
> 
>  }
> 
>  test("DRM addToRowCardinality - wont fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBWrong = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBWrong).norm should be < 1e-3
>  }
> 
> 
> And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
> 
> - DRM addToRowCardinality - will fail *** FAILED ***
> 
>  2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> - DRM addToRowCardinality - wont fail
> 
> 
>> BTW this implies rbind will not solve the problem, it is firmly in data
>> prep. But until I know the rules I won’t know how to do the right thing.
>> 
> 
> Rbind expects both A and B to have their Int row keys filled from 0 to
> nrow-1, which is how they should be ideally.
> 
> 
>
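The expectation stated above, Int row keys filled from 0 to nrow-1, can be checked with a few lines of plain Scala. `missingRowKeys` is a hypothetical helper written for this note, not part of Mahout, and it assumes the set of keys actually present has already been gathered from the data.

```scala
object RowKeyCheckSketch {
  // Returns every key in 0 until nrow that has no row present.
  // An empty result means the keys are sequential and unbroken.
  def missingRowKeys(presentKeys: Set[Int], nrow: Int): Seq[Int] =
    (0 until nrow).filterNot(presentKeys.contains)

  def main(args: Array[String]): Unit = {
    // Three stored rows with the cardinality raised to five, as in the
    // addToRowCardinality(2) tests quoted above: keys 3 and 4 are missing.
    println(missingRowKeys(Set(0, 1, 2), nrow = 5))
  }
}
```

A data-prep step could run a check like this before calling operators that assume unbroken keys.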
