BTW I’m now wondering about a non-existent row too. The question was put to 
Dmitriy a couple of replies down. It worked on one test but now I’m bogged down 
fixing cross-cooccurrence tests and won’t get back to itemsimilarity to try it 
again until later.


On Jul 15, 2014, at 12:45 PM, Pat Ferrel <[email protected]> wrote:

I appreciate the thoughts.

I don’t change nrow; it is still a lazy val. I change _nrow, which is a var and 
is used to calculate nrow when it is needed. The only thing run on them is the 
CheckpointedDrmSpark constructor. The class exists to guarantee the DRM is 
pinned down, and _nrow is changed after construction but before any math is done 
on it. Changing _nrow may be safe on a CheckpointedDrmSpark, but that question 
is exactly why I’ll put it up in a PR.
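
For concreteness, here is a toy Scala sketch of the pattern I mean (not the 
real CheckpointedDrmSpark source, just the shape of it): nrow stays a lazy val, 
and _nrow is the var it reads, set after construction but before any math runs.

    import org.apache.spark.rdd.RDD

    // Toy sketch only -- not the actual CheckpointedDrmSpark.
    class SketchCheckpointedDrm(val rdd: RDD[(Int, Array[Double])],
                                private var _nrow: Long = -1L) {

      // Lazy: nothing is computed until nrow is first read; by then _nrow
      // has been set, or we fall back to inferring it from the row keys.
      lazy val nrow: Long =
        if (_nrow >= 0L) _nrow
        else rdd.map(_._1).max().toLong + 1L
    }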

BTW I was thinking of calling the method CheckpointedDrmSpark#addEmptyRows. 
Since the DRM is sparse it will just change _nrow, and the name flags the 
purpose of the method, not to mention it avoids the question about reducing the 
number of rows.
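
Added to the sketch class above, it would look something like this (untested, 
just to show the intent):

    // Sketch of the proposed helper: on a sparse DRM no tuples need to be
    // added, so "adding" n empty rows is nothing more than widening _nrow.
    // It has to run before nrow is ever read -- once the lazy val is
    // forced, later changes to _nrow are invisible.
    def addEmptyRows(n: Long): Unit = {
      require(n >= 0L, "row count can only grow")
      _nrow = (if (_nrow >= 0L) _nrow else rdd.map(_._1).max().toLong + 1L) + n
    }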


On Jul 15, 2014, at 12:11 PM, Anand Avati <[email protected]> wrote:

On Tue, Jul 15, 2014 at 11:10 AM, Ted Dunning <[email protected]> wrote:

> 
> My worry about this is that I thought that DSL objects needed to remain
> immutable due to the lazy evaluation.
> 
> For instance, suppose that A and B can be multiplied because they are the
> right shapes:
> 
>    C = A.t %*% B
> 
> Now I change number of rows in A
> 
>    A.nrow = 2 * A.nrow
> 
> and A now has the shape compatible with D
> 
>    E = A.t %*% D
> 
> And now I ask for some materialization:
> 
>    x = C.sum + E.sum
> 
> And this fails.  The reason is that the change in A's shape happened
> before computation was started on multiplying A and B.
> 
> Now, to my mind, lazy evaluation is pretty crucial because it allows the
> optimizer to work.  That seems to me to say that mutation should not be
> allowed.
> 
> (and yes, I know that these aren't the correct notation ... that isn't the
> point)
> 


This is a good enough reason to avoid mutating the nrow field. That apart, I'm
still not convinced the Spark engine can "just work" without adding new
(key, vector) tuples. It seems bound to fail in some future operation
if the (key, vector) tuples (even if sparse/empty) are not physically added.
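
For instance (a toy sketch over plain (Int, Array[Double]) tuples, not the
actual DRM operators), anything computed by walking the tuples only sees the
rows that physically exist, whatever the logical row count claims:

    import org.apache.spark.{SparkConf, SparkContext}

    // The logical row count claims 5 rows, but only 3 tuples exist.
    object MissingRowsSketch extends App {
      val sc = new SparkContext(
        new SparkConf().setMaster("local[2]").setAppName("sketch"))

      val rows = sc.parallelize(Seq(
        0 -> Array(1.0, 2.0),
        1 -> Array(3.0, 4.0),
        4 -> Array(5.0, 6.0)))        // keys 2 and 3 were never materialized

      val logicalNRow = 5L            // what a widened _nrow would claim
      val physicalNRow = rows.count() // 3, not 5

      // Per-row results have no entries for keys 2 and 3, so anything that
      // later joins or zips by row key against a truly 5-row matrix can break.
      val rowSums = rows.map { case (k, v) => (k, v.sum) }.collect().toMap
      println(s"logical=$logicalNRow physical=$physicalNRow sums=$rowSums")

      sc.stop()
    }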

Thanks

