On Tue, Jul 15, 2014 at 11:10 AM, Ted Dunning <[email protected]> wrote:

>
> My worry about this is that I thought that DSL objects needed to remain
> immutable due to the lazy evaluation.
>
> For instance, suppose that A and B can be multiplied because they are the
> right shapes:
>
>      C = A.t %*% B
>
> Now I change the number of rows in A
>
>      A.nrow = 2 * A.nrow
>
> and A now has a shape compatible with D
>
>      E = A.t %*% D
>
> And now I ask for some materialization:
>
>      x = C.sum + E.sum
>
> And this fails.  The reason is that the change in A's shape happened
> before the computation that multiplies A and B was ever started, so by
> the time C is materialized, A no longer matches B.
>
> Now, to my mind, lazy evaluation is pretty crucial because it allows the
> optimizer to work.  That seems to me to say that mutation should not be
> allowed.
>
> (and yes, I know that these aren't the correct notation ... that isn't the
> point)
>


This is a good enough reason to avoid mutating the nrow field. That
apart, I'm still not convinced the Spark engine can "just work" without
physically adding new (key, vector) tuples. It seems bound to fail in
some future operation if those tuples (even if sparse or empty) are
never actually added. Sketches of both points follow.
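
To make the first point concrete, here is a minimal sketch of the hazard
in plain Scala, assuming a toy deferred expression tree (Mat, Leaf, AtB
and materialize are made-up names for illustration, not the actual DSL
classes):

    case class Shape(nrow: Int, ncol: Int)

    // Mutable geometry on the leaf -- the thing being argued against.
    class Mat(var shape: Shape)

    sealed trait Expr { def shape: Shape }
    case class Leaf(m: Mat) extends Expr { def shape = m.shape }

    // a.t %*% b: result is (a.ncol x b.ncol), legal iff a.nrow == b.nrow.
    case class AtB(a: Expr, b: Expr) extends Expr {
      def shape = Shape(a.shape.ncol, b.shape.ncol)
    }

    // The shape check is deferred until materialization, as it would be
    // under lazy evaluation with an optimizer in between.
    def materialize(e: Expr): Shape = e match {
      case Leaf(m) => m.shape
      case AtB(a, b) =>
        val (sa, sb) = (materialize(a), materialize(b))
        require(sa.nrow == sb.nrow, s"incompatible shapes: $sa vs $sb")
        Shape(sa.ncol, sb.ncol)
    }

    val A = new Mat(Shape(3, 4))
    val B = new Mat(Shape(3, 5))
    val D = new Mat(Shape(6, 7))

    val C = AtB(Leaf(A), Leaf(B)) // C = A.t %*% B, valid as defined
    A.shape = Shape(6, 4)         // A.nrow = 2 * A.nrow
    val E = AtB(Leaf(A), Leaf(D)) // E = A.t %*% D, valid against the new A

    materialize(E)                // fine
    materialize(C)                // fails: A is now 6 x 4, B is still 3 x 5

The check only runs at materialization time, so the mutation made
between defining C and materializing it is what trips it, exactly as in
the example above.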

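And a minimal sketch of the second concern, again with made-up names: a
toy DRM that declares 4 rows but physically holds only 2 (key, vector)
tuples. Anything computed by mapping over the physical tuples, the way a
distributed engine would, comes up short:

    // Declared geometry plus the physical (key, vector) tuples.
    case class ToyDrm(nrow: Int, ncol: Int, rows: Map[Int, Array[Double]])

    // y = A %*% x, computed by mapping over whatever tuples exist.
    def times(a: ToyDrm, x: Array[Double]): Map[Int, Double] =
      a.rows.map { case (k, v) =>
        k -> v.zip(x).map { case (p, q) => p * q }.sum
      }

    // Rows 2 and 3 were never physically added.
    val A = ToyDrm(4, 2, Map(0 -> Array(1.0, 2.0), 1 -> Array(3.0, 4.0)))
    val y = times(A, Array(1.0, 1.0))

    assert(y.size == A.nrow) // fails: y has entries only for keys 0 and 1

Any consumer that expects a result of length nrow (collecting into a
dense vector, zipping against a 4-row operand, and so on) then fails or
silently drops rows, which is why I think the empty tuples have to be
physically added.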
Thanks
