There IS no issue with nrow being a lazy val; I never touch it, read below. Creating a new matrix val is fine if it doesn’t cause a new RDD to be created; I’ll look into that.
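For what it’s worth, here is a minimal, self-contained sketch of the lazy-val-over-a-var pattern in question. This is a simplified stand-in, not the real CheckpointedDrmSpark; the class name, setter, and row-count fallback are made up. It only shows why mutating _nrow is visible just so long as nothing has read nrow yet:

// Simplified stand-in for illustration only; not Mahout's CheckpointedDrmSpark.
class SparseDrmStub(rows: Map[Long, Vector[Double]], private var _nrow: Long = -1L) {

  // Computed once, on first access, from whatever _nrow holds at that moment.
  lazy val nrow: Long =
    if (_nrow >= 0) _nrow
    else if (rows.isEmpty) 0L
    else rows.keys.max + 1

  // Only takes effect while nrow has not yet been read; afterwards the lazy val is frozen.
  def setRowCardinality(n: Long): Unit = _nrow = n
}

object LazyNrowDemo extends App {
  val a = new SparseDrmStub(Map(0L -> Vector(1.0, 2.0), 1L -> Vector(3.0, 4.0)))
  a.setRowCardinality(5L)   // before any read of nrow: takes effect
  println(a.nrow)           // 5
  a.setRowCardinality(10L)  // after nrow has been read: ignored
  println(a.nrow)           // still 5
}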
rbind as I read it requires me to construct the rows to be added. I don’t know what their keys are and don’t want to calculate them. If I’m right about how the math works, the actual rows are not needed. This looks like a much heavier-weight operation than just changing the row cardinality (a rough sketch of that idea is below the quoted thread), though it works for other cases where you are adding real vectors. I’ll look deeper now that cross-cooccurrence seems to be fixed.

On Jul 15, 2014, at 7:40 PM, Ted Dunning <[email protected]> wrote:

The rbind approach also gives a new object and avoids all questions of lazy evaluation.

On Tue, Jul 15, 2014 at 1:04 PM, Anand Avati <[email protected]> wrote:

> On Tue, Jul 15, 2014 at 12:45 PM, Pat Ferrel <[email protected]> wrote:
>
>> I appreciate the thoughts.
>>
>> I don’t change nrow; it is still a lazy val. I change _nrow, which is a
>> var and is used to calculate nrow when it is needed. The only thing run on
>> them is the CheckpointedDrmSpark constructor. The class exists to guarantee
>> the drm is pinned down, and _nrow is changed after construction but before
>> any math is done on it. Changing _nrow may be safe on a
>> CheckpointedDrmSpark, but that question is why I’ll put it up as a PR.
>>
>> btw I was thinking of calling the method
>> CheckpointedDrmSpark#addEmptyRows, which, since it’s sparse, will just
>> change _nrow. That flags the purpose of the method, not to mention it
>> avoids the question of reducing the number of rows.
>
> I would prefer a new rbind() operator instead of an addEmptyRows() method.
> Just feels more consistent.
>
> Thanks
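To make the addEmptyRows idea concrete, here is a hypothetical sketch. The names are illustrative only and do not match Mahout’s API, and the Map stands in for the row RDD of the real CheckpointedDrmSpark. Because the matrix is sparse, appending empty rows can amount to nothing more than a larger row cardinality over the same, untouched row data:

// Illustrative sketch only; names do not match Mahout's actual API.
final case class SparseDrm(rows: Map[Long, Vector[Double]], nrow: Long, ncol: Int) {

  // Append n implicitly-empty rows: no keys are computed and no rows are built,
  // only the row cardinality changes. The row data is shared, not copied.
  def addEmptyRows(n: Long): SparseDrm = copy(nrow = nrow + n)
}

object AddEmptyRowsDemo extends App {
  val a = SparseDrm(Map(0L -> Vector(1.0, 0.0), 2L -> Vector(0.0, 3.0)), nrow = 3L, ncol = 2)
  val b = a.addEmptyRows(2L)
  println(b.nrow)            // 5: the two new trailing rows exist only implicitly
  println(b.rows eq a.rows)  // true: same underlying data, nothing rebuilt
}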
