On Wed, Jul 16, 2014 at 7:53 AM, Pat Ferrel <[email protected]> wrote:
> There IS no issue with nrow being a lazy val. I never touch it; read below.
>

The value itself may not be immutable. But it sounds like the same matrix
would return different values for nrow() depending on when you called it.
That sounds very much like a problem if the same matrix is part of two
separate Spark graph hierarchies where each is making a different assumption
about its cardinality.

> creating a new matrix val is fine if it doesn’t cause a new rdd to be
> created. I’ll look into that.
>
> rbind as I read it requires me to construct the rows to be added. I don’t
> know what their keys are and don’t want to calculate them. If I’m right
> about how the math works, the actual rows are not needed.

See the example code to add empty rows. Numerical keys are auto-computed
based on the sizes.

> This looks like a much heavier weight operation than just changing the row
> cardinality and works for other cases where you are adding real vectors.
>

I'm not sure if there is a lighter _and_ safe operation. Happy to take
suggestions.

Thanks

On Tue, Jul 15, 2014 at 1:04 PM, Anand Avati <[email protected]> wrote:

> On Tue, Jul 15, 2014 at 12:45 PM, Pat Ferrel <[email protected]> wrote:
>
>> I appreciate the thoughts.
>>
>> I don’t change nrow; it is still a lazy val. I change _nrow, which is a
>> var and is used to calculate nrow when it is needed. The only thing run on
>> them is the CheckpointedDrmSpark constructor. The class exists to guarantee
>> the drm is pinned down and _nrow is changed after construction but before
>> any math is done on it. Changing _nrow may be safe on a
>> CheckpointedDrmSpark but the question is why. I’ll put it up on a PR.
>>
>> btw I was thinking of calling the method
>> CheckpointedDrmSpark#addEmptyRows, which since it’s sparse will just change
>> _nrow and will flag the purpose of the method, not to mention it avoids the
>> question about reducing the number of rows.
>
> I would prefer a new rbind() operator instead of addEmptyRows() method.
> Just feels more consistent.
>
> Thanks
>
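
For readers following the thread, the concern about a lazy val backed by a mutable var can be illustrated with a minimal sketch. This is not Mahout's CheckpointedDrmSpark; `SketchDrm`, `addEmptyRowsInPlace`, and `withEmptyRows` are hypothetical names invented for illustration, and no Spark or RDD is involved. It only assumes what the thread states: `nrow` is a lazy val computed from a var `_nrow`.

```scala
// Hypothetical stand-in for a checkpointed DRM; NOT Mahout's actual class.
final class SketchDrm(initialRows: Long) {
  // Mutable backing field, as described in the thread.
  private var _nrow: Long = initialRows

  // Evaluated on first access, then cached for the lifetime of the object.
  lazy val nrow: Long = _nrow

  // In-place variant (hypothetical): only takes effect if nrow
  // has not been read yet.
  def addEmptyRowsInPlace(n: Long): Unit = _nrow += n

  // Non-mutating variant (hypothetical): returns a new object and
  // leaves this one unchanged.
  def withEmptyRows(n: Long): SketchDrm = new SketchDrm(nrow + n)
}

object SketchDrmDemo {
  def main(args: Array[String]): Unit = {
    val a = new SketchDrm(3)
    a.addEmptyRowsInPlace(2) // mutated before the first read
    println(a.nrow)          // sees the enlarged row count

    val b = new SketchDrm(3)
    println(b.nrow)          // first read caches the original count
    b.addEmptyRowsInPlace(2) // too late: the lazy val is already cached
    println(b.nrow)          // still the original count

    val c = new SketchDrm(3)
    val d = c.withEmptyRows(2)
    println((c.nrow, d.nrow)) // c is untouched, d has the extra rows
  }
}
```

In this sketch the in-place change is only visible if it happens before anything reads `nrow`, which is the ordering dependency raised above. The non-mutating variant avoids that dependency at the cost of a new object; whether the equivalent in Mahout would create a new RDD is the open question in the thread and is not answered by this sketch.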
