On Mon, Jul 14, 2014 at 10:58 AM, Ted Dunning <[email protected]> wrote:
> On Mon, Jul 14, 2014 at 9:47 AM, Pat Ferrel <[email protected]> wrote:
>
> > BTW that requires that drm.nrow be mutable. That is defined as immutable
> > in the DSL and so will require a change to several traits. I’ve done this
> > but am still trying to decide the cleanest.
> >
>
> Hmmm.... immutability has lots of virtues. And changing nrows is just the
> tip of the iceberg. You also have to shuffle the rows to match the row
> partitioning between the two matrices.
>
> Or it requires more than one pass through the data. Since you have to read
> both matrices before you can deal with either, and since one matrix is
> likely to be shuffled relative to the other, might it just be better to
> either do two read passes or pay the cost to shuffle the matrices after
> getting a consensus view. Note that the second read pass will have to do a
> shuffle any way so the only savings to doing two passes is to decrease
> memory usage.
>
> *Anand,*
>
> I think I remember you were addressing a shuffle problem in some of your
> earlier work. What did you conclude?
>

I think the larger question is: what does it mean to make drm.nrow mutable?
If it is changed to a smaller value, which rows do you "sacrifice"? Why not
just do a RowRange operation to get a new DRM with fewer rows (instead of
mutating the given DRM)? After that, if you care specifically about
partitioning, the Par operator can shuffle data for you.
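
To make the suggestion concrete, here is a rough sketch of the idea as a toy model in plain Python. It is not the Mahout DSL: ToyDrm, row_range and par are hypothetical names standing in for a DRM, the RowRange operation and the Par operator, and the key-modulo partitioning is just an assumption for illustration. The point is only that nrow stays derived from the data, and that both operations return a new matrix instead of mutating the given one.

```python
from dataclasses import dataclass
from typing import Tuple

Row = Tuple[int, Tuple[float, ...]]  # (row key, row values)


@dataclass(frozen=True)
class ToyDrm:
    """Toy stand-in for a distributed row matrix; NOT the Mahout API."""
    partitions: Tuple[Tuple[Row, ...], ...]

    @property
    def nrow(self) -> int:
        # Derived from the stored rows, so it cannot disagree with the data.
        return sum(len(p) for p in self.partitions)

    def row_range(self, start: int, stop: int) -> "ToyDrm":
        """Return a new matrix with only the rows where start <= key < stop."""
        kept = tuple(
            tuple(r for r in p if start <= r[0] < stop)
            for p in self.partitions
        )
        return ToyDrm(kept)

    def par(self, num_partitions: int) -> "ToyDrm":
        """Return a new matrix with rows redistributed by key (a 'shuffle')."""
        rows = sorted(r for p in self.partitions for r in p)
        buckets = [[] for _ in range(num_partitions)]
        for r in rows:
            # Hypothetical partitioner: assign each row by key modulo.
            buckets[r[0] % num_partitions].append(r)
        return ToyDrm(tuple(tuple(b) for b in buckets))


# Four rows spread unevenly over two partitions.
a = ToyDrm((
    ((0, (1.0,)), (3, (4.0,))),
    ((1, (2.0,)), (2, (3.0,))),
))

# Drop the last row explicitly, then repartition; `a` itself is untouched.
b = a.row_range(0, 3).par(2)
```

With this shape there is never a question of which rows get "sacrificed": the caller names the range, and the original matrix remains valid for anyone else holding a reference to it.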
