On Mon, Jul 14, 2014 at 11:56 AM, Pat Ferrel <[email protected]> wrote:
> In the application, the number of rows will always be increased, adding
> blank rows. I don’t think shuffle is necessary in this case because there
> is no actual row, no data in the drm; it’s just needed to make the
> cardinality match, and the IDs will take care of data matching. Maybe calling
> it something else is a good idea to emphasize the special case for its
> use. I went over this with Dmitriy and, though I haven’t checked actual
> values on large datasets, it works.
>

Does that mean the cardinality is faked at the logical layer, with no
changes at the engine level? Does that mean the physical operators need to
be prepared to handle multiplication of matrices with non-matching
dimensions by assuming the missing rows or columns are 0's? Does that
really work with no changes?

This sounds like a need to introduce a new R-like rbind() operator. That
way you could fix up row cardinality like this:

    drmAnew = drmA rbind drmParallelizeEmpty(extra_rows, drmA.ncol)

You can already do this today, though in a twisted way:

    drmAnew = (drmA.t cbind drmParallelizeEmpty(drmA.ncol, extra_rows)).t
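
To make the equivalence concrete, here is a minimal sketch in plain Scala collections, not the Mahout DSL. It assumes the workaround above is read as (A.t cbind Empty).t, with the closing parenthesis before the final transpose. The names RbindSketch, Matrix, empty, cbind and rbind are hypothetical helpers for illustration only; empty merely stands in for drmParallelizeEmpty, and nothing here says how a distributed rbind would or should be implemented.

```scala
// Plain-Scala illustration only; no Mahout dependency, all names hypothetical.
object RbindSketch {
  type Matrix = Vector[Vector[Double]]

  // All-zero nrow x ncol matrix (stand-in for drmParallelizeEmpty).
  def empty(nrow: Int, ncol: Int): Matrix = Vector.fill(nrow, ncol)(0.0)

  // Column-wise concatenation: operands must have the same number of rows.
  def cbind(a: Matrix, b: Matrix): Matrix = {
    require(a.length == b.length, "cbind needs equal row counts")
    a.zip(b).map { case (rowA, rowB) => rowA ++ rowB }
  }

  // Row-wise concatenation: operands must have the same number of columns.
  def rbind(a: Matrix, b: Matrix): Matrix = {
    require(a.isEmpty || b.isEmpty || a.head.length == b.head.length,
      "rbind needs equal column counts")
    a ++ b
  }

  def main(args: Array[String]): Unit = {
    val a: Matrix = Vector(Vector(1.0, 2.0, 3.0), Vector(4.0, 5.0, 6.0))
    val extraRows = 2
    val ncol = a.head.length

    // Direct form: A rbind Empty(extraRows, ncol)
    val direct = rbind(a, empty(extraRows, ncol))

    // Twisted form: (A.t cbind Empty(ncol, extraRows)).t
    val twisted = cbind(a.transpose, empty(ncol, extraRows)).transpose

    // Both forms pad A with extraRows blank rows.
    assert(direct == twisted)
    println(direct.map(_.mkString(" ")).mkString("\n"))
  }
}
```

The point of the sketch is only that the two expressions produce the same padded matrix; a dedicated rbind would avoid the two extra transposes the workaround needs.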
