src/main/java/org/apache/mahout/math/AbstractMatrix.java:444: public Matrix plus(double x)
which ends up seeming to "work" on a DRM because of math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala: implicit def drm2InCore[K: ClassTag](drm: DrmLike[K]): Matrix = drm.collect On Mon, Jul 21, 2014 at 3:44 PM, Dmitriy Lyubimov <[email protected]> wrote: > not sure what is this plus() thing. Is it something that is not yet > committed? > > > On Mon, Jul 21, 2014 at 3:41 PM, Anand Avati <[email protected]> wrote: > > > On Mon, Jul 21, 2014 at 3:35 PM, Pat Ferrel <[email protected]> > wrote: > > > > > If you do drm.plus(1) this converts to a dense matrix, which is what > the > > > result must be anyway, and does add the scalar to all rows, even > missing > > > ones. > > > > > > > > Pat, I mentioned this in my previous email already. drm.plus(1) > completely > > misses the point. It converts DRM into an in-core matrix and applies > plus() > > method on Matrix. The result is a Matrix, not DRM. > > > > drm.plus(1) is EXACTLY the same as: > > > > Matrix m = drm.collect() > > m.plus(1) > > > > The implicit def drm2InCore() syntactic sugar is probably turning out to > be > > dangerous in this case, in terms of hinting the wrong meaning. > > > > Thanks > > > > > > > > > > > > > On Jul 21, 2014, at 3:23 PM, Dmitriy Lyubimov <[email protected]> > > wrote: > > > > > > perhaps just compare row count with max(key)? that's exactly what lazy > > > nrow() currently does in this case. > > > > > > > > > On Mon, Jul 21, 2014 at 3:21 PM, Dmitriy Lyubimov <[email protected]> > > > wrote: > > > > > > > > > > > ok. so it should be easy to fix at least everything but elementwise > > > scalar > > > > i guess. > > > > > > > > Since the notion of "missing rows" is only defined for int-keyed > > > datasets, > > > > then ew scalar technically should work for non-int keyed datasets > > > already. > > > > > > > > as for int-keyed datasets, i am not sure what is the best strategy. > > > > Obviously, one can define sort of normalization/validation of > int-keyed > > > > dataset routine, but it would be fairly expensive to run "just > > because". > > > > Perhaps there's a cheap test (as cheap as row count job) to run to > test > > > for > > > > int keys consistency when matrix is first created. > > > > > > > > > > > > > > > > On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]> > > wrote: > > > > > > > >> > > > >> > > > >> > > > >> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov < > [email protected]> > > > >> wrote: > > > >> > > > >>> > > > >>> > > > >>> > > > >>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]> > > > wrote: > > > >>> > > > >>>> Dmitriy, comments inline - > > > >>>> > > > >>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> > > > >>>> wrote: > > > >>>> > > > >>>>> And no, i suppose it is ok to have "missing" rows even in case of > > > >>>>> int-keyed matrices. > > > >>>>> > > > >>>>> there's one thing that you probably should be aware in this > context > > > >>>>> though: many algorithms don't survive empty (row-less) > partitions, > > in > > > >>>>> whatever way they may come to be. Other than that, I don't feel > > > every row > > > >>>>> must be present -- even if there's implied order of the rows. > > > >>>>> > > > >>>> > > > >>>> I'm not sure if that is necessarily true. There are three > operators > > > >>>> which break pretty badly with with missing rows. > > > >>>> > > > >>>> AewScalar - operation like A + 1 is just not applied on the > missing > > > >>>> row, so the final matrix will have 0's in place of 1s. > > > >>>> > > > >>> > > > >>> Indeed. i have no recourse at this point. > > > >>> > > > >>> > > > >>>> > > > >>>> AewB, CbindAB - function after cogroup() throws exception if a row > > was > > > >>>> present on only one matrix. So I guess it is OK to have missing > rows > > > as > > > >>>> long as both A and B have the exact same missing row set. Somewhat > > > >>>> quirky/nuanced requirement. > > > >>>> > > > >>> > > > >>> Agree. i actually was not aware that's a cogroup() semantics in > > spark. > > > I > > > >>> though it would have an outer join semantics (as in Pig, i > believe). > > > Alas, > > > >>> no recourse at this point either. > > > >>> > > > >> > > > >> The exception is actually during reduceLeft after cogroup(). > Cogroup() > > > >> itself is probably an outer-join. > > > >> > > > >> > > > >> > > > > > > > > > > > > >
