Ah, got it. It is the Java API via implicit conversion. Gosh. No good.
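To make the trap concrete, here is a minimal sketch of how an implicit DRM-to-in-core conversion turns what looks like a distributed elementwise op into a driver-side collect. The types and the implicit are simplified stand-ins for illustration, not Mahout's actual definitions:

    import scala.language.implicitConversions

    // Simplified stand-ins for Mahout's in-core Matrix and distributed DRM types.
    trait Matrix { def plus(x: Double): Matrix }
    trait DrmLike { def collect: Matrix } // runs a distributed job, gathers every row

    // Hypothetical analogue of the drm2InCore implicit discussed below:
    // it silently collects the whole matrix to the driver.
    implicit def drm2InCore(drm: DrmLike): Matrix = drm.collect

    def addOne(drm: DrmLike): Matrix =
      // Reads like a distributed A + 1, but DrmLike has no plus(), so the
      // compiler inserts drm2InCore: this is drm.collect.plus(1) -- an
      // in-core Matrix result, not a DRM.
      drm.plus(1)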
On Mon, Jul 21, 2014 at 3:44 PM, Dmitriy Lyubimov <[email protected]> wrote:

> not sure what is this plus() thing. Is it something that is not yet
> committed?
>
> On Mon, Jul 21, 2014 at 3:41 PM, Anand Avati <[email protected]> wrote:
>
>> On Mon, Jul 21, 2014 at 3:35 PM, Pat Ferrel <[email protected]> wrote:
>>
>> > If you do drm.plus(1) this converts to a dense matrix, which is what
>> > the result must be anyway, and does add the scalar to all rows, even
>> > missing ones.
>>
>> Pat, I mentioned this in my previous email already. drm.plus(1)
>> completely misses the point. It converts the DRM into an in-core matrix
>> and applies the plus() method on Matrix. The result is a Matrix, not a
>> DRM.
>>
>> drm.plus(1) is EXACTLY the same as:
>>
>> Matrix m = drm.collect()
>> m.plus(1)
>>
>> The implicit def drm2InCore() syntactic sugar is probably turning out to
>> be dangerous in this case, in terms of hinting the wrong meaning.
>>
>> Thanks
>>
>> > On Jul 21, 2014, at 3:23 PM, Dmitriy Lyubimov <[email protected]> wrote:
>> >
>> > perhaps just compare row count with max(key)? that's exactly what lazy
>> > nrow() currently does in this case.
>> >
>> > On Mon, Jul 21, 2014 at 3:21 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > > ok. so it should be easy to fix at least everything but elementwise
>> > > scalar i guess.
>> > >
>> > > Since the notion of "missing rows" is only defined for int-keyed
>> > > datasets, then ew scalar technically should work for non-int-keyed
>> > > datasets already.
>> > >
>> > > as for int-keyed datasets, i am not sure what is the best strategy.
>> > > Obviously, one can define a sort of normalization/validation routine
>> > > for int-keyed datasets, but it would be fairly expensive to run "just
>> > > because". Perhaps there's a cheap test (as cheap as a row count job)
>> > > to run for int key consistency when the matrix is first created.
>> > >
>> > > On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]> wrote:
>> > >
>> > >> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]>
>> > >> wrote:
>> > >>
>> > >>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]>
>> > >>> wrote:
>> > >>>
>> > >>>> Dmitriy, comments inline -
>> > >>>>
>> > >>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> And no, i suppose it is ok to have "missing" rows even in case of
>> > >>>>> int-keyed matrices.
>> > >>>>>
>> > >>>>> there's one thing that you probably should be aware of in this
>> > >>>>> context though: many algorithms don't survive empty (row-less)
>> > >>>>> partitions, in whatever way they may come to be. Other than that,
>> > >>>>> I don't feel every row must be present -- even if there's implied
>> > >>>>> order of the rows.
>> > >>>>
>> > >>>> I'm not sure if that is necessarily true. There are three operators
>> > >>>> which break pretty badly with missing rows.
>> > >>>>
>> > >>>> AewScalar - an operation like A + 1 is just not applied on the
>> > >>>> missing row, so the final matrix will have 0's in place of 1's.
>> > >>>
>> > >>> Indeed. i have no recourse at this point.
>> > >>>
>> > >>>> AewB, CbindAB - the function after cogroup() throws an exception if
>> > >>>> a row was present in only one matrix. So I guess it is OK to have
>> > >>>> missing rows as long as both A and B have the exact same missing
>> > >>>> row set. Somewhat quirky/nuanced requirement.
>> > >>>
>> > >>> Agree. i actually was not aware that's the cogroup() semantics in
>> > >>> Spark. I thought it would have outer join semantics (as in Pig, I
>> > >>> believe). Alas, no recourse at this point either.
>> > >>
>> > >> The exception is actually during reduceLeft after cogroup(). Cogroup()
>> > >> itself is probably an outer-join.
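For reference, a small standalone Spark sketch of the failure mode Anand describes above: cogroup() itself behaves like an outer join, and it is the per-key reduceLeft over a side that came back empty that throws. Values are stubbed as plain doubles rather than row vectors, and the function name is illustrative:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._ // pair-RDD implicits on the Spark 1.x of this thread

    def ewPlusSketch(sc: SparkContext): Unit = {
      val a = sc.parallelize(Seq(0 -> 1.0, 1 -> 2.0, 2 -> 3.0))
      val b = sc.parallelize(Seq(0 -> 1.0, 2 -> 3.0)) // row 1 is "missing" in B

      a.cogroup(b).mapValues { case (aRows, bRows) =>
        // cogroup still emits key 1, with bRows empty (outer-join semantics);
        // reduceLeft on the empty side throws UnsupportedOperationException.
        aRows.reduceLeft(_ + _) + bRows.reduceLeft(_ + _)
      }.collect()
    }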

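And a sketch of the cheap int-key consistency test Dmitriy suggests up-thread (compare the row count with max(key)), done in a single pass over the keys at roughly the cost of a count job. The helper name and the 0-based, distinct, non-empty key assumptions are mine:

    import org.apache.spark.rdd.RDD

    // For int-keyed rows with distinct 0-based keys, the matrix has no
    // missing rows iff count == max(key) + 1. Assumes a non-empty RDD.
    def hasMissingRows[V](rows: RDD[(Int, V)]): Boolean = {
      val (n, maxKey) = rows
        .map { case (k, _) => (1L, k) }
        .reduce { case ((n1, k1), (n2, k2)) => (n1 + n2, math.max(k1, k2)) }
      n != maxKey + 1L
    }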