OK, so it should be easy to fix at least everything but elementwise scalar, I guess.
Since the notion of "missing rows" is only defined for int-keyed datasets, ew scalar technically should already work for non-int-keyed datasets. As for int-keyed datasets, I am not sure what the best strategy is. Obviously, one could define some sort of normalization/validation routine for int-keyed datasets, but it would be fairly expensive to run "just because". Perhaps there's a cheap test (as cheap as a row-count job) for int-key consistency that could run when the matrix is first created -- see the sketches appended below the quoted thread.

On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]> wrote:

> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]> wrote:
>>
>>> Dmitriy, comments inline -
>>>
>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> And no, I suppose it is OK to have "missing" rows even in the case of
>>>> int-keyed matrices.
>>>>
>>>> There's one thing you should probably be aware of in this context,
>>>> though: many algorithms don't survive empty (row-less) partitions, in
>>>> whatever way they may come to be. Other than that, I don't feel every row
>>>> must be present -- even if there's an implied order of the rows.
>>>
>>> I'm not sure that is necessarily true. There are three operators
>>> which break pretty badly with missing rows.
>>>
>>> AewScalar - an operation like A + 1 is just not applied on the missing row,
>>> so the final matrix will have 0's in place of 1's.
>>
>> Indeed. I have no recourse at this point.
>>
>>> AewB, CbindAB - the function after cogroup() throws an exception if a row was
>>> present in only one matrix. So I guess it is OK to have missing rows as
>>> long as both A and B have the exact same missing row set. Somewhat
>>> quirky/nuanced requirement.
>>
>> Agree. I actually was not aware those are the cogroup() semantics in Spark. I
>> thought it would have outer-join semantics (as in Pig, I believe). Alas,
>> no recourse at this point either.
>
> The exception is actually during reduceLeft after cogroup(). Cogroup()
> itself is probably an outer-join.
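
For concreteness, a minimal sketch of what such a cheap consistency test could look like, assuming the DRM is backed by an RDD[(Int, Vector)]; the helper name `isIntKeyConsistent` and its placement are hypothetical, not existing code. It costs roughly one row-count-style job: a single pass computing the row count and the max key, which agree (count == maxKey + 1) exactly when the keys 0..maxKey are all present, assuming keys are unique and non-negative.

```scala
import org.apache.mahout.math.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper (name and placement are illustrative): checks that an
// int-keyed DRM's row keys form the dense range [0, n) with no gaps, at
// roughly the cost of a single row-count job.
def isIntKeyConsistent(drmRdd: RDD[(Int, Vector)]): Boolean = {
  // One pass over the keys: compute (row count, max key) together.
  val (count, maxKey) = drmRdd
    .map { case (key, _) => (1L, key) }
    .fold((0L, -1)) { case ((c1, m1), (c2, m2)) => (c1 + c2, math.max(m1, m2)) }

  // Assuming keys are non-negative and unique, count == maxKey + 1 holds
  // exactly when every key in 0..maxKey is present once (an empty RDD
  // passes trivially: 0 == -1 + 1).
  count == maxKey + 1L
}
```

Note this only detects gaps; duplicate or negative keys would need a separate (also cheap) check.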

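And to illustrate Anand's point about the AewB/CbindAB path, a sketch of the failure mode -- this is not the actual operator code; `elementwiseSum`, the `ncol` parameter, and the use of SequentialAccessSparseVector as a zero-vector fallback are illustrative assumptions:

```scala
import org.apache.mahout.math.{SequentialAccessSparseVector, Vector}
import org.apache.spark.rdd.RDD

// cogroup() has outer-join semantics, so a key present in only one matrix
// arrives with an empty Iterable on the other side; a bare reduceLeft over
// that side then throws UnsupportedOperationException("empty.reduceLeft").
// Substituting a zero vector for the missing side is one way to keep the
// operator total in the presence of missing rows.
def elementwiseSum(a: RDD[(Int, Vector)], b: RDD[(Int, Vector)], ncol: Int): RDD[(Int, Vector)] =
  a.cogroup(b).map { case (key, (aRows, bRows)) =>
    val aVec: Vector = aRows.headOption.getOrElse(new SequentialAccessSparseVector(ncol))
    val bVec: Vector = bRows.headOption.getOrElse(new SequentialAccessSparseVector(ncol))
    key -> (aVec plus bVec)
  }
```

Whether silently treating a missing row as a zero row is the right semantics (rather than failing fast) is exactly the open question for AewScalar as well.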