On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]> wrote:

> Dmitriy, comments inline -
>
>  On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> And no, i suppose it is ok to have "missing" rows even in case of
>> int-keyed matrices.
>>
>> there's one thing that you probably should be aware of in this context
>> though: many algorithms don't survive empty (row-less) partitions, in
>> whatever way they may come to be. Other than that, I don't feel every row
>> must be present -- even if there's implied order of the rows.
>>
>
> I'm not sure if that is necessarily true. There are three operators which
> break pretty badly with missing rows.
>
> AewScalar - operation like A + 1 is just not applied on the missing row,
> so the final matrix will have 0's in place of 1s.
>

Indeed. i have no recourse at this point.
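
To make the AewScalar problem concrete, here is a minimal sketch in plain
Python. It is not the Mahout or Spark API: the dict-of-rows layout and the
names add_scalar and to_dense are assumptions made up for illustration. It
only models a per-row map that never visits a row that is physically absent.

```python
# Hypothetical model of a row-keyed distributed matrix: {row_key: row_values}.
# Row 1 is "missing", i.e. it is implied to be all zeros.
n_rows, n_cols = 3, 2
a = {0: [1.0, 2.0], 2: [3.0, 4.0]}


def add_scalar(matrix, s):
    # Mimics a per-row map: only rows that physically exist are visited.
    return {key: [x + s for x in row] for key, row in matrix.items()}


def to_dense(matrix, n_rows, n_cols):
    # Materialize the matrix, filling missing rows with zeros.
    return [matrix.get(i, [0.0] * n_cols) for i in range(n_rows)]


# What the per-row map produces: the missing row is never touched.
result = to_dense(add_scalar(a, 1.0), n_rows, n_cols)

# What A + 1 should mean mathematically: every element gets the scalar.
expected = [[x + 1.0 for x in row] for row in to_dense(a, n_rows, n_cols)]

print(result)
print(expected)
```

In this model, row 1 of result stays all zeros while row 1 of expected is all
ones, which is the discrepancy described above.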


>
> AewB, CbindAB - function after cogroup() throws exception if a row was
> present on only one matrix. So I guess it is OK to have missing rows as
> long as both A and B have the exact same missing row set. Somewhat
> quirky/nuanced requirement.
>

Agreed. I actually was not aware that this is the cogroup() semantics in
Spark. I thought it would have outer join semantics (as in Pig, I believe).
Alas, no recourse at this point either.
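
For illustration, a small plain-Python model of the AewB failure. This is
not Spark or Mahout code: cogroup and add_rows below are hypothetical
stand-ins. The assumption is that the grouping step yields every key seen on
either side, with an empty group for the side that lacks the row, and that
the exception comes from the function applied afterwards, which expects
exactly one row on each side.

```python
def cogroup(left, right):
    # For every key on either side, collect the rows found on each side.
    # A side that lacks the key contributes an empty list.
    keys = sorted(set(left) | set(right))
    return {
        k: ([left[k]] if k in left else [], [right[k]] if k in right else [])
        for k in keys
    }


def add_rows(groups):
    # Assumes exactly one row per side; unpacking an empty group raises
    # ValueError, which models the exception discussed above.
    (row_a,), (row_b,) = groups
    return [x + y for x, y in zip(row_a, row_b)]


a = {0: [1.0, 2.0], 1: [3.0, 4.0]}
b = {0: [10.0, 20.0]}  # row 1 is missing in B only

grouped = cogroup(a, b)
print(add_rows(grouped[0]))  # both sides present: works

try:
    add_rows(grouped[1])  # row present only in A
    failed = False
except ValueError:
    failed = True
print(failed)
```

If row 1 were missing from both a and b, key 1 would never appear in grouped
and nothing would fail, which matches the "same missing row set" condition.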


>
> These issues are other than the empty partition problem. So, if we were to
> fix the above issues (I don't see a simple way to), I guess we could say
> "it is ok to have missing rows even in case of int-keyed matrices." Given
> the state of things, I think it is safer to change the stance. Besides,
> what is the benefit/advantage of "supporting" missing rows? It is a
> physical implementation detail after all.
>
> Thanks
>
