On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]> wrote:

>
>
>
> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]> wrote:
>
>> Dmitriy, comments inline -
>>
>>  On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> And no, I suppose it is ok to have "missing" rows even in the case of
>>> int-keyed matrices.
>>>
>>> There's one thing that you probably should be aware of in this context
>>> though: many algorithms don't survive empty (row-less) partitions, in
>>> whatever way they may come to be. Other than that, I don't feel every row
>>> must be present -- even if there's an implied order of the rows.
>>>
>>>
>>
>> I'm not sure if that is necessarily true. There are three operators which
>> break pretty badly with missing rows.
>>
>> AewScalar - an operation like A + 1 is just not applied on the missing row,
>> so the final matrix will have 0s in place of 1s.
>>
>
> Indeed. I have no recourse at this point.
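To make the AewScalar failure mode concrete, here is a minimal Python sketch (not the actual Mahout/Spark code) that models an int-keyed distributed matrix as (row index -> vector) pairs, where a missing row is an implicit zero vector with no backing pair; the row dimensions and values are made up for illustration:

```python
# Hypothetical model: a row-keyed matrix where row 1 has no backing pair
# and is therefore implicitly a zero vector.
A = {0: [1.0, 2.0], 2: [3.0, 4.0]}
n_rows, n_cols = 3, 2

# AewScalar (A + 1) maps only over the pairs that actually exist...
B = {k: [x + 1.0 for x in v] for k, v in A.items()}

def materialize(m):
    # ...so materializing the result fills the missing row with its
    # implicit zeros instead of the 1s the math calls for.
    return [m.get(i, [0.0] * n_cols) for i in range(n_rows)]

print(materialize(B)[1])  # [0.0, 0.0] -- should mathematically be [1.0, 1.0]
```

The per-partition map never sees a row that was never emitted, which is exactly why the scalar operation silently skips it.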
>
>
>>
>> AewB, CbindAB - the function after cogroup() throws an exception if a row
>> is present in only one matrix. So I guess it is OK to have missing rows as
>> long as both A and B have the exact same missing row set. Somewhat
>> quirky/nuanced requirement.
>>
>
> Agree. I actually was not aware that's the cogroup() semantics in Spark. I
> thought it would have outer-join semantics (as in Pig, I believe). Alas,
> no recourse at this point either.
>

The exception actually occurs during reduceLeft after cogroup(); cogroup()
itself probably has outer-join semantics.
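A small Python sketch of that failure mode (the Scala/Spark API paraphrased; the keys and vectors are invented for illustration): cogroup does have outer-join semantics, so a key present in only one side still appears, paired with an empty group on the other side, and it is the subsequent reduceLeft over that empty group that throws:

```python
from functools import reduce

A = {0: [1.0, 2.0], 1: [3.0, 4.0]}
B = {1: [5.0, 6.0], 2: [7.0, 8.0]}  # key 0 missing from B, key 2 from A

def cogroup(a, b):
    # Outer-join semantics: every key from either side appears,
    # paired with a possibly empty group from the other side.
    return {k: ([a[k]] if k in a else [], [b[k]] if k in b else [])
            for k in set(a) | set(b)}

def reduce_left(seq):
    # Like Scala's reduceLeft: undefined (raises) on an empty collection.
    return reduce(lambda u, v: [x + y for x, y in zip(u, v)], seq)

grouped = cogroup(A, B)

# Key 1 is present on both sides, so the reduction is fine:
print(reduce_left(grouped[1][0] + grouped[1][1]))  # [8.0, 10.0]

# Key 0 exists only in A; reducing B's empty group raises --
# the exception observed after cogroup().
try:
    reduce_left(grouped[0][1])
except TypeError as e:
    print("reduceLeft failed:", e)
```

This is why the missing-row sets of A and B must match exactly: cogroup itself survives the asymmetry, but the reduction step does not.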
