Re: Problem of dimensions

Pat Ferrel Mon, 21 Jul 2014 15:36:18 -0700

If you do drm.plus(1), this converts to a dense matrix, which is what the result
must be anyway, and it adds the scalar to all rows, even missing ones.
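
A minimal sketch of the difference, using plain Scala collections rather than the Mahout DSL. The row map, `nrow`, `ncol`, and both helper functions are hypothetical stand-ins, not Mahout's implementation; the sketch only models why mapping over stored rows skips missing ones while a dense result covers every logical row.

```scala
object ScalarPlusSketch {
  // Hypothetical model of an int-keyed matrix: only some row keys are stored.
  // Row 1 is "missing"; logically it is an all-zero row.
  val ncol = 2
  val nrow = 3
  val rows: Map[Int, Vector[Double]] = Map(
    0 -> Vector(1.0, 2.0),
    2 -> Vector(3.0, 4.0)
  )

  // Elementwise scalar applied only to stored rows: missing rows are never visited.
  def plusStoredOnly(m: Map[Int, Vector[Double]], s: Double): Map[Int, Vector[Double]] =
    m.map { case (k, v) => (k, v.map(_ + s)) }

  // Dense result: every logical row in 0 until nrow gets the scalar added.
  def plusDense(m: Map[Int, Vector[Double]], s: Double): Vector[Vector[Double]] =
    Vector.tabulate(nrow) { i =>
      m.getOrElse(i, Vector.fill(ncol)(0.0)).map(_ + s)
    }
}
```

With `plusStoredOnly`, key 1 is still absent from the result, so it reads back as zeros; with `plusDense`, row 1 holds the scalar in every column.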


On Jul 21, 2014, at 3:23 PM, Dmitriy Lyubimov <[email protected]> wrote:

perhaps just compare row count with max(key)? that's exactly what lazy
nrow() currently does in this case.
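
A rough sketch of that comparison on a plain Scala sequence of keys. It assumes distinct, 0-based int keys, so that a complete matrix has exactly max(key) + 1 rows; `hasMissingRows` is a hypothetical helper name, not an existing Mahout function.

```scala
object IntKeyCheckSketch {
  // Assumes distinct, 0-based int keys: a complete matrix stores keys 0..max(key),
  // so fewer stored rows than max(key) + 1 means at least one row is missing.
  def hasMissingRows(keys: Seq[Int]): Boolean =
    keys.nonEmpty && keys.size < keys.max + 1
}
```

On a distributed dataset the count and the maximum could presumably be gathered in the same pass, which is what would keep the test about as cheap as a row count job.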


On Mon, Jul 21, 2014 at 3:21 PM, Dmitriy Lyubimov <[email protected]> wrote:

> 
> ok. so it should be easy to fix at least everything but elementwise scalar
> i guess.
> 
> Since the notion of "missing rows" is only defined for int-keyed datasets,
> then ew scalar technically should work for non-int keyed datasets already.
> 
> as for int-keyed datasets, i am not sure what is the best strategy.
> Obviously, one can define sort of normalization/validation of int-keyed
> dataset routine, but it would be fairly expensive to run "just because".
> Perhaps there's a cheap test (as cheap as row count job) to run to test for
> int keys consistency when matrix is first created.
> 
> 
> 
> On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]> wrote:
> 
>> 
>> 
>> 
>> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]>
>> wrote:
>> 
>>> 
>>> 
>>> 
>>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]> wrote:
>>> 
>>>> Dmitriy, comments inline -
>>>> 
>>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]>
>>>> wrote:
>>>> 
>>>>> And no, i suppose it is ok to have "missing" rows even in case of
>>>>> int-keyed matrices.
>>>>> 
>>>>> there's one thing that you probably should be aware of in this context
>>>>> though: many algorithms don't survive empty (row-less) partitions, in
>>>>> whatever way they may come to be. Other than that, I don't feel every row
>>>>> must be present -- even if there's implied order of the rows.
>>>>> 
>>>> 
>>>> I'm not sure if that is necessarily true. There are three operators
>>>> which break pretty badly with missing rows.
>>>> 
>>>> AewScalar - operation like A + 1 is just not applied on the missing
>>>> row, so the final matrix will have 0's in place of 1s.
>>>> 
>>> 
>>> Indeed. i have no recourse at this point.
>>> 
>>> 
>>>> 
>>>> AewB, CbindAB - function after cogroup() throws exception if a row was
>>>> present on only one matrix. So I guess it is OK to have missing rows as
>>>> long as both A and B have the exact same missing row set. Somewhat
>>>> quirky/nuanced requirement.
>>>> 
>>> 
>>> Agree. i actually was not aware that's the cogroup() semantics in spark. I
>>> thought it would have outer join semantics (as in Pig, i believe). Alas,
>>> no recourse at this point either.
>>> 
>> 
>> The exception is actually during reduceLeft after cogroup(). Cogroup()
>> itself is probably an outer-join.
>> 
>> 
>> 
>
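
The reduceLeft failure described in the quoted exchange above can be modelled with plain Scala collections. This is only a sketch under assumptions: `cogroup`, `add`, and `ewPlus` here are hypothetical stand-ins written for illustration, not the Spark or Mahout code. The grouping step behaves like an outer join and yields an empty sequence for the side that lacks a row; the reduce step is what then fails.

```scala
object CogroupSketch {
  type Row = Vector[Double]

  // Outer-join style grouping: every key from either side appears,
  // with an empty list on the side where the row is missing.
  def cogroup(a: Map[Int, Row], b: Map[Int, Row]): Map[Int, (Seq[Row], Seq[Row])] =
    (a.keySet ++ b.keySet).map { k =>
      (k, (a.get(k).toList, b.get(k).toList))
    }.toMap

  // Elementwise sum of two rows of equal length.
  def add(x: Row, y: Row): Row = x.zip(y).map { case (p, q) => p + q }

  // reduceLeft on an empty collection throws UnsupportedOperationException,
  // so a key present on only one side makes the whole operation fail.
  def ewPlus(a: Map[Int, Row], b: Map[Int, Row]): Map[Int, Row] =
    cogroup(a, b).map { case (k, (as, bs)) =>
      (k, add(as.reduceLeft(add), bs.reduceLeft(add)))
    }
}
```

When both inputs store the same key set, `ewPlus` succeeds; as soon as one side lacks a key the other has, the empty side reaches `reduceLeft` and the call throws.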
