ah. got it. it is the java api via implicit conversion. gosh. no good.
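
for the record, a minimal sketch of what the compiler ends up doing here
(names approximate the math-scala bindings; this is illustrative, not the
exact source):

    // the implicit conversion collects the whole distributed matrix to
    // the driver as an in-core Matrix
    implicit def drm2InCore(drm: CheckpointedDrm[Int]): Matrix = drm.collect

    drm.plus(1) // the DRM type has no plus(), so the compiler rewrites
                // this as drm2InCore(drm).plus(1) -- an in-core java
                // Matrix op, not a distributed one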

On Mon, Jul 21, 2014 at 3:44 PM, Dmitriy Lyubimov <[email protected]> wrote:

> not sure what this plus() thing is. Is it something that is not yet
> committed?
>
>
> On Mon, Jul 21, 2014 at 3:41 PM, Anand Avati <[email protected]> wrote:
>
>> On Mon, Jul 21, 2014 at 3:35 PM, Pat Ferrel <[email protected]>
>> wrote:
>>
>> > If you do drm.plus(1), this converts to a dense matrix, which is what
>> > the result must be anyway, and it does add the scalar to all rows, even
>> > missing ones.
>> >
>> >
>> Pat, I mentioned this in my previous email already. drm.plus(1) completely
>> misses the point. It converts the DRM into an in-core matrix and applies
>> the plus() method on Matrix. The result is a Matrix, not a DRM.
>>
>> drm.plus(1) is EXACTLY the same as:
>>
>> val m: Matrix = drm.collect
>> m.plus(1)
>>
>> The implicit def drm2InCore() syntactic sugar is probably turning out to
>> be dangerous in this case, in terms of hinting at the wrong meaning.
>>
>> Thanks
>>
>>
>>
>>
>>
>> > On Jul 21, 2014, at 3:23 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > perhaps just compare row count with max(key)? that's exactly what lazy
>> > nrow() currently does in this case.
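>> >
>> > a rough sketch of that check over the row rdd (assuming the rows come
>> > as an RDD[(Int, Vector)]; names are illustrative):
>> >
>> >   // one pass over the data: count physical rows and track the max key
>> >   val (count, maxKey) = drmRdd
>> >     .map { case (key, _) => (1L, key) }
>> >     .reduce { case ((c1, k1), (c2, k2)) => (c1 + c2, math.max(k1, k2)) }
>> >
>> >   // with 0-based int keys, a consistent matrix has count == maxKey + 1;
>> >   // anything less means some rows are physically missing
>> >   val hasMissingRows = count < maxKey + 1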
>> >
>> >
>> > On Mon, Jul 21, 2014 at 3:21 PM, Dmitriy Lyubimov <[email protected]>
>> > wrote:
>> >
>> > >
>> > > ok. so it should be easy to fix at least everything but the
>> > > elementwise scalar case, i guess.
>> > >
>> > > Since the notion of "missing rows" is only defined for int-keyed
>> > > datasets, elementwise scalar should technically work for non-int-keyed
>> > > datasets already.
>> > >
>> > > as for int-keyed datasets, i am not sure what the best strategy is.
>> > > Obviously, one can define some sort of normalization/validation routine
>> > > for int-keyed datasets, but it would be fairly expensive to run "just
>> > > because". Perhaps there's a cheap test (as cheap as a row count job) to
>> > > check int key consistency when the matrix is first created.
>> > >
>> > >
>> > >
>> > > On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]>
>> > > wrote:
>> > >
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]>
>> > >> wrote:
>> > >>
>> > >>>
>> > >>>
>> > >>>
>> > >>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]>
>> > >>> wrote:
>> > >>>
>> > >>>> Dmitriy, comments inline -
>> > >>>>
>> > >>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> And no, i suppose it is ok to have "missing" rows even in the case
>> > >>>>> of int-keyed matrices.
>> > >>>>>
>> > >>>>> there's one thing that you probably should be aware of in this
>> > >>>>> context though: many algorithms don't survive empty (row-less)
>> > >>>>> partitions, in whatever way they may come to be. Other than that, I
>> > >>>>> don't feel every row must be present -- even if there's an implied
>> > >>>>> order of the rows.
>> > >>>>>
>> > >>>>
>> > >>>> I'm not sure if that is necessarily true. There are three operators
>> > >>>> which break pretty badly with missing rows.
>> > >>>>
>> > >>>> AewScalar - an operation like A + 1 is just not applied to the
>> > >>>> missing rows, so the final matrix will have 0s in place of 1s.
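>> > >>>>
>> > >>>> A sketch of why (assuming the physical op maps over an
>> > >>>> RDD[(Int, Vector)] of only the rows that physically exist;
>> > >>>> illustrative, not the actual operator code):
>> > >>>>
>> > >>>>   // the elementwise-scalar op visits only physically present rows
>> > >>>>   val result = drmRdd.map { case (key, row) => (key, row.plus(1.0)) }
>> > >>>>   // a missing key never appears in drmRdd, so its implied zero row
>> > >>>>   // is never visited and stays all 0s instead of becoming all 1s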
>> > >>>>
>> > >>>
>> > >>> Indeed. i have no recourse at this point.
>> > >>>
>> > >>>
>> > >>>>
>> > >>>> AewB, CbindAB - the function after cogroup() throws an exception if
>> > >>>> a row is present in only one matrix. So I guess it is OK to have
>> > >>>> missing rows as long as both A and B have the exact same missing row
>> > >>>> set. Somewhat quirky/nuanced requirement.
>> > >>>>
>> > >>>
>> > >>> Agree. i actually was not aware that those are the cogroup()
>> > >>> semantics in spark. I thought it would have outer join semantics (as
>> > >>> in Pig, i believe). Alas, no recourse at this point either.
>> > >>>
>> > >>
>> > >> The exception is actually during reduceLeft after cogroup().
>> > >> Cogroup() itself is probably an outer join.
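>> > >>
>> > >> Roughly (an illustrative sketch, assuming a and b are the row RDDs of
>> > >> the two matrices; not the actual operator code):
>> > >>
>> > >>   // elementwise A + B after cogroup
>> > >>   a.cogroup(b).map { case (key, (rowsA, rowsB)) =>
>> > >>     // cogroup has outer-join semantics: a key present in only one
>> > >>     // matrix arrives here with an empty Iterable on the other side
>> > >>     val sum = rowsA.reduceLeft(_ plus _).plus(rowsB.reduceLeft(_ plus _))
>> > >>     // reduceLeft on the empty side is what throws
>> > >>     // UnsupportedOperationException
>> > >>     (key, sum)
>> > >>   }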
>> > >>
>> > >>
>> > >>
>> > >
>> >
>> >
>>
>
>
