Re: Problem of dimensions

Anand Avati Mon, 21 Jul 2014 15:48:27 -0700

src/main/java/org/apache/mahout/math/AbstractMatrix.java:444:  public Matrix plus(double x)


which ends up seeming to "work" on a DRM because of

math-scala/src/main/scala/org/apache/mahout/math/drm/package.scala:
 implicit def drm2InCore[K: ClassTag](drm: DrmLike[K]): Matrix = drm.collect
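
A minimal sketch of the mechanism, using toy stand-in types rather than the Mahout classes. InCore, Distributed and distributed2InCore are hypothetical names made up for illustration; only the shape of the implicit conversion mirrors drm2InCore above. Because Distributed has no plus() of its own, the compiler silently inserts the conversion, so the call collects everything and returns a local value:

```scala
import scala.language.implicitConversions

// Toy stand-in for an in-core matrix (hypothetical, not org.apache.mahout.math.Matrix).
final case class InCore(rows: Vector[Vector[Double]]) {
  // Element-wise scalar add, playing the role of Matrix.plus(double).
  def plus(x: Double): InCore = InCore(rows.map(_.map(_ + x)))
}

// Toy stand-in for a distributed matrix: rows split across partitions.
// Note that it deliberately has no plus() method.
final class Distributed(parts: Seq[Vector[Vector[Double]]]) {
  var collected = 0 // counts how often the data was pulled in-core
  def collect: InCore = { collected += 1; InCore(parts.flatten.toVector) }
}

object Demo {
  // Same shape as drm2InCore: an implicit view from distributed to in-core.
  implicit def distributed2InCore(d: Distributed): InCore = d.collect

  def run(): (InCore, Int) = {
    val drm = new Distributed(Seq(Vector(Vector(1.0, 2.0)), Vector(Vector(3.0, 4.0))))
    // Compiles only because of the implicit view; equivalent to drm.collect.plus(1.0).
    val result: InCore = drm.plus(1.0)
    (result, drm.collected)
  }
}
```

In this sketch the static type of the result is InCore, never Distributed, which is the same confusion as with drm.plus(1): the call looks distributed but is not.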





On Mon, Jul 21, 2014 at 3:44 PM, Dmitriy Lyubimov <[email protected]> wrote:

> not sure what is this plus() thing. Is it something that is not yet
> committed?
>
>
> On Mon, Jul 21, 2014 at 3:41 PM, Anand Avati <[email protected]> wrote:
>
> > On Mon, Jul 21, 2014 at 3:35 PM, Pat Ferrel <[email protected]>
> > wrote:
> >
> > > If you do drm.plus(1) this converts to a dense matrix, which is
> > > what the result must be anyway, and does add the scalar to all
> > > rows, even missing ones.
> > >
> > >
> > Pat, I mentioned this in my previous email already. drm.plus(1)
> > completely misses the point. It converts the DRM into an in-core
> > matrix and applies the plus() method of Matrix. The result is a
> > Matrix, not a DRM.
> >
> > drm.plus(1) is EXACTLY the same as:
> >
> > Matrix m = drm.collect()
> > m.plus(1)
> >
> > The implicit def drm2InCore() syntactic sugar is probably turning out
> > to be dangerous in this case, in terms of hinting the wrong meaning.
> >
> > Thanks
> >
> >
> >
> >
> >
> > > On Jul 21, 2014, at 3:23 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > perhaps just compare row count with max(key)? that's exactly what lazy
> > > nrow() currently does in this case.
> > >
> > >
> > > On Mon, Jul 21, 2014 at 3:21 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > >
> > > > ok. so it should be easy to fix at least everything but
> > > > elementwise scalar i guess.
> > > >
> > > > Since the notion of "missing rows" is only defined for int-keyed
> > > > datasets, then ew scalar technically should work for non-int
> > > > keyed datasets already.
> > > >
> > > > as for int-keyed datasets, i am not sure what is the best
> > > > strategy. Obviously, one can define sort of
> > > > normalization/validation of int-keyed dataset routine, but it
> > > > would be fairly expensive to run "just because". Perhaps there's
> > > > a cheap test (as cheap as row count job) to run to test for int
> > > > keys consistency when matrix is first created.
> > > >
> > > >
> > > >
> > > > On Mon, Jul 21, 2014 at 3:12 PM, Anand Avati <[email protected]>
> > > > wrote:
> > > >
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Jul 21, 2014 at 3:08 PM, Dmitriy Lyubimov <[email protected]>
> > > >> wrote:
> > > >>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Mon, Jul 21, 2014 at 3:06 PM, Anand Avati <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> Dmitriy, comments inline -
> > > >>>>
> > > >>>> On Jul 21, 2014, at 1:12 PM, Dmitriy Lyubimov <[email protected]>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> And no, i suppose it is ok to have "missing" rows even in case of
> > > >>>>> int-keyed matrices.
> > > >>>>>
> > > >>>>> there's one thing that you probably should be aware of in this
> > > >>>>> context though: many algorithms don't survive empty
> > > >>>>> (row-less) partitions, in whatever way they may come to be.
> > > >>>>> Other than that, I don't feel every row must be present --
> > > >>>>> even if there's implied order of the rows.
> > > >>>>>
> > > >>>>
> > > >>>> I'm not sure if that is necessarily true. There are three
> > > >>>> operators which break pretty badly with missing rows.
> > > >>>>
> > > >>>> AewScalar - operation like A + 1 is just not applied on the
> > > >>>> missing row, so the final matrix will have 0's in place of 1s.
> > > >>>>
> > > >>>
> > > >>> Indeed. i have no recourse at this point.
> > > >>>
> > > >>>
> > > >>>>
> > > >>>> AewB, CbindAB - function after cogroup() throws exception if a
> > > >>>> row was present on only one matrix. So I guess it is OK to have
> > > >>>> missing rows as long as both A and B have the exact same
> > > >>>> missing row set. Somewhat quirky/nuanced requirement.
> > > >>>>
> > > >>>
> > > >>> Agree. i actually was not aware that's a cogroup() semantics in
> > > >>> spark. I thought it would have an outer join semantics (as in
> > > >>> Pig, i believe). Alas, no recourse at this point either.
> > > >>>
> > > >>
> > > >> The exception is actually during reduceLeft after cogroup().
> > > >> Cogroup() itself is probably an outer-join.
> > > >>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>
