Re: H2O integration - intermediate progress update

Dmitriy Lyubimov Wed, 18 Jun 2014 17:40:27 -0700

On Wed, Jun 18, 2014 at 5:35 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:


> Also, note that the row keys in Mahout are not actually stored in the
> matrices that we manipulate.


They are. I am not sure about the DistributedRowMatrix class for MapReduce,
but in sparkbindings they are. They are intimately relevant to all algebra,
and especially to transposition rewrites.

Even in-core matrices support column/row labels, although nobody seems to
be using them.
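
To illustrate the transposition point, here is a minimal sketch in plain
Python. It is not the sparkbindings API: the keyed-row representation
(a list of (row_key, {column: value}) pairs) and the name transpose are
assumptions made up for this example. It only shows why the keys cannot be
dropped: after transposing, the old row keys become column indices, so
they have to be integers.

```python
def transpose(keyed_rows):
    """Transpose a matrix given as (int_row_key, {col: value}) pairs.

    Hypothetical representation for illustration only. The old row keys
    become column indices in the result, so they must be integers.
    """
    columns = {}
    for row_key, vec in keyed_rows:
        if not isinstance(row_key, int):
            raise TypeError("row keys must be ints to become column indices")
        for col, value in vec.items():
            # The old column index becomes the new row key,
            # and the old row key becomes the new column index.
            columns.setdefault(col, {})[row_key] = value
    return sorted(columns.items())


rows = [(0, {0: 1.0, 2: 2.0}), (1, {2: 3.0})]
transposed = transpose(rows)
```

With String keys the same call raises a TypeError, which is the situation
a backend that keeps keys has to handle explicitly.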


> If the keys can be handled separately,
> outside of the flow for the data in a drm, then you should be pretty much
> good to go.
>
>
>
>
> On Wed, Jun 18, 2014 at 5:34 PM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
>
> >
> > On Wed, Jun 18, 2014 at 12:03 PM, Dmitriy Lyubimov <dlie...@gmail.com>
> > wrote:
> >
> >> > How important are the String row keys for the algorithms themselves?
> >> > Would it grossly mess up a workflow if Strings are silently discarded
> >> > by the backend?
> >> >
> >>
> >> Like I said, seq2sparse produces them, and postprocessing for stuff like
> >> LSA pipelines would not work without them.
> >
> >
> > Something as coarse as translating to a dictionary index would probably
> > work.  Creating the dictionary in parallel while reading the data should
> > be quite doable.
> >
> >
>
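
The dictionary-index idea quoted above could look roughly like the sketch
below, in plain Python. Everything here is an assumption for illustration,
not Mahout or H2O code: the input is modeled as a list of partitions, each
a list of (string_key, row) pairs, and the names build_dictionary and
encode_rows are hypothetical. Distinct keys are collected per partition,
merged, and sorted so the assignment is deterministic; the label list maps
the indices back to Strings for postprocessing.

```python
def build_dictionary(partitions):
    """Map every distinct String key to a dense integer index.

    Keys are gathered partition by partition (the step that could run in
    parallel), then merged and sorted so indices are deterministic.
    """
    distinct = set()
    for part in partitions:
        distinct.update(key for key, _ in part)
    return {key: i for i, key in enumerate(sorted(distinct))}


def encode_rows(partitions):
    """Replace String keys with integer indices and return the labels.

    Returns (int_keyed_partitions, labels), where labels[i] is the
    original String key for index i.
    """
    index_of = build_dictionary(partitions)
    encoded = [[(index_of[key], row) for key, row in part]
               for part in partitions]
    labels = sorted(index_of, key=index_of.get)
    return encoded, labels


parts = [[("doc-b", [1.0, 0.0])], [("doc-a", [0.0, 2.0])]]
encoded, labels = encode_rows(parts)
```

Keeping the labels list alongside the integer-keyed rows is what lets a
pipeline such as LSA postprocessing recover the original document ids.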
