Supporting Int and Long keys is easy; both should be working shortly.
String is tricky, as H2O stores only numbers. One suggestion has been to
break up the string into bytes and store them as separate columns (and
re-assemble them on demand). I'll look into String support after finishing
the operators.
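To make the byte-column idea concrete, here is a minimal sketch in Java. It assumes a fixed maximum key width, UTF-8 encoding and zero padding. `StringKeyColumns`, `encode` and `decode` are hypothetical names, not H2O API. The sketch only shows how a String key could be spread over numeric columns and re-assembled on demand.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not H2O API): spreads a String row key over `width`
 * numeric columns, one UTF-8 byte per column, zero-padded on the right.
 */
final class StringKeyColumns {

    private StringKeyColumns() {}

    /** Encodes a key into `width` numeric cells holding values 0..255. */
    static long[] encode(String key, int width) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > width) {
            throw new IllegalArgumentException(
                "key needs " + bytes.length + " bytes but only " + width + " columns exist");
        }
        for (byte b : bytes) {
            // Zero is reserved as padding, so an embedded NUL would be ambiguous.
            if (b == 0) {
                throw new IllegalArgumentException("NUL bytes are not supported in keys");
            }
        }
        long[] cells = new long[width]; // trailing cells stay 0 (padding)
        for (int i = 0; i < bytes.length; i++) {
            cells[i] = bytes[i] & 0xFF; // store the unsigned byte value
        }
        return cells;
    }

    /** Re-assembles the key from its cells, stopping at the first padding cell. */
    static String decode(long[] cells) {
        int len = 0;
        while (len < cells.length && cells[len] != 0) {
            len++;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = (byte) cells[i]; // narrow back to a (signed) Java byte
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Every key in a matrix would need the same number of columns, so the widest key (in UTF-8 bytes, not characters) decides the width. Keys containing NUL bytes are rejected because zero doubles as padding.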

How important are String row keys to the algorithms themselves? Would it
grossly mess up a workflow if Strings were silently discarded by the backend?



On Wed, Jun 18, 2014 at 10:58 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Supporting Int and String keys is perhaps the minimum set (Long is welcome,
> but a second-class citizen).
>
> Supporting DrmLike[Int] is required for a lot of things (e.g.
> transpose). DrmLike[String] is used in the outputs of popular vectorizations
> in Mahout such as seq2sparse.
>
>
> On Tue, Jun 17, 2014 at 5:22 PM, Anand Avati <av...@gluster.org> wrote:
>
> > Still incomplete; not everything works yet. But there is lots of progress,
> > and the end is in sight.
> >
> > - Development happening at
> > https://github.com/avati/mahout/commits/MAHOUT-1500. Note that I'm still
> > doing lots of commit --amend and git push --force as this is my private
> > tree.
> >
> > - Ground level build issues and classloader incompatibilities fixed.
> >
> > - Can load a matrix into H2O either from in-core (through drmParallelize())
> > or from HDFS (the parser does not support seqfile yet).
> >
> > - Only Long row keys are supported so far.
> >
> > - mapBlock() works. This was the trickiest; other ops seem trivial in
> > comparison.
> >
> > Everything else is yet to be done. However, I will be putting more time
> > into this over the coming days (I was working less than part-time on this
> > so far).
> >
> > Questions/comments welcome.
> >
>
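Dmitriy's point that transpose needs DrmLike[Int] can be seen in a toy in-core sketch. This is not Mahout's implementation. After transposing, the old column ordinals become the row keys, and the old row keys become column indices. That only works directly when those keys are ints. `IntKeyTranspose` is a hypothetical name, and rows are assumed to be dense `double[]` arrays.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy (not Mahout code): transpose of an int-keyed, row-wise matrix. */
final class IntKeyTranspose {

    private IntKeyTranspose() {}

    /**
     * Row key k of the input becomes column index k of the output, so every key
     * must be a non-negative int smaller than outCols. Each output row key is an
     * input column ordinal in 0..ncol-1.
     */
    static Map<Integer, double[]> transpose(Map<Integer, double[]> rows, int ncol, int outCols) {
        Map<Integer, double[]> result = new TreeMap<>();
        for (int j = 0; j < ncol; j++) {
            result.put(j, new double[outCols]); // missing input rows stay 0
        }
        for (Map.Entry<Integer, double[]> e : rows.entrySet()) {
            int k = e.getKey();
            if (k < 0 || k >= outCols) {
                throw new IllegalArgumentException("row key out of range: " + k);
            }
            double[] row = e.getValue();
            if (row.length != ncol) {
                throw new IllegalArgumentException("row " + k + " has " + row.length + " columns");
            }
            for (int j = 0; j < ncol; j++) {
                result.get(j)[k] = row[j]; // A'(j, k) = A(k, j)
            }
        }
        return result;
    }
}
```

With String keys, such as the document ids coming out of seq2sparse, there is no natural column position for a row without an extra key-to-index mapping.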
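For readers unfamiliar with mapBlock(): in Mahout's DSL it passes a user closure each partition's row keys together with an in-core block of rows, and expects keys and a block back. The following Java sketch models only that contract over a plain list of blocks. `Block`, `BlockOps` and the dense `double[][]` layout are hypothetical stand-ins, not the H2O or Mahout classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for one partition: its row keys plus an in-core block. */
final class Block<K> {
    final List<K> keys;
    final double[][] rows;

    Block(List<K> keys, double[][] rows) {
        if (keys.size() != rows.length) {
            throw new IllegalArgumentException("exactly one key per row is required");
        }
        this.keys = keys;
        this.rows = rows;
    }
}

final class BlockOps {

    private BlockOps() {}

    /**
     * Applies f to every block independently, the way mapBlock() applies its
     * closure per partition. The key type may change from K to R.
     */
    static <K, R> List<Block<R>> mapBlock(List<Block<K>> blocks, Function<Block<K>, Block<R>> f) {
        List<Block<R>> out = new ArrayList<>(blocks.size());
        for (Block<K> b : blocks) {
            out.add(f.apply(b));
        }
        return out;
    }
}
```

Because the function sees a whole block at once rather than single rows, it can use in-core matrix operations on the block.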
