Ok cool beans, I'll try and get some form of my (local or hdfs) disk-backed
vectors and matrices in a patch at some point soon, as well as a
SequenceFile<IntWritable,DoubleWritable> vector if I can get around to it
after that.

My main aim is to remove the various memory scaling constraints (which
typically don't get hit until tens to hundreds of millions of columns) in the
distributed SVD code, but I have a feeling some of this will be more
generally useful elsewhere in the codebase.

  -jake

On Fri, Sep 10, 2010 at 12:12 PM, Ted Dunning <[email protected]> wrote:

> On Fri, Sep 10, 2010 at 7:18 AM, Isabel Drost <[email protected]> wrote:
>
> > > I'd say add the disk backed ones and we'll worry about education
> > > separately.
> >
> > +1
> >
>
> +1
>
>
> >
> >
> > > Perhaps it's possible for the vector to keep track of
> > > thrashing and spit out warnings. Either that or you override the
> > > random accessors on the file-based ones to throw exceptions so that
> > > it fails early for users.
> >
> > The first option gives users more freedom about what to do with the
> > implementation - while the latter one probably saves us a few mails
> > about the implementation being sooooo slow... I'd be fine to go with
> > either option.
>
>
> I think fail early is the right answer.  Nobody much minds that remove()
> throws an exception on all of our iterators.
>
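
For illustration, the fail-early option discussed above might look roughly
like the sketch below. Everything in it is an assumption made for the example:
SequentialOnlyVector is a hypothetical class, not Mahout's Vector API, and an
in-memory array stands in for the local or HDFS file. The random accessor
throws immediately, and sequential iteration (with remove() also throwing, as
on the existing iterators) keeps working.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Mahout's Vector API: a vector meant to be read sequentially. */
class SequentialOnlyVector implements Iterable<Double> {
  // Stand-in for the disk-backed store; a real version would stream from a file.
  private final double[] backing;

  SequentialOnlyVector(double[] values) {
    this.backing = values.clone();
  }

  int size() {
    return backing.length;
  }

  /** Random access fails early instead of quietly thrashing the disk. */
  double get(int index) {
    throw new UnsupportedOperationException(
        "random access is not supported on a disk-backed vector; iterate instead");
  }

  @Override
  public Iterator<Double> iterator() {
    return new Iterator<Double>() {
      private int next = 0;

      @Override
      public boolean hasNext() {
        return next < backing.length;
      }

      @Override
      public Double next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return backing[next++];
      }

      @Override
      public void remove() {
        // Same contract as the read-only iterators mentioned in the thread.
        throw new UnsupportedOperationException("remove");
      }
    };
  }

  /** Sequential operations still work, e.g. a dot product against an in-memory array. */
  double dot(double[] other) {
    if (other.length != backing.length) {
      throw new IllegalArgumentException("size mismatch");
    }
    double sum = 0.0;
    int i = 0;
    for (double v : this) {
      sum += v * other[i++];
    }
    return sum;
  }
}
```

With this shape, code that calls get(i) in a loop breaks on the first call
rather than running very slowly, while anything written against the iterator
is unaffected. The warn-on-thrashing alternative would instead count calls to
get() and log once a threshold is crossed.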
