How common is it that a row won't fit in memory?  My experience is that
essentially all rows I am interested in will fit in a very modest amount
of memory, but that row-by-row handling is imperative.

Is this just gilding the lily?

On Mon, Dec 13, 2010 at 10:24 AM, Jake Mannix <[email protected]> wrote:

> Hey Dmitriy,
>
>  I've also been playing around with a VectorWritable format which is
> backed by a SequenceFile, but I've been focused on the case where it's
> essentially the entire matrix, and the rows don't fit into memory.  This
> seems different from your current use case, however - you just want
> (relatively) small vectors to load faster, right?
>
>  -jake
>
> On Mon, Dec 13, 2010 at 10:18 AM, Ted Dunning <[email protected]>
> wrote:
>
> > Interesting idea.
> >
> > Would this introduce a new vector type that only allows iterating through
> > the elements once?
> >
> > On Mon, Dec 13, 2010 at 9:49 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I would like to submit a patch to VectorWritable that allows for
> > > streaming access to vector elements without having to prebuffer all of
> > > them first. (The current code allows only the latter.)
> > >
> > > That patch would allow to strike down one of the memory usage issues in
> > > current Stochastic SVD implementation and effectively open memory bound
> > for
> > > n of the SVD work. (The value i see is not to open up the the bound
> > though
> > > but just be more efficient in memory use, thus essentially speeding u p
> > the
> > > computation. )
> > >
> > > If it's ok, i would like to create a JIRA issue and provide a patch for
> > it.
> > >
> > > Another issue is to provide an SSVD patch that depends on that patch
> > > for VectorWritable.
> > >
> > > Thank you.
> > > -Dmitriy
> > >
> >
>
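
For readers following the thread, the streaming read Dmitriy proposes could be sketched very roughly as follows. This is a minimal, Hadoop-free illustration, not Mahout's actual VectorWritable API: the class name, element layout, and methods here are invented for the example. A sparse vector serialized as a count followed by (index, value) pairs is exposed through a single-pass iterator that decodes one element at a time from the stream, so memory use stays constant in the vector size - which is also why Ted's question above about a vector type that can only be iterated once comes up.

```java
import java.io.*;
import java.util.*;

/** Sketch of single-pass streaming access to a serialized sparse vector:
 *  elements are decoded lazily from the stream, never buffered in full. */
class StreamingVectorReader implements Iterable<StreamingVectorReader.Element> {

    static final class Element {
        final int index;
        final double value;
        Element(int index, double value) { this.index = index; this.value = value; }
    }

    private final DataInputStream in;
    private boolean consumed = false;

    StreamingVectorReader(DataInputStream in) { this.in = in; }

    /** Serialize a sparse vector as: element count, then (index, value) pairs. */
    static byte[] write(int[] indices, double[] values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(indices.length);
        for (int i = 0; i < indices.length; i++) {
            out.writeInt(indices[i]);
            out.writeDouble(values[i]);
        }
        return bos.toByteArray();
    }

    @Override
    public Iterator<Element> iterator() {
        // Single-pass semantics: the underlying stream can only be read once.
        if (consumed) throw new IllegalStateException("single-pass: already iterated");
        consumed = true;
        return new Iterator<Element>() {
            private int remaining = readCount();
            private int readCount() {
                try { return in.readInt(); }
                catch (IOException e) { throw new UncheckedIOException(e); }
            }
            @Override public boolean hasNext() { return remaining > 0; }
            @Override public Element next() {
                if (remaining-- <= 0) throw new NoSuchElementException();
                try { return new Element(in.readInt(), in.readDouble()); }
                catch (IOException e) { throw new UncheckedIOException(e); }
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = write(new int[]{0, 3, 7}, new double[]{1.5, -2.0, 0.5});
        StreamingVectorReader v = new StreamingVectorReader(
            new DataInputStream(new ByteArrayInputStream(bytes)));
        double sum = 0;
        for (Element e : v) sum += e.value;   // one pass; constant memory
        System.out.println(sum);              // prints 0.0
    }
}
```

The single-pass restriction falls out naturally from this design: once the stream has been consumed, the elements are gone unless the caller chooses to buffer them.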
