I'm not sure that Dmitriy's use case has an easy solution. As you say, a Writable loads the whole thing into memory, regardless of whether or not you try to buffer on iteration.
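For concreteness, here is a rough sketch of the kind of element-at-a-time access under discussion, as opposed to a Writable that materializes everything when it is read. The names StreamingVectorSketch and Element, and the toy wire format (an int count followed by index/value pairs), are made up for illustration; this is not VectorWritable's actual serialization or any existing Mahout class. It only shows that a single-pass Iterable over a DataInput can refuse a second iterator() call instead of buffering.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical single-pass view over a serialized vector (toy format, not Mahout's). */
final class StreamingVectorSketch implements Iterable<StreamingVectorSketch.Element> {

    /** One (index, value) entry, read lazily from the stream. */
    static final class Element {
        final int index;
        final double value;
        Element(int index, double value) { this.index = index; this.value = value; }
    }

    private final DataInput in;
    private boolean consumed = false;

    StreamingVectorSketch(DataInput in) { this.in = in; }

    private int readCount() {
        try {
            return in.readInt();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    @Override
    public Iterator<Element> iterator() {
        // The underlying stream cannot be rewound, so a second pass is refused.
        if (consumed) {
            throw new UnsupportedOperationException("vector stream can only be iterated once");
        }
        consumed = true;
        final int count = readCount();
        return new Iterator<Element>() {
            private int read = 0;

            @Override
            public boolean hasNext() { return read < count; }

            @Override
            public Element next() {
                if (!hasNext()) throw new NoSuchElementException();
                try {
                    // Only one element is held in memory at a time.
                    Element element = new Element(in.readInt(), in.readDouble());
                    read++;
                    return element;
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            }
        };
    }

    /** Writes the toy format: count, then (index, value) pairs. */
    static byte[] encode(int[] indices, double[] values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeDouble(values[i]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Example consumer: sums the values in one pass without building a vector. */
    static double sum(byte[] encoded) {
        StreamingVectorSketch v =
            new StreamingVectorSketch(new DataInputStream(new ByteArrayInputStream(encoded)));
        double total = 0.0;
        for (Element e : v) {
            total += e.value;
        }
        return total;
    }
}
```

A consumer such as sum() never holds more than one element, but anything that needs random access or a second pass would still have to materialize the vector itself.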
My situation (monstrous vectors) is easier in some respects: if the matrices are essentially SequenceFile<IntWritable,Pair<IntWritable,DoubleWritable>>, then much bigger vectors can be handled in MR jobs, but they no longer really look like "vectors" in the interface sense.

  -jake

On Mon, Dec 13, 2010 at 12:52 PM, Ted Dunning <[email protected]> wrote:

> OK.
>
> Let's assume that this is needed.
>
> I think that an iterable interface on VectorWritable that throws
> UnsupportedOperationException or similar if
> you try to get the iterator twice is much more transparent than a watcher
> structure and much easier for a user
> to discover/re-invent.
>
> Another (evil) thought is a parallel class to VectorWritable which is
> essentially SequentialAccessVectorWritable that supports reading and
> writing. It seems to me that the Writable isn't real compatible with this
> interface in any case. How will that be resolved?
>
>
> On Mon, Dec 13, 2010 at 11:36 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > Absent of this solution, i realistically don't see how i can go without a
> > push technique in accessing the vectors.
