The documents that we vectorize (now stored in a sequence file as a list of
tokens) really should be more like a Lucene document, with named fields
containing ordered sequences of tokens.  Plausibly we should have numerical
and numerical-sequence values as well.  Once we have two cases, we should
allow many.

It also sounds a lot like HBase.
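
To make the shape of such a document concrete, here is a minimal sketch, not
an existing Mahout class.  FieldedDocument and all of its method names are
hypothetical.  The write/readFields pair mirrors the method shapes of Hadoop's
Writable interface, but the sketch uses only java.io so that it runs without
Hadoop on the classpath.  It assumes two kinds of field (token sequences and
numeric sequences); more kinds would follow the same pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical document: named fields holding ordered token or numeric sequences. */
class FieldedDocument {
    private final Map<String, List<String>> tokenFields = new LinkedHashMap<>();
    private final Map<String, double[]> numericFields = new LinkedHashMap<>();
    private String id;

    FieldedDocument(String id) { this.id = id; }

    String getId() { return id; }

    void putTokens(String field, List<String> tokens) {
        tokenFields.put(field, new ArrayList<>(tokens));
    }

    void putNumbers(String field, double... values) {
        numericFields.put(field, values.clone());
    }

    /** Returns the tokens of a field in order, or null if the field is absent. */
    List<String> tokens(String field) { return tokenFields.get(field); }

    /** Returns the numeric values of a field, or null if the field is absent. */
    double[] numbers(String field) { return numericFields.get(field); }

    // Same signature as Writable.write; writeUTF limits each string to 65535 encoded bytes.
    void write(DataOutput out) throws IOException {
        out.writeUTF(id);
        out.writeInt(tokenFields.size());
        for (Map.Entry<String, List<String>> e : tokenFields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String token : e.getValue()) out.writeUTF(token);
        }
        out.writeInt(numericFields.size());
        for (Map.Entry<String, double[]> e : numericFields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().length);
            for (double v : e.getValue()) out.writeDouble(v);
        }
    }

    // Same signature as Writable.readFields; replaces the current contents.
    void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        tokenFields.clear();
        numericFields.clear();
        int tokenFieldCount = in.readInt();
        for (int i = 0; i < tokenFieldCount; i++) {
            String name = in.readUTF();
            int n = in.readInt();
            List<String> tokens = new ArrayList<>(n);
            for (int j = 0; j < n; j++) tokens.add(in.readUTF());
            tokenFields.put(name, tokens);
        }
        int numericFieldCount = in.readInt();
        for (int i = 0; i < numericFieldCount; i++) {
            String name = in.readUTF();
            double[] values = new double[in.readInt()];
            for (int j = 0; j < values.length; j++) values[j] = in.readDouble();
            numericFields.put(name, values);
        }
    }

    /** Convenience for the sketch: serialize to a byte array. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Convenience for the sketch: rebuild a document from toBytes() output. */
    static FieldedDocument fromBytes(byte[] bytes) {
        FieldedDocument doc = new FieldedDocument("");
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            doc.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }
}
```

A document would then be built with calls such as putTokens("title", ...) and
putNumbers("scores", ...), and a vectorizer could choose which named fields to
consume.  Keeping the field maps insertion-ordered is a choice made for this
sketch only.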

On Tue, Jan 19, 2010 at 7:19 AM, Drew Farris <drew.far...@gmail.com> wrote:

> On Tue, Jan 19, 2010 at 9:11 AM, Robin Anil <robin.a...@gmail.com> wrote:
> > I like this idea very much. More like adding metadata over sparse vectors
> >
> > To make the idea more verbose:
> > Vectors currently have a name, which is the id of the original
> > document/data point the vector points to? It could also have fields or
> > labels to which the vector belongs.
>
> In one sense this begins to sound a great deal like Lucene's Document
> class. Does it make any sense to turn it on its head and provide a
> Writable that holds a collection of data (including Vectors) instead
> of adding variants of vector that hold more fields to Vector?
>



-- 
Ted Dunning, CTO
DeepDyve
