On Thu, Feb 4, 2010 at 1:45 PM, Robin Anil <robin.a...@gmail.com> wrote:
>
> if you have a clear plan lets do it or lets do the first version with just
>
> document -> analyzer -> token array -> vector
>                                     |-> ngram -> vector
>

Ted summed it up perfectly. I think this is great until we get further along with the document work.

>
> Lets not have overlapping ids otherwise it becomes a pain to merge. have
> unique ids in sequence file, and a file with last id used ?
>

Ok, I will read the partial vector/dictionary code to get my head around this.
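
To make the non-overlapping id idea concrete, here is a minimal sketch in Python. It is not the existing partial vector/dictionary code; the function names, the in-memory dictionaries standing in for sequence files, and the `last_id_used` argument standing in for the "last id used" file are all assumptions for illustration. It assumes the terms passed in are already unique.

```python
def build_partial_dictionaries(unique_terms, chunk_size, last_id_used=-1):
    """Split unique terms into chunks and give every term a globally unique id.

    Ids continue from last_id_used, so chunks built in separate runs never
    overlap. Returns the list of chunk dictionaries and the new last id used.
    (Hypothetical helper; dicts stand in for sequence files.)
    """
    chunks = []
    next_id = last_id_used + 1
    for start in range(0, len(unique_terms), chunk_size):
        chunk = {}
        for term in unique_terms[start:start + chunk_size]:
            chunk[term] = next_id
            next_id += 1
        chunks.append(chunk)
    return chunks, next_id - 1


def merge_dictionaries(chunks):
    """Merge chunk dictionaries; with unique ids this is a plain union."""
    merged = {}
    seen_ids = set()
    for chunk in chunks:
        for term, term_id in chunk.items():
            if term_id in seen_ids:
                # Overlapping ids are exactly the case the thread wants to avoid.
                raise ValueError("overlapping id: %d" % term_id)
            seen_ids.add(term_id)
            merged[term] = term_id
    return merged
```

A later run would pass the returned last id back in as `last_id_used`, which is the role the "file with last id used" would play between jobs.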