On Jan 10, 2010, at 9:43 AM, Robin Anil wrote:

> Lot of zeros being printed in the Json string. Is that normal for an
> infinite cardinality vector?

It shouldn't print them if you are using a SparseVector, but my guess is there is something odd going on here when writing it out such that it is writing all the zeros. Also, is it writing JSON to the SeqFile or is that just the result of the dumper? Sounds like you need to hook up a debugger.

> http://pastebin.com/m6ff5f0ef
> Same is true if I type cast to a Vector

Sure, it's still a SparseVector. My comment about using Vector was just for the API level, not the actual implementation.

> On Sun, Jan 10, 2010 at 8:08 PM, Grant Ingersoll <[email protected]> wrote:
>
>> Have you dumped out the file? What's in it?
>>
>> Also, if you can use Vector instead of SparseVector in the API (it's fine
>> to bind to SparseVector in the implementation) I think that would be better.
>>
>> On Jan 10, 2010, at 7:00 AM, Robin Anil wrote:
>>
>>> https://issues.apache.org/jira/secure/attachment/12429846/DictionaryVectorizer.patch
>>>
>>> Reduce => PartialVectorGenerator Class
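
To make the two points above concrete (type the API against the interface while binding to the sparse class in the implementation, and write out only the non-zero entries), here is a minimal sketch. SimpleVector, SimpleSparseVector, and SparseJsonDemo are hypothetical stand-ins written for illustration, not Mahout's actual Vector or SparseVector classes, and the JSON layout is an assumption rather than what the real serializer emits:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo class; the API method is typed against the interface only.
class SparseJsonDemo {
    // Callers never need to know which implementation they were handed.
    static String dump(SimpleVector v) {
        return v.toJson();
    }

    public static void main(String[] args) {
        // Bind to the sparse implementation here, at construction time only.
        SimpleVector v = new SimpleSparseVector(Integer.MAX_VALUE);
        v.set(3, 1.5);
        v.set(1000000, 2.0);
        // Only the two stored entries appear, despite the huge cardinality.
        System.out.println(dump(v));
    }
}

// Hypothetical stand-in for a vector interface (not Mahout's API).
interface SimpleVector {
    int cardinality();
    double get(int index);
    void set(int index, double value);
    String toJson();
}

// Hypothetical sparse implementation: only non-zero entries are stored.
final class SimpleSparseVector implements SimpleVector {
    private final int cardinality;
    private final Map<Integer, Double> nonZeros = new TreeMap<>();

    SimpleSparseVector(int cardinality) {
        this.cardinality = cardinality;
    }

    @Override
    public int cardinality() {
        return cardinality;
    }

    @Override
    public double get(int index) {
        // Anything not stored is implicitly zero.
        return nonZeros.getOrDefault(index, 0.0);
    }

    @Override
    public void set(int index, double value) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Setting a zero removes the entry, so zeros never get written out.
        if (value == 0.0) {
            nonZeros.remove(index);
        } else {
            nonZeros.put(index, value);
        }
    }

    @Override
    public String toJson() {
        StringBuilder sb = new StringBuilder("{\"cardinality\":")
                .append(cardinality)
                .append(",\"values\":{");
        boolean first = true;
        // Iterate the stored entries, never the full index range.
        for (Map.Entry<Integer, Double> e : nonZeros.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```

If the dumped output contains long runs of zeros, the writer is most likely iterating over every index up to the cardinality rather than over the stored entries, which is the kind of thing a debugger session on the write path should show.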
