How hard would it be to transparently support both? Could we have one implementation for "smaller" problems and one for larger?
At any rate, +1 to making this be available for really large scale.

-Grant

On Dec 8, 2009, at 3:16 AM, Sean Owen wrote:

> I'm sure it's not hard. It makes (sparse) vectors consume that much
> more memory though.
>
> This change would certainly help my case, but I already have a bit of
> a workaround: I hash longs into ints and store the reverse mapping.
> There is possibility of collision but the consequence is small in the
> context of collaborative filtering.
>
> I suppose if I'm the only use case that would benefit at the moment,
> maybe not worth it, but if you can think of other reasons, let's
> change.
>
> On Tue, Dec 8, 2009 at 5:48 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>> This brings up a point about our linear primitives: are 32bit integers big
>> enough for our index range for vectors and matrices? Especially for
>> matrices, having billions of rows is completely possible, even if it is
>> on the large side.
>>
>> If we want to be about "scalable" machine learning, we really don't want to
>> seal ourselves in to "only" 2 billion x 2 billion matrices in the long run,
>> do we?
>>
>> How hard would it be to promote our ints to longs?
>>
>> -jake
>>
>> On Sat, Dec 5, 2009 at 4:48 AM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I'm trying to use Vectors to represent a vector of user preferences.
>>> All is well since items are numeric and can be used as indexes into a
>>> Vector -- almost. I have longs, and of course indexes are ints.
>>>
>>> I could fold the long IDs into ints without too much worry about the
>>> effects of collision. However I still need to remember the original
>>> item IDs for each index. I could do it with labels, but I can't
>>> retrieve the label for an index (and the other mapping isn't
>>> serialized anyway?).
>>>
>>> So I guess I must separately store this mapping? Just making sure I'm
>>> not missing something.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
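
For reference, the workaround Sean describes in the quoted thread (hash longs into ints and store the reverse mapping) could look roughly like the sketch below. This is only an illustration under assumptions: the class name LongToIntIdMapper and its methods are hypothetical and not part of Mahout, the fold is the same XOR of the high and low 32 bits that java.lang.Long.hashCode uses, and on a collision the first ID seen simply keeps the slot, which is one of several possible policies.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Mahout class): folds 64-bit item IDs into
 * non-negative 32-bit vector indexes and remembers the reverse mapping.
 */
public class LongToIntIdMapper {

    // index -> original long ID; this is the separately stored mapping
    private final Map<Integer, Long> indexToId = new HashMap<>();

    /** Folds a long ID into a non-negative int index and records it. */
    public int toIndex(long id) {
        // XOR the high and low halves, then clear the sign bit so the
        // result is usable as a vector index.
        int index = (int) (id ^ (id >>> 32)) & Integer.MAX_VALUE;
        // Assumption: on collision the first ID seen keeps the index.
        indexToId.putIfAbsent(index, id);
        return index;
    }

    /** Returns the original ID recorded for an index, or null if unseen. */
    public Long toId(int index) {
        return indexToId.get(index);
    }
}
```

Two distinct IDs can land on the same index (for example 1 and 2^32 + 0 both fold to 1), so the reverse lookup only returns one of them. As noted in the thread, that loss may be acceptable for collaborative filtering, but it is a reason to prefer long indexes if exact IDs matter.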