You'd need a second copy of the class with longs in place of ints. Seems a little yucky. I'll wait to see if there's any other demand for it, and proceed if so.

On Tue, Dec 8, 2009 at 11:08 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> How hard would it be to transparently support both? Could we have one
> implementation for "smaller" problems and one for larger?
>
> At any rate, +1 to making this be available for really large scale.
>
> -Grant
>
> On Dec 8, 2009, at 3:16 AM, Sean Owen wrote:
>
>> I'm sure it's not hard. It makes (sparse) vectors consume that much
>> more memory though.
>>
>> This change would certainly help my case, but I already have a bit of
>> a workaround: I hash longs into ints and store the reverse mapping.
>> There is possibility of collision but the consequence is small in the
>> context of collaborative filtering.
>>
>> I suppose if I'm the only use case that would benefit at the moment,
>> maybe not worth it, but if you can think of other reasons, let's
>> change.
>>
>> On Tue, Dec 8, 2009 at 5:48 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> This brings up a point about our linear primitives: are 32bit integers big
>>> enough for our index range for vectors and matrices? Especially for
>>> matrices, having billions of rows is completely possible, even if it is
>>> on the large side.
>>>
>>> If we want to be about "scalable" machine learning, we really don't want to
>>> seal ourselves in to "only" 2 billion x 2 billion matrices in the long run,
>>> do we?
>>>
>>> How hard would it be to promote our ints to longs?
>>>
>>> -jake
>>>
>>> On Sat, Dec 5, 2009 at 4:48 AM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> I'm trying to use Vectors to represent a vector of user preferences.
>>>> All is well since items are numeric and can be used as indexes into a
>>>> Vector -- almost. I have longs, and of course indexes are ints.
>>>>
>>>> I could fold the long IDs into ints without too much worry about the
>>>> effects of collision. However I still need to remember the original
>>>> item IDs for each index. I could do it with labels, but I can't
>>>> retrieve the label for an index (and the other mapping isn't
>>>> serialized anyway?).
>>>>
>>>> So I guess I must separately store this mapping? Just making sure I'm
>>>> not missing something.
>>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
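
For anyone following along, here is a minimal sketch of the workaround mentioned in the thread (hash longs into ints and store the reverse mapping). It is only an illustration, not code from Mahout: the class LongToIntIndex and the methods fold, toIndex and toID are hypothetical names. It assumes vector indexes must be non-negative, so the sign bit is masked off, and it assumes that on a collision it is acceptable to keep only the first long ID seen for an index.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): folds 64-bit item IDs into
 * 32-bit vector indexes and remembers which ID each index came from.
 */
final class LongToIntIndex {

  // index -> original long ID; on collision the first ID seen is kept
  private final Map<Integer, Long> indexToID = new HashMap<>();

  /**
   * XORs the high and low 32 bits of the ID (the same folding that
   * Long.hashCode uses), then clears the sign bit so the result is a
   * non-negative int usable as an index.
   */
  static int fold(long id) {
    return (int) (id ^ (id >>> 32)) & Integer.MAX_VALUE;
  }

  /** Returns the index for an ID and records the reverse mapping. */
  int toIndex(long id) {
    int index = fold(id);
    indexToID.putIfAbsent(index, id);
    return index;
  }

  /** Returns the first ID recorded for this index, or null if none. */
  Long toID(int index) {
    return indexToID.get(index);
  }
}
```

Usage would be to call toIndex(itemID) when setting a preference value in the vector, and toID(index) when reading results back out. Two distinct IDs can land on the same index, in which case their preferences get merged under one index and only the first ID is recoverable; as noted in the thread, that may be tolerable for collaborative filtering but would not be for every use.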