How hard would it be to transparently support both?  Could we have one 
implementation for "smaller" problems and one for larger?

At any rate, +1 to making this available for really large-scale problems.

-Grant

On Dec 8, 2009, at 3:16 AM, Sean Owen wrote:

> I'm sure it's not hard. It makes (sparse) vectors consume that much
> more memory though.
> 
> This change would certainly help my case, but I already have a bit of
> a workaround: I hash longs into ints and store the reverse mapping.
> There is a possibility of collision but the consequence is small in the
> context of collaborative filtering.
> 
> I suppose if I'm the only use case that would benefit at the moment,
> maybe not worth it, but if you can think of other reasons, let's
> change.
> 
> On Tue, Dec 8, 2009 at 5:48 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>> This brings up a point about our linear primitives: are 32-bit integers big
>> enough for our index range for vectors and matrices?  Especially for
>> matrices, having billions of rows is completely possible, even if it is on
>> the large side.
>> 
>> If we want to be about "scalable" machine learning, we really don't want to
>> seal ourselves into "only" 2 billion x 2 billion matrices in the long run,
>> do we?
>> 
>> How hard would it be to promote our ints to longs?
>> 
>>  -jake
>> 
>> On Sat, Dec 5, 2009 at 4:48 AM, Sean Owen <sro...@gmail.com> wrote:
>> 
>>> I'm trying to use Vectors to represent a vector of user preferences.
>>> All is well since items are numeric and can be used as indexes into a
>>> Vector -- almost. I have longs, and of course indexes are ints.
>>> 
>>> I could fold the long IDs into ints without too much worry about the
>>> effects of collision. However I still need to remember the original
>>> item IDs for each index. I could do it with labels, but I can't
>>> retrieve the label for an index (and the other mapping isn't
>>> serialized anyway?).
>>> 
>>> So I guess I must separately store this mapping? Just making sure I'm
>>> not missing something.
>>> 
>> 
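For reference, the workaround Sean describes above (hash the long IDs into ints and store the reverse mapping separately) could look roughly like the sketch below. This is only an illustration under stated assumptions: the class LongToIntIdMapper and its methods are hypothetical names, not part of Mahout, and the folding function (XOR of the two 32-bit halves, with the sign bit cleared so the result is usable as a non-negative index) is one plausible choice rather than the one actually used in the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Mahout class): folds long item IDs into
 * non-negative int indexes and remembers the original ID for each index.
 */
final class LongToIntIdMapper {

    // int index -> original long ID. On a collision the first ID seen wins,
    // so a later colliding ID maps back to the wrong original ID.
    private final Map<Integer, Long> indexToId = new HashMap<>();

    /** XOR the high and low 32 bits, then clear the sign bit. */
    static int fold(long id) {
        return ((int) (id ^ (id >>> 32))) & 0x7FFFFFFF;
    }

    /** Returns the int index for an ID and records the reverse mapping. */
    int toIndex(long id) {
        int index = fold(id);
        indexToId.putIfAbsent(index, id);
        return index;
    }

    /** Returns the original ID recorded for an index, or null if unseen. */
    Long toId(int index) {
        return indexToId.get(index);
    }

    public static void main(String[] args) {
        LongToIntIdMapper mapper = new LongToIntIdMapper();
        long itemId = 123456789012345L;
        int index = mapper.toIndex(itemId);
        System.out.println(itemId + " -> " + index + " -> " + mapper.toId(index));
    }
}
```

The reverse map has to be stored alongside the vectors, since the index alone cannot recover the original ID. Two IDs that fold to the same index share a vector slot, which is the collision case mentioned above.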

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
Solr/Lucene:
http://www.lucidimagination.com/search
