You'd need a copy of the class with longs in place of ints, which seems
a little yucky. I suppose I'll wait to see whether there's any other
demand for it, and proceed if so.
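
For anyone who wants the interim workaround described in the quoted message below (hash longs into ints and store the reverse mapping), here is a minimal sketch. The class name LongIdIndexer and its methods are hypothetical and not part of Mahout. The fold used here (XOR of the high and low 32 bits with the sign bit cleared) and the first-ID-wins handling of collisions are assumptions chosen for illustration, not necessarily what my code does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: folds long item IDs into
// non-negative int indexes and remembers the reverse mapping.
final class LongIdIndexer {

  // Reverse mapping from int index back to the original long ID.
  private final Map<Integer, Long> indexToId = new HashMap<>();

  /** Folds a long ID into a non-negative int index and records the reverse mapping. */
  int toIndex(long id) {
    // XOR the high and low 32 bits, then clear the sign bit so the
    // result is usable as a vector index.
    int index = ((int) (id ^ (id >>> 32))) & 0x7FFFFFFF;
    // Assumption: on a collision, the first ID seen keeps the index.
    indexToId.putIfAbsent(index, id);
    return index;
  }

  /** Returns the original ID recorded for an index, or null if none was recorded. */
  Long toId(int index) {
    return indexToId.get(index);
  }
}
```

Call toIndex(itemID) when writing a preference into the vector and toId(index) when reading results back out. Two IDs that collide share one index, so toId can return the wrong original for the second one; that is the small consequence mentioned below.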

On Tue, Dec 8, 2009 at 11:08 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> How hard would it be to transparently support both?  Could we have one 
> implementation for "smaller" problems and one for larger?
>
> At any rate, +1 to making this be available for really large scale.
>
> -Grant
>
> On Dec 8, 2009, at 3:16 AM, Sean Owen wrote:
>
>> I'm sure it's not hard. It makes (sparse) vectors consume that much
>> more memory though.
>>
>> This change would certainly help my case, but I already have a bit of
>> a workaround: I hash longs into ints and store the reverse mapping.
>> There is a possibility of collision, but the consequence is small in the
>> context of collaborative filtering.
>>
>> I suppose if I'm the only use case that would benefit at the moment,
>> maybe not worth it, but if you can think of other reasons, let's
>> change.
>>
>> On Tue, Dec 8, 2009 at 5:48 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>> This brings up a point about our linear primitives: are 32-bit integers big
>>> enough for our index range for vectors and matrices?  Especially for matrices,
>>> having billions of rows is completely possible, even if it is on the large
>>> side.
>>>
>>> If we want to be about "scalable" machine learning, we really don't want to
>>> seal ourselves in to "only" 2 billion x 2 billion matrices in the long run,
>>> do we?
>>>
>>> How hard would it be to promote our ints to longs?
>>>
>>>  -jake
>>>
>>> On Sat, Dec 5, 2009 at 4:48 AM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> I'm trying to use Vectors to represent a vector of user preferences.
>>>> All is well since items are numeric and can be used as indexes into a
>>>> Vector -- almost. I have longs, and of course indexes are ints.
>>>>
>>>> I could fold the long IDs into ints without too much worry about the
>>>> effects of collision. However I still need to remember the original
>>>> item IDs for each index. I could do it with labels, but I can't
>>>> retrieve the label for an index (and the other mapping isn't
>>>> serialized anyway?).
>>>>
>>>> So I guess I must separately store this mapping? Just making sure I'm
>>>> not missing something.
>>>>
>>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using 
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
