You should move forward to version 0.9.

Take a look at more recent methods in this book:

https://www.mapr.com/practical-machine-learning



On Tue, Oct 14, 2014 at 2:43 AM, 王建国 <[email protected]> wrote:

> Hi Owen and all,
>     I am a developer from China. I am building a recommendation system
> based on Mahout 0.9. Since the user IDs and item IDs are strings,
> I need to map them to longs. But I found that there is a long-to-int mapping
> provided by the function "int TasteHadoopUtils.idToIndex(long)".
> Considering there may be millions or even billions of users, I wonder whether
> it is possible for many longs to be mapped to one int. If true, that would do
> much harm.
> This is quite confusing. Which solution should I choose in this
> situation? Meanwhile, I read your answer, quoted below. Could you please
> tell me which data structure indexed by long you use in Myrrix?
> Thanks in advance.
> wangjiangwei
>
> Question:
> I have read some code for item-based recommendation in version 0.6,
> starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob".
> I found that there is a long-to-int mapping
> provided by the function "int TasteHadoopUtils.idToIndex(long)".
> The long-to-int mapping is applied to both userId and itemId. I wonder
> whether it is possible for two longs to be mapped to one int. If that is the
> case, then we would likely merge vectors from different item IDs or user IDs,
> right? This is quite confusing.
> Would it be better to provide a RandomAccessSparseVector backed by
> OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
> Wei Feng
> Answer:
>     That's right. It ought to be uncommon but can happen. For recommenders,
> it "only" means that you start to treat two users or two items as the same
> thing. That doesn't do much harm though. Maybe one user's recs are a little
> funny.
> I do think it would have been useful to index by long, but that would have
> significantly increased memory requirements too.
> (In developing Myrrix I have switched to use a data structure indexed by
> long though, because it becomes more necessary to avoid the mapping.)
> Sean Owen
>
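
To make the collision discussed in the quoted answer concrete, here is a minimal sketch of a long-to-int fold. It is written to resemble what "int TasteHadoopUtils.idToIndex(long)" is understood to do (XOR the two 32-bit halves, then clear the sign bit); treat that as an assumption and check the source of your Mahout version. The class name IdToIndexSketch and the helper expectedCollidingPairs are hypothetical names used only for illustration.

```java
public class IdToIndexSketch {

    // Assumption: modeled on TasteHadoopUtils.idToIndex(long), not copied from it.
    // Folds a 64-bit ID into 32 bits, then masks the sign bit so the index is >= 0.
    public static int idToIndex(long id) {
        int folded = (int) (id ^ (id >>> 32)); // same fold as Long.hashCode(id)
        return 0x7FFFFFFF & (folded % 0x7FFFFFFE);
    }

    // Birthday-style estimate of colliding pairs among n IDs, assuming the IDs
    // spread uniformly over roughly 2^31 possible indexes.
    public static double expectedCollidingPairs(long n) {
        double pairs = (double) n * (n - 1) / 2.0;
        return pairs / 2147483648.0; // 2^31
    }

    public static void main(String[] args) {
        long a = 1L;
        long b = 1L << 32; // differs from a only in the upper 32 bits

        // Both lines print 1: two distinct longs share one int index.
        System.out.println(idToIndex(a));
        System.out.println(idToIndex(b));

        // Rough estimate for one million uniformly spread IDs.
        System.out.println(expectedCollidingPairs(1_000_000L));
    }
}
```

Under this fold, non-negative IDs below 0x7FFFFFFE map to themselves, so small sequential IDs would not collide; collisions become likely once IDs are spread over the full long range, for example when they are themselves hashes of strings.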

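For the string-to-long step mentioned in the question, one common approach is to hash each string ID to a long and keep a reverse table so results can be reported with the original IDs. The sketch below takes the first 8 bytes of an MD5 digest; Mahout's IDMigrator classes are understood to work along similar lines, but this code does not use them, and the class StringIdSketch with its register and lookup methods is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class StringIdSketch {

    // Reverse table: long ID back to the original string ID.
    private final Map<Long, String> longToString = new HashMap<>();

    // Hash a string ID to a long using the first 8 bytes of its MD5 digest.
    public static long toLongId(String id) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            long hash = 0L;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xFFL);
            }
            return hash;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Record the mapping and fail loudly if two different strings share a long.
    public long register(String id) {
        long longId = toLongId(id);
        String previous = longToString.putIfAbsent(longId, id);
        if (previous != null && !previous.equals(id)) {
            throw new IllegalStateException(
                    "Hash collision between '" + previous + "' and '" + id + "'");
        }
        return longId;
    }

    // Returns the original string ID, or null if the long was never registered.
    public String lookup(long longId) {
        return longToString.get(longId);
    }

    public static void main(String[] args) {
        StringIdSketch ids = new StringIdSketch();
        long userId = ids.register("user-alice");
        System.out.println(userId + " -> " + ids.lookup(userId));
    }
}
```

Collisions among 64-bit hashes are very unlikely at this scale, but the longs produced this way are spread over the whole long range, so any later long-to-int fold can still map two of them to the same int.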