You should move forward to version 0.9. Take a look at more recent methods in this book:
https://www.mapr.com/practical-machine-learning

On Tue, Oct 14, 2014 at 2:43 AM, 王建国 <[email protected]> wrote:

> Hi, Owen and all:
> I am a developer from China. I am building a recommendation system
> based on Mahout version 0.9. Since the user IDs and item IDs are strings,
> I need to map them to long. But I found that there is a long-to-int mapping
> provided by the function "int TasteHadoopUtils.idToIndex(long)".
> Considering there may be millions or even billions of users, I wonder
> whether many longs can be mapped to one int. If true, does that do much
> harm?
> This is quite confusing. What solution should I choose in this
> situation? Meanwhile, I read your answer, quoted below. Could you please
> tell me which data structure indexed by long you use in Myrrix?
> Thanks in advance.
> wangjiangwei
>
> Question:
> I have read some code about item-based recommendation in version 0.6,
> starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob".
> I found that there is a long-to-int mapping provided by the function
> "int TasteHadoopUtils.idToIndex(long)". The long-to-int mapping is
> performed on both userId and itemId. I wonder whether two longs can be
> mapped to one int. If that is the case, then we would be likely to merge
> vectors from different item IDs or user IDs, right? This is quite
> confusing.
> Is it better to provide a RandomAccessSparseVector implemented by
> OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
> Wei Feng
>
> Answer:
> That's right. It ought to be uncommon but can happen. For recommenders,
> it "only" means that you start to treat two users or two items as the
> same thing. That doesn't do much harm though. Maybe one user's recs are
> a little funny.
> I do think it would have been useful to index by long, but that would
> have significantly increased memory requirements too.
> (In developing Myrrix I have switched to use a data structure indexed by
> long though, because it becomes more necessary to avoid the mapping.)
> Sean Owen
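
To make the collision question in the quoted thread concrete, here is a minimal sketch in plain Java. It is not Mahout's code: the class `IdMappingSketch` and its methods `stringToLong`, `toIndex`, and `expectedCollidingPairs` are hypothetical names, and `toIndex` is only a stand-in for a long-to-int fold such as `TasteHadoopUtils.idToIndex(long)`, assuming roughly 2^31 possible non-negative indices. It shows one way to turn a string ID into a long, two longs that fold to the same int, and a birthday-style estimate of how many ID pairs would collide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class IdMappingSketch {

    // Hypothetical helper: derive a 64-bit ID from a string by taking
    // the first 8 bytes of its MD5 digest.
    static long stringToLong(String id) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(id.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hypothetical stand-in for a long-to-int fold (not Mahout's actual
    // implementation): hash the long, then clear the sign bit.
    static int toIndex(long id) {
        return Long.hashCode(id) & 0x7FFFFFFF;
    }

    // Birthday-style estimate of the expected number of colliding pairs
    // when n distinct IDs are spread uniformly over 2^31 indices.
    static double expectedCollidingPairs(long n) {
        double buckets = 2147483648.0; // 2^31
        return (double) n * (n - 1) / (2.0 * buckets);
    }

    public static void main(String[] args) {
        // Two different longs that fold to the same int index.
        long a = 1L;
        long b = 1L << 32;
        System.out.println(toIndex(a) == toIndex(b));

        // A string ID mapped to a long, then folded to an int index.
        long userId = stringToLong("user-42");
        System.out.println(userId + " -> " + toIndex(userId));

        // Rough collision estimate for one million IDs.
        System.out.println(expectedCollidingPairs(1_000_000L));
    }
}
```

Under these assumptions, one million IDs give an estimate of a few hundred colliding pairs, which matches the "uncommon but can happen" description in the quoted answer. Once the number of distinct IDs exceeds 2^31, collisions are unavoidable by the pigeonhole principle, so at the scale of billions of users a structure indexed by long avoids the problem entirely.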
