Thank you very much! It is version 0.9 I am learning now. I will read the book as you advise.

2014-10-15 5:47 GMT+08:00 Ted Dunning <[email protected]>:

> You should move forward to version 0.9.
>
> Take a look at more recent methods in this book:
>
> https://www.mapr.com/practical-machine-learning
>
> On Tue, Oct 14, 2014 at 2:43 AM, 王建国 <[email protected]> wrote:
>
> > Hi, Owen and all:
> > I am a developer from China. I am building a recommendation system
> > based on Mahout version 0.9. Since the user IDs and item IDs are
> > strings, I need to map them to long. But I found that there is a
> > long-to-int mapping provided by the function
> > "int TasteHadoopUtils.idToIndex(long)".
> > Considering there may be millions or even billions of users, I wonder
> > whether it is possible for many longs to be mapped to one int. If so,
> > that would do much harm.
> > This is quite confusing. What solution should I choose in this
> > situation? Meanwhile, I read your answer, quoted below. Could you
> > please tell me which data structure indexed by long you use in Myrrix?
> > Thanks in advance.
> > wangjiangwei
> >
> > Question:
> > I have read some code about item-based recommendation in version 0.6,
> > starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob".
> > I found that there is a long-to-int mapping provided by the function
> > "int TasteHadoopUtils.idToIndex(long)". The long-to-int mapping is
> > performed on both userId and itemId. I wonder if it is possible to have
> > two longs mapped into one int? If that is the case, then we would
> > likely merge vectors from different itemids/uids, right? This is quite
> > confusing.
> > Is it better to provide a RandomAccessSparseVector implemented by
> > OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in
> > advance.
> > Wei Feng
> >
> > Answer:
> > That's right. It ought to be uncommon but can happen. For recommenders,
> > it "only" means that you start to treat two users or two items as the
> > same thing. That doesn't do much harm though. Maybe one user's recs are
> > a little funny.
> > I do think it would have been useful to index by long, but that would
> > have significantly increased memory requirements too.
> > (In developing Myrrix I have switched to use a data structure indexed
> > by long though, because it becomes more necessary to avoid the mapping.)
> > Sean Owen
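
For reference, the collision question in the quoted thread can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name IdIndexSketch, the helper StringIdDictionary, and the body of idToIndex are hypothetical stand-ins, not Mahout's actual code. The sketch folds a 64-bit ID into a non-negative 32-bit index, which is one common way to build such a mapping, and shows three distinct longs landing on the same index.

```java
import java.util.HashMap;
import java.util.Map;

class IdIndexSketch {

    // Hypothetical stand-in for a long-to-int index function (not Mahout's code):
    // XOR the high and low 32 bits together, then clear the sign bit.
    static int idToIndex(long id) {
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    // Hypothetical in-memory dictionary: each new string ID gets the next unused long.
    static final class StringIdDictionary {
        private final Map<String, Long> ids = new HashMap<>();

        long idFor(String key) {
            Long id = ids.get(key);
            if (id == null) {
                id = (long) ids.size();
                ids.put(key, id);
            }
            return id;
        }
    }

    public static void main(String[] args) {
        // Three distinct longs that fold to the same int index.
        long[] colliding = {1L, 1L << 32, 2147483649L};
        for (long id : colliding) {
            System.out.println(id + " -> " + idToIndex(id)); // each line ends in "-> 1"
        }

        // Sequentially assigned longs stay below 2^31 and map to themselves.
        StringIdDictionary dict = new StringIdDictionary();
        long alice = dict.idFor("user-alice"); // 0
        long bob = dict.idFor("user-bob");     // 1
        System.out.println(idToIndex(alice) + " " + idToIndex(bob)); // prints "0 1"
    }
}
```

Under this assumed folding, any ID between 0 and 2^31 - 1 maps to itself, so string IDs that are numbered sequentially would not collide until there are more than about 2.1 billion of them. Collisions become likely when the longs are themselves hashes of the strings and spread over the full 64-bit range. Whether the same holds for the real TasteHadoopUtils.idToIndex depends on its implementation in the version in use.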
