Thanks, this is what I wanted to know. So there would be a separate example that reads in the Netflix dataset in a distributed way and utilizes the RBM implementation. Would that be right?
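If it helps to make that concrete, here is the kind of mapper I have in mind for the distributed read. It's only a rough sketch with hypothetical names, and it assumes the raw per-movie Netflix files have first been flattened to one userID,movieID,rating triple per line:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Sketch only: assumes input lines of the form "userID,movieID,rating".
 * Emits ratings keyed by user so a reducer can assemble one
 * visible-layer vector per user for RBM training.
 */
public class NetflixRatingMapper
    extends Mapper<LongWritable, Text, IntWritable, Text> {

  private final IntWritable userID = new IntWritable();
  private final Text movieAndRating = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] tokens = value.toString().split(",");
    if (tokens.length < 3) {
      return; // skip malformed lines
    }
    userID.set(Integer.parseInt(tokens[0]));
    movieAndRating.set(tokens[1] + ',' + tokens[2]);
    context.write(userID, movieAndRating);
  }
}

The reducer would then see all of a user's ratings together, which seems like the natural unit to build each per-user RBM training case from.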
The datastore I was referring to in the proposal was based on mahout.classifier.bayes.datastore. I understand that the HBase, Cassandra, and other adapters are being refactored out in a separate ticket, so I'll just stick with HDFS and S3. If there's anything else I need to add to the proposal, do let me know.

On Sun, Apr 4, 2010 at 3:09 PM, Sean Owen <sro...@gmail.com> wrote:
> Reusing code is fine, in principle. The code you mention, however,
> will not help you much. It is non-distributed and has nothing to do
> with Hadoop. You might reuse a bit of code to parse the input files,
> that's about it.
>
> Which data store are you referring to... if I understand right, you
> are implementing an algorithm on Hadoop. You would definitely not
> implement anything to load into memory, and I think you want to work
> with HDFS and Amazon S3, not HBase.
>
> On Sun, Apr 4, 2010 at 9:29 AM, Sisir Koppaka <sisir.kopp...@gmail.com> wrote:
> > Firstly, I am expecting to reuse the code at
> > org.apache.mahout.cf.taste.example.netflix
> > and have mentioned so in my proposal. Please let me know if this is OK,
> > or if you foresee any problems doing this. Secondly, I will implement an
> > HBase-based datastore as well as an InMemory-based one, but is the
> > InMemory-based one a prerequisite for the HBase-based one to be used?
> > (Eventually everything has to go to memory, so is this being done
> > elsewhere, or does the InMemory datastore do it?)

--
SK
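P.S. On the HDFS/S3 point: as far as I can tell, Hadoop's FileSystem abstraction means the same read path should serve both, with only the URI scheme differing. A minimal sketch of what I mean (paths and the class name are placeholders):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DatastoreReadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Works for e.g. hdfs://namenode/netflix/ratings.txt
    // or s3n://bucket/netflix/ratings.txt; only the scheme differs.
    Path input = new Path(args[0]);
    FileSystem fs = FileSystem.get(input.toUri(), conf);
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(input)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line); // hand off to the rating parser here
      }
    } finally {
      reader.close();
    }
  }
}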