Thanks, this is what I wanted to know. So there would be a separate
example that reads in the Netflix dataset in a distributed way and
utilizes the RBM implementation. Would that be right?
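
To make that concrete, here is a rough sketch of the kind of mapper I have
in mind (just an illustration, not code from the proposal; the key/value
layout, and the assumption that each per-movie Netflix file fits in a
single input split, are mine):

// Rough sketch: a Hadoop mapper that parses the per-movie Netflix files
// ("movieID:" header, then "userID,rating,date" lines) into
// (userID, "movieID,rating") pairs for a downstream RBM trainer.
// Assumes each per-movie file is small enough to be one input split,
// so the movieID header is always seen before its rating lines.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NetflixRatingsMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private long currentMovieID = -1L;

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String value = line.toString().trim();
    if (value.endsWith(":")) {
      // Header line of a movie file: "<movieID>:"
      currentMovieID = Long.parseLong(value.substring(0, value.length() - 1));
    } else if (value.length() > 0 && currentMovieID >= 0) {
      // Rating line: "<userID>,<rating>,<date>"
      String[] tokens = value.split(",");
      long userID = Long.parseLong(tokens[0]);
      int rating = Integer.parseInt(tokens[1]);
      // The reducer can then assemble per-user rating vectors,
      // which become the visible units for the RBM.
      context.write(new LongWritable(userID),
          new Text(currentMovieID + "," + rating));
    }
  }
}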

The datastore I was referring to in the proposal was based on
mahout.classifier.bayes.datastore. I understand that the HBase, Cassandra,
and other adapters are being refactored out in a separate ticket, so I'll just
stick with HDFS and S3.
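
One reason sticking with HDFS and S3 seems cheap: Hadoop's FileSystem
abstraction picks the implementation from the path's scheme, so the same
read code should cover both. A minimal sketch (the paths are placeholders,
and I'm assuming S3 credentials are already set in the Configuration):

// Minimal sketch: Hadoop's FileSystem API resolves the implementation from
// the path's scheme, so identical code reads hdfs:// and s3n:// inputs.
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SchemeAgnosticReader {
  public static void main(String[] args) throws Exception {
    // e.g. "hdfs://namenode/netflix/ratings.txt"
    //  or  "s3n://bucket/netflix/ratings.txt"
    Path input = new Path(args[0]);
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(input.toUri(), conf);
    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(input)));
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    } finally {
      reader.close();
    }
  }
}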

If there's anything else I need to add to the proposal, do let me know.

On Sun, Apr 4, 2010 at 3:09 PM, Sean Owen <sro...@gmail.com> wrote:

> Reusing code is fine, in principle. The code you mention, however,
> will not help you much. It is non-distributed and has nothing to do
> with Hadoop. You might reuse a bit of code to parse the input files,
> that's about it.
>
> Which data store are you referring to... if I understand right, you
> are implementing an algorithm on Hadoop. You would definitely not
> implement anything to load into memory, and I think you want to work
> with HDFS and Amazon S3, not HBase.
>
> On Sun, Apr 4, 2010 at 9:29 AM, Sisir Koppaka <sisir.kopp...@gmail.com>
> wrote:
> > Firstly, I am expecting to reuse the code at
> > *org.apache.mahout.cf.taste.example.netflix* and have mentioned so in my
> > proposal. Please let me know if this is OK, or if you foresee any
> > problems doing this. Secondly, I will implement an HBase-based datastore
> > as well as an InMemory-based one, but is the InMemory-based one a
> > prerequisite for the HBase-based one to be used? (Eventually everything
> > has to go to memory, so is this being done elsewhere or does the
> > InMemory datastore do it?)
>



-- 
SK
