[jira] [Commented] (BIGTOP-1272) BigPetStore: Productionize the Mahout recommender

bhashit parikh (JIRA) Mon, 14 Jul 2014 23:16:23 -0700

    [ 
https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061742#comment-14061742
 ]


bhashit parikh commented on BIGTOP-1272:
----------------------------------------

# I'll clean up the trailing whitespace; I forgot to take care of that.
# In our code, the {{Mahout RecommenderJob}} is run automatically as part of 
the {{ItemRecommender.scala}} code. However, the recommendations are executed 
internally in two different phases. That's what I was trying to communicate 
through {{arch.dot}}. Should I just keep a single step for it?
# The integration tests are excluded from the test task by default since they 
are all in the "src/integrationTest" directory. The test task only executes 
the unit tests from the "src/test" dir. So, we don't need to add the Mahout 
integration test to the pattern. Also, of all the tests named in the 
"exclude" pattern, only the last one currently exists.
# I'll remove the TODO in {{DataForger.scala}}. I think it'll be taken care of 
when we work on integrating even more useful patterns into the data generation 
as part of BIGTOP-1366.
# Yes, the scala-library will need to be present on the classpath for the Scala 
code to be executed. We'll need to use {{scala-library.jar}} with version 2.11. 
I don't know whether we'd need to copy the jar to all nodes of the cluster; I 
haven't run a Hadoop cluster before. I think the {{scala-library}} jar will be 
needed everywhere the {{pig-withouthadoop}} and other jars are needed. You 
mentioned that using {{-libjars}} would avoid having to copy all the jars 
to all the nodes.
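
To make the {{-libjars}} idea in the last point concrete, here is a minimal sketch of how such an invocation might be assembled. It assumes the job's main class parses Hadoop's generic options (for example by going through {{ToolRunner}}), which {{-libjars}} relies on. The job jar name, the main class, the jar file names, and the input/output paths are all hypothetical placeholders, not names from the patch.

```python
import shlex


def build_hadoop_command(job_jar, main_class, lib_jars, app_args=()):
    """Assemble a `hadoop jar` argv that ships extra jars via -libjars.

    -libjars is a Hadoop generic option: it takes one comma-separated list
    and has to come after the main class and before the job's own arguments.
    """
    if not lib_jars:
        raise ValueError("lib_jars must name at least one jar")
    cmd = ["hadoop", "jar", job_jar, main_class]
    cmd += ["-libjars", ",".join(lib_jars)]
    cmd += list(app_args)
    return cmd


# Hypothetical names, for illustration only.
cmd = build_hadoop_command(
    job_jar="bigpetstore.jar",
    main_class="org.example.RecommenderDriver",
    lib_jars=["scala-library.jar", "pig-withouthadoop.jar"],
    app_args=["input/", "output/"],
)

# Print a copy-pasteable shell command line.
print(" ".join(shlex.quote(part) for part in cmd))
```

With this approach the listed jars are shipped along with the job at submission time, so they would not have to be copied to each node by hand.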

> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
>                 Key: BIGTOP-1272
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1272
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>         Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, BIGTOP-1272.patch, 
> arch.jpeg
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful 
> type of product recommendation can be given for at least *some* customers, 
> since we know that there are going to be many customers who only bought 1 
> product, and also customers that bought 2 or more products -- even in a 
> dataset size of 10, due to the Gaussian distribution of purchases that is 
> also in the dataset generator. 
> The current mahout recommender code is statically valid: it runs to 
> completion in local unit tests if a hadoop 1x tarball is present, but 
> otherwise it hasn't been tested at scale.  So, let's get it working.  This 
> JIRA will also comprise:
> - deciding whether to use mahout 2x for unit tests (default on mahout maven 
> repo is the 1x impl) and whether or not bigtop should host a mahout 2x jar.  
> After all, bigtop builds a mahout 2x jar as part of its packaging process, 
> and BigPetStore might thus need a mahout 2x jar in order to test against the 
> right set of bigtop releases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
