[ https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054497#comment-14054497 ]

bhashit parikh edited comment on BIGTOP-1272 at 7/8/14 5:54 AM:
----------------------------------------------------------------

The jars built by Gradle don't include any dependencies, because in the 
Gradle environment all the dependencies are already available in the build 
environment. When running with the {{hadoop jar}} command, however, we need 
all of Mahout's own dependencies as well, while excluding the Hadoop 
dependency. After going through a Mahout book and some documentation, I found 
out that this is the standard way to use Mahout from the command line. Since 
Mahout is frequently run as a Hadoop MapReduce job, the Mahout project by 
default produces a jar, as a part of its {{mvn package}} process, that we can 
use with the {{hadoop jar}} command. Likewise, for Pig I used the 
pig-withouthadoop.jar from the standard Pig distribution.
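What such a job jar contains can be sketched in a few lines of Python: the application's classes plus every dependency except Hadoop, merged into one archive. This is a minimal sketch, assuming the application jar and its dependency jars are already on disk and that any jar whose file name starts with `hadoop-` is supplied by the cluster; `build_job_jar` and `is_provided_by_cluster` are hypothetical names. It is not how Mahout's own build does it; in a real build this is normally left to the build tool (for example Maven's `provided` scope together with the assembly or shade plugin).

```python
import zipfile
from pathlib import Path

# Signature files from a signed dependency no longer match once its classes
# are repacked into a different jar, so they are dropped.
SIGNATURE_SUFFIXES = (".SF", ".DSA", ".RSA")


def is_provided_by_cluster(jar: Path) -> bool:
    # Hypothetical rule: Hadoop jars are already on the classpath that
    # `hadoop jar` sets up, so they must not be bundled a second time.
    return jar.name.startswith("hadoop-")


def build_job_jar(app_jar, dependency_jars, out_jar):
    """Merge app_jar and every non-Hadoop dependency into out_jar.

    Returns the file names of the jars whose contents were bundled.
    """
    jars = [Path(app_jar)] + [
        Path(j) for j in dependency_jars if not is_provided_by_cluster(Path(j))
    ]
    seen = set()
    with zipfile.ZipFile(out_jar, "w", zipfile.ZIP_DEFLATED) as out:
        for jar in jars:
            with zipfile.ZipFile(jar) as src:
                for info in src.infolist():
                    name = info.filename
                    is_signature = name.startswith("META-INF/") and name.upper().endswith(
                        SIGNATURE_SUFFIXES
                    )
                    # The first copy of a path wins, so the application's own
                    # META-INF/MANIFEST.MF (read first) is the one kept.
                    if name in seen or is_signature:
                        continue
                    seen.add(name)
                    out.writestr(info, src.read(name))
    return [jar.name for jar in jars]
```

A file-name rule is the crudest possible filter: it does not catch libraries that Hadoop itself pulls in, which is one more reason to let the build tool's dependency scopes decide what gets bundled.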



> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
>                 Key: BIGTOP-1272
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1272
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>         Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, arch.jpeg
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful 
> type of product recommendation can be given for at least *some* customers, 
> since we know that there are going to be many customers who bought only 1 
> product, and also customers who bought 2 or more products, even in a 
> dataset of size 10, due to the Gaussian distribution of purchases that is 
> also built into the dataset generator. 
> The current Mahout recommender code is statically valid: it runs to 
> completion in local unit tests if a Hadoop 1.x tarball is present, but 
> otherwise it hasn't been tested at scale. So, let's get it working. This 
> JIRA will also comprise:
> - deciding whether to use Mahout 2x (a Mahout build for Hadoop 2.x) for unit 
> tests (the default on the Mahout Maven repo is the Hadoop 1.x build), and 
> whether or not Bigtop should host a Mahout 2x jar. After all, Bigtop builds 
> a Mahout 2x jar as part of its packaging process, and BigPetStore might thus 
> need a Mahout 2x jar in order to test against the same Mahout build that 
> Bigtop releases ship.
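Why customers with two or more purchases matter can be illustrated with a toy item co-occurrence recommender: only those customers link one product to another, but the links they create can then be used to recommend products to single-purchase customers as well. This is a hypothetical Python sketch, assuming purchases are given as a plain dict of customer to purchased products; it is not the Mahout recommender that BigPetStore uses, and `cooccurrence`, `recommend`, and the sample data are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(purchases):
    """Count how often each unordered pair of products was bought by one customer.

    purchases maps a customer id to the set of products that customer bought.
    """
    counts = Counter()
    for products in purchases.values():
        # A customer with a single product yields no pairs, so on its own it
        # contributes no item-to-item signal.
        for a, b in combinations(sorted(products), 2):
            counts[(a, b)] += 1
    return counts


def recommend(customer, purchases, counts, k=3):
    """Score products the customer does not own by co-occurrence with owned ones."""
    owned = purchases[customer]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in owned and b not in owned:
            scores[b] += n
        elif b in owned and a not in owned:
            scores[a] += n
    return [product for product, _ in scores.most_common(k)]


# Hypothetical sample: only c1 bought more than one product.
purchases = {"c1": {"dog food", "leash"}, "c2": {"dog food"}, "c3": {"cat litter"}}
counts = cooccurrence(purchases)
print(recommend("c2", purchases, counts))  # ['leash']
print(recommend("c3", purchases, counts))  # []
```

Customer c2 bought a single product yet still gets a recommendation, because c1's two-product basket tied "dog food" to "leash"; c3 gets nothing because no multi-product customer ever bought "cat litter".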



--
This message was sent by Atlassian JIRA
(v6.2#6252)