[jira] [Commented] (BIGTOP-1272) BigPetStore: Productionize the Mahout recommender

bhashit parikh (JIRA) Sat, 19 Jul 2014 08:57:06 -0700

    [ https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067562#comment-14067562 ]


bhashit parikh commented on BIGTOP-1272:
----------------------------------------

I am looking into the options we have. Unlike Maven, Gradle doesn't have 
first-class support for building a jar that bundles its dependencies. On the 
other hand, unlike Maven, we do have the full power of Groovy. 

I spent today looking into this. There are a few approaches we can use, all 
of which are described 
[here|http://www.datasalt.com/2011/05/handling-dependencies-and-configuration-in-java-hadoop-projects-efficiently/].
 [~jayunit100], let me know if I have missed a candidate. 

There is one issue that could be tricky. When a job is run with the 
{{hadoop jar}} command, the Hadoop distribution itself provides 
{{hadoop-core}} and its other transitive dependencies. So we'll need to 
exclude the Hadoop dependencies when we build the jar: something along the 
lines of Maven's {{provided}} scope, but more versatile than that.
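As a rough illustration, a {{provided}}-style setup could look like the {{build.gradle}} sketch below. It assumes Gradle 1.x/2.x-era syntax (the {{java}} plugin's {{compile}}/{{runtime}} configurations and the {{Jar}} task's {{classifier}} property). The configuration name {{provided}}, the task name {{jobJar}}, and the dependency coordinates and versions are placeholders, not what the patch actually uses. It relies on {{hadoop jar}} unpacking the job jar and putting any jars under its {{lib/}} directory on the classpath, so only the non-Hadoop dependencies need to be bundled there.

```groovy
// Sketch only: Gradle 1.x/2.x-era syntax; names and versions are placeholders.
apply plugin: 'java'

configurations {
    // Plays the role of Maven's "provided" scope: on the compile classpath,
    // but kept out of the job jar because `hadoop jar` supplies it at runtime.
    provided
    compile.extendsFrom provided
}

dependencies {
    // Placeholder coordinates; use whatever Hadoop/Mahout versions Bigtop ships.
    provided 'org.apache.hadoop:hadoop-core:1.2.1'
    compile  'org.apache.mahout:mahout-core:0.9'
}

// Job jar: our own classes at the root, third-party jars under lib/.
task jobJar(type: Jar) {
    classifier = 'job'
    from sourceSets.main.output
    into('lib') {
        // Runtime classpath minus every file resolved for "provided"
        // (which includes Hadoop's transitive dependencies).
        from configurations.runtime - configurations.provided
    }
}
```

Running {{gradle jobJar}} would produce a {{job}}-classified jar under {{build/libs}} to hand to {{hadoop jar}}. Note that the subtraction works on resolved files, so if Mahout and Hadoop end up resolving different versions of a shared library, the extra copy would still land in {{lib/}}; handling cases like that is where something more versatile than a plain {{provided}} scope may be needed.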

I'll include the Gradle wrapper with the new patch as well. The Eclipse issue 
is not the only potential problem: the Gradle developers seem to have made some 
backward-incompatible changes to the build-script syntax, and the wrapper pins 
the Gradle version used for the build. The wrapper will also help with the CI 
in BIGTOP-1379.
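For reference, in Gradle versions of this era the wrapper is usually generated from a {{Wrapper}} task declared in {{build.gradle}}, roughly as sketched below. The version number is a placeholder, not necessarily the one the patch will pin; much newer Gradle releases provide the {{wrapper}} task built in, so there it would be configured rather than declared.

```groovy
// Sketch: Gradle 1.x/2.x-era wrapper task; the version is a placeholder.
task wrapper(type: Wrapper) {
    // Every checkout and CI run downloads and uses exactly this Gradle version,
    // so syntax changes in newer releases cannot silently break the build.
    gradleVersion = '2.0'
}
```

Running {{gradle wrapper}} once generates {{gradlew}}, {{gradlew.bat}} and the {{gradle/wrapper/}} directory (wrapper jar and properties file), which get committed; developers and CI then invoke {{./gradlew}} instead of a locally installed {{gradle}}.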

> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
>                 Key: BIGTOP-1272
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1272
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>         Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, BIGTOP-1272.patch, 
> arch.jpeg, build.gradle
>
>
> BIGTOP-1271 adds patterns to the data that guarantee that a meaningful 
> product recommendation can be given for at least *some* customers. We know 
> there are going to be many customers who bought only 1 product, and also 
> customers who bought 2 or more products, even in a dataset of size 10, due 
> to the Gaussian distribution of purchases that is also built into the dataset 
> generator. 
> The current Mahout recommender code is statically valid: it runs to 
> completion in local unit tests if a Hadoop 1x tarball is present, but it 
> hasn't been tested at scale. So, let's get it working. This JIRA will also 
> comprise:
> - deciding whether to use Mahout 2x for unit tests (the default in the Mahout 
> Maven repo is the 1x implementation), and whether Bigtop should host a Mahout 
> 2x jar. After all, Bigtop builds a Mahout 2x jar as part of its packaging 
> process, so BigPetStore might need a Mahout 2x jar in order to test against 
> the right set of Bigtop releases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
