[jira] [Updated] (BIGTOP-1272) BigPetStore: Productionize the Mahout recommender

bhashit parikh (JIRA) Mon, 28 Jul 2014 01:35:28 -0700

     [ https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


bhashit parikh updated BIGTOP-1272:
-----------------------------------

    Attachment: BIGTOP-1272.patch

I have added code for creating a fat jar, and ran it successfully on a 
single-node cluster using the generated fat jar. I have also modified a bit of 
code that was causing empty (numbers-only) records to be generated by pig. It 
turns out that since the mahout-input was being stored in the same dir as the 
cleaned output (TSV file), and the pig script was using the entire cleaned 
directory as input, the mahout-input records were being read by the ad-hoc 
script as well. Well, that's taken care of now.
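
The mix-up described above can be simulated in a few lines of Python. This is a sketch, not pig: the hypothetical `load_dir` stands in for a loader that reads every visible file in its input directory, and the directory names, file names, and sample rows are all made up for illustration. With the mahout input written next to the cleaned TSV, the numbers-only row shows up in the load; written to its own sibling directory, it does not.

```python
import tempfile
from pathlib import Path


def load_dir(input_dir: Path) -> list[str]:
    """Read every line of every visible file directly inside input_dir.

    Names starting with '_' or '.' are skipped, following the usual Hadoop
    convention for marker files such as _SUCCESS.
    """
    records: list[str] = []
    for path in sorted(input_dir.iterdir()):
        if path.is_file() and not path.name.startswith(("_", ".")):
            records.extend(path.read_text().splitlines())
    return records


def write_lines(path: Path, lines: list[str]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n")


def simulate(root: Path, separate_dirs: bool) -> list[str]:
    """Write one cleaned TSV file and one mahout input file, then load 'cleaned'."""
    cleaned = root / "cleaned"
    # Hypothetical cleaned TSV record.
    write_lines(cleaned / "part-r-00000", ["store_1\t2014-07-28\tdog-food\t10.5"])
    # Hypothetical numbers-only mahout input record (user,item,preference).
    mahout_dir = root / "mahout-input" if separate_dirs else cleaned
    write_lines(mahout_dir / "mahout-part-00000", ["1,3,1"])
    # The ad-hoc script loads the whole cleaned directory.
    return load_dir(cleaned)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("same dir:    ", simulate(Path(tmp) / "same", separate_dirs=False))
        print("separate dir:", simulate(Path(tmp) / "separate", separate_dirs=True))
```

Keeping the two outputs in sibling directories, rather than filtering rows after loading, means the ad-hoc script never sees the mahout records at all.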

The instructions for running are:

# Use {{gradle shadowJar -Pfor-cluster}} to generate a fat jar that excludes the 
pig, hadoop, and mahout dependencies (including the transitive ones). The 
generated file will be named {{BigPetStore-0.8.0-SNAPSHOT-all.jar}} and placed 
in the *build/lib* dir. I'll refer to it as {{bps.jar}} for convenience.
# Find or generate the {{pig-withouthadoop.jar}} from the pig distribution. To 
build the correct jar, you can use the command {{ant mvn-jar}} from inside your 
pig distribution/checkout. After running this command, you can find 
{{pig-0.12.1-SNAPSHOT-withouthadoop-h2.jar}} inside the {{build}} dir. This is 
the exact jar that is used by our gradle build.
# 
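
A minimal Python sketch of steps 1 and 2, assuming {{gradle}} and {{ant}} are on the PATH and that you pass in the BigPetStore directory and the pig checkout directory. The helper names ({{build_commands}}, {{expected_artifacts}}, {{build_all}}) are hypothetical; the commands, jar names, and directories are the ones given in the steps above. Only steps 1 and 2 are covered.

```python
import subprocess
from pathlib import Path

# Artifact names and locations as described in steps 1 and 2.
BPS_JAR = Path("build/lib/BigPetStore-0.8.0-SNAPSHOT-all.jar")
PIG_JAR = Path("build/pig-0.12.1-SNAPSHOT-withouthadoop-h2.jar")


def build_commands(bps_dir: Path, pig_dir: Path) -> list[tuple[list[str], Path]]:
    """Return (command, working directory) pairs for steps 1 and 2."""
    return [
        # Step 1: fat jar without the pig, hadoop, and mahout dependencies.
        (["gradle", "shadowJar", "-Pfor-cluster"], bps_dir),
        # Step 2: the pig withouthadoop jar, built from the pig checkout.
        (["ant", "mvn-jar"], pig_dir),
    ]


def expected_artifacts(bps_dir: Path, pig_dir: Path) -> list[Path]:
    """Paths where the two jars should appear once the builds finish."""
    return [bps_dir / BPS_JAR, pig_dir / PIG_JAR]


def build_all(bps_dir: Path, pig_dir: Path) -> list[Path]:
    """Run both builds, failing fast on a non-zero exit or a missing jar."""
    for cmd, cwd in build_commands(bps_dir, pig_dir):
        subprocess.run(cmd, cwd=cwd, check=True)
    artifacts = expected_artifacts(bps_dir, pig_dir)
    missing = [p for p in artifacts if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"build finished but jars are missing: {missing}")
    return artifacts
```

For example, {{build_all(Path("."), Path("/path/to/pig"))}} run from the BigPetStore directory would perform both builds and return the paths of {{bps.jar}} and the pig jar; both paths are placeholders.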

> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
>                 Key: BIGTOP-1272
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1272
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>         Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, BIGTOP-1272.patch, 
> BIGTOP-1272.patch, arch.jpeg, build.gradle
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful 
> type of product recommendation can be given for at least *some* customers, 
> since we know that there are going to be many customers who only bought 1 
> product, and also customers who bought 2 or more products, even in a 
> dataset of size 10, due to the Gaussian distribution of purchases that is 
> also in the dataset generator. 
> The current mahout recommender code is statically valid: it runs to 
> completion in local unit tests if a hadoop 1x tarball is present, but 
> otherwise it hasn't been tested at scale. So, let's get it working. This 
> JIRA will also comprise:
> - deciding whether to use mahout 2x for unit tests (the default on the mahout 
> maven repo is the 1x impl) and whether or not bigtop should host a mahout 2x 
> jar. After all, bigtop builds a mahout 2x jar as part of its packaging 
> process, and BigPetStore might thus need a mahout 2x jar in order to test 
> against the right set of bigtop releases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
