[
https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061589#comment-14061589
]
jay vyas commented on BIGTOP-1272:
----------------------------------
Minor comments:
- Regarding patch format, there is some trailing whitespace. It's a minor
issue, but we like to remove it if you can do so in your IDE. I have a recipe
for this with IntelliJ (see the comments in BIGTOP-1240).
In arch.dot:
- How does {{MahoutRecommenderJob}} get launched? Isn't it all done in one?
If so, it should be expressed in the second arrow (the same way you do above, in
the Pig part).
In build.gradle:
- Should "test ..." be excluding your Mahout test as well as "TestPig", "TestHive",
"TestCrunch", etc.? I remember that we exclude those tests for a reason,
probably because they are integration tests. Since the Mahout test is also an
integration test, shouldn't we exclude it as well?
In the Scala source:
- There is a "TODO Jay / Bhashit ..." line in the DataForger Scala class. Is
that still relevant?
- Just curious: does this run on a Hadoop cluster? I have not tested it yet; I'm
just wondering if anything special needs to be done for the Scala-generated
libraries (i.e. do we have to add Scala to the classpath on each node?). I can
work on that if you aren't sure how it should be done.
This is a huge patch! So I will have to keep reviewing it tomorrow. So far it
looks like a big effort, and it looks like it should all work.
> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
> Key: BIGTOP-1272
> URL: https://issues.apache.org/jira/browse/BIGTOP-1272
> Project: Bigtop
> Issue Type: New Feature
> Components: Blueprints
> Affects Versions: backlog
> Reporter: jay vyas
> Attachments: BIGTOP-1272.patch, BIGTOP-1272.patch, BIGTOP-1272.patch,
> arch.jpeg
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful
> type of product recommendation can be given for at least *some* customers,
> since we know that there are going to be many customers who only bought 1
> product, and also customers that bought 2 or more products -- even in a
> dataset size of 10, due to the Gaussian distribution of purchases that is
> also in the dataset generator.
> The current Mahout recommender code is statically valid: it runs to
> completion in local unit tests if a Hadoop 1x tarball is present, but
> otherwise it hasn't been tested at scale. So, let's get it working. This
> JIRA will also comprise:
> - deciding whether to use Mahout 2x for unit tests (default on Mahout maven
> repo is the 1x impl) and whether or not Bigtop should host a Mahout 2x jar.
> After all, Bigtop builds a Mahout 2x jar as part of its packaging process,
> and BigPetStore might thus need a Mahout 2x jar in order to test against the
> right set of Bigtop releases.