[
https://issues.apache.org/jira/browse/BIGTOP-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040749#comment-14040749
]
jay vyas commented on BIGTOP-1272:
----------------------------------
I'd like to keep the architecture in place in terms of the total number of
steps:
* If we are generating a petabyte of data, a single job could take hours to
run. Adding more jobs --> more time, more manual steps for the people running
the app, and more integration tests for us to write.
* We don't want to clutter the pipeline with extra steps that don't add
breadth to the part of the ecosystem we cover.
Do you agree, in that sense, that adding more {{pig scripts}} is going to
make things harder to maintain? If so, let's do {{MultipleOutputs}}. *But
please do feel free to debate the point further if I'm missing something and
there is some extra value in the additional step.* I think
{{MultipleOutputs}} will be an easy 4 or 5 lines of extension to the existing
{{o.a.b.bps.generator.MyMapper}} class, and just another couple of lines to
extend the {{arch.dot}} and the {{TestPetStoreTransactionGeneratorJob}}.
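
To make the single-pass argument concrete, here is a rough plain-Java sketch of the idea behind {{MultipleOutputs}}: each record is read once and written to more than one named output. This is not the actual {{MyMapper}} code and does not use the Hadoop API; the class name, the output names and the "customerId,productId,price" record format are all assumptions made up for illustration. In a real mapper the equivalent calls would be {{MultipleOutputs.addNamedOutput(...)}} at job setup and {{write(namedOutput, key, value)}} inside {{map()}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the MultipleOutputs idea: one pass over the
// generated records, with each record written to several named outputs.
// Every name below is hypothetical; this is not the BigPetStore mapper.
class NamedOutputRouter {

    // Assumed record format (illustration only): "customerId,productId,price".
    static Map<String, List<String>> route(List<String> records) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            // Main output: the full transaction line, unchanged.
            write(outputs, "transactions", record);
            // Side output produced in the same pass: a reduced
            // customer/product view. A separate job or script would have to
            // re-read the whole dataset to produce the same thing.
            write(outputs, "recommenderInput", fields[0] + "\t" + fields[1]);
        }
        return outputs;
    }

    // Loosely mirrors the shape of MultipleOutputs.write(namedOutput, ...),
    // but only appends to an in-memory list.
    private static void write(Map<String, List<String>> outputs,
                              String name, String line) {
        outputs.computeIfAbsent(name, k -> new ArrayList<>()).add(line);
    }

    public static void main(String[] args) {
        Map<String, List<String>> out =
                route(List.of("c1,p7,19.99", "c2,p3,4.50"));
        out.forEach((name, lines) -> System.out.println(name + " -> " + lines));
    }
}
```

The point of the sketch is only that the second output costs one extra write per record inside the loop, rather than another full read of the data by a separate step.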
> BigPetStore: Productionize the Mahout recommender
> -------------------------------------------------
>
> Key: BIGTOP-1272
> URL: https://issues.apache.org/jira/browse/BIGTOP-1272
> Project: Bigtop
> Issue Type: New Feature
> Components: Blueprints
> Affects Versions: backlog
> Reporter: jay vyas
> Attachments: arch.jpeg
>
>
> BIGTOP-1271 adds patterns into the data that guarantee that a meaningful
> type of product recommendation can be given for at least *some* customers,
> since we know that there are going to be many customers who only bought 1
> product, and also customers that bought 2 or more products -- even in a
> dataset size of 10, due to the Gaussian distribution of purchases that is
> also in the dataset generator.
> The current mahout recommender code is statically valid: it runs to
> completion in local unit tests if a hadoop 1x tarball is present, but
> otherwise it hasn't been tested at scale. So, let's get it working. This
> JIRA will also comprise:
> - deciding whether to use mahout 2x for unit tests (default on mahout maven
> repo is the 1x impl) and whether or not bigtop should host a mahout 2x jar.
> After all, bigtop builds a mahout 2x jar as part of its packaging process,
> and BigPetStore might thus need a mahout 2x jar in order to test against the
> right set of bigtop releases.