[ https://issues.apache.org/jira/browse/BIGTOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
jay vyas updated BIGTOP-1414:
-----------------------------
Fix Version/s: 0.9.0
> Add Apache Spark implementation to BigPetStore
> ----------------------------------------------
>
> Key: BIGTOP-1414
> URL: https://issues.apache.org/jira/browse/BIGTOP-1414
> Project: Bigtop
> Issue Type: Improvement
> Components: blueprints
> Affects Versions: backlog
> Reporter: jay vyas
> Fix For: 0.9.0
>
>
> Currently we only process data with Hadoop. Now it's time to add Spark to the
> BigPetStore application. This will demonstrate the difference between a
> MapReduce-based Hadoop implementation of a big data app and a Spark one.
> *We will need to*
> - Update the Graphviz arch.dot diagram to show Spark as a new path.
> - Add a Spark job to the existing code, in a new package, which uses the
> existing Scala-based generator inside a Spark job rather than in a Hadoop
> InputSplit.
> - Have the job output an RDD, which can then be serialized to disk or fed
> into the next Spark job.
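A minimal sketch of the first job's shape, assuming a hypothetical `Transaction` record and a hypothetical `generate(seed, count)` function standing in for the existing Scala generator (its real API is not described in this issue). It uses only the Scala standard library so it runs without a cluster. In Spark, the loop over partitions would typically become `sc.parallelize(0 until numPartitions, numPartitions).flatMap(...)`, producing an `RDD[Transaction]` that can be written with `saveAsTextFile` or handed directly to the next job.

```scala
import scala.util.Random

// Hypothetical record type; the real generator's output schema may differ.
final case class Transaction(customerId: Int, product: String, price: Double)

object GeneratorJob {
  // Hypothetical product catalog used only for this sketch.
  val products: Vector[(String, Double)] = Vector(
    "dog-food" -> 10.0, "cat-food" -> 8.0, "fish-food" -> 4.0, "leash" -> 15.0)

  // Stand-in for the existing Scala generator: deterministic for a given seed.
  def generate(seed: Long, count: Int): Seq[Transaction] = {
    val rng = new Random(seed)
    Seq.fill(count) {
      val (name, price) = products(rng.nextInt(products.size))
      Transaction(rng.nextInt(100), name, price)
    }
  }

  // Each "partition" gets its own seed, mirroring one generator call per Spark task
  // (in Spark: sc.parallelize(0 until numPartitions, numPartitions).flatMap(...)).
  def run(baseSeed: Long, numPartitions: Int, perPartition: Int): Seq[Transaction] =
    (0 until numPartitions).flatMap(p => generate(baseSeed + p, perPartition))

  // One CSV line per record, the kind of text a saveAsTextFile call would write.
  def toCsv(t: Transaction): String = s"${t.customerId},${t.product},${t.price}"
}
```

Seeding each partition separately keeps the output reproducible no matter how the partitions are scheduled, which matters once the same data is consumed by both the Hadoop and the Spark paths.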
> *The next Spark job should*
> - Group the data and write product summaries to a local file.
> - Run a product recommender against the input data set.
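A minimal sketch of the second job, again in plain Scala with a hypothetical `Transaction` record. Product summaries are a group-by on product name (in Spark this would typically be a `map` to `(product, (1, price))` followed by `reduceByKey`). The issue does not name a recommender algorithm, so this sketch substitutes a simple item co-occurrence count ("customers who bought X also bought Y"); a real implementation might instead use Spark MLlib's `ALS` collaborative filtering.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical record types for this sketch.
final case class Transaction(customerId: Int, product: String, price: Double)
final case class ProductSummary(product: String, count: Int, revenue: Double)

object SummaryJob {
  // Group by product and aggregate purchase count and total revenue, sorted by name.
  def summarize(txs: Seq[Transaction]): Seq[ProductSummary] =
    txs.groupBy(_.product).map { case (p, ts) =>
      ProductSummary(p, ts.size, ts.map(_.price).sum)
    }.toSeq.sortBy(_.product)

  // Write one tab-separated line per product to a local file.
  def writeSummaries(summaries: Seq[ProductSummary], out: Path): Unit = {
    val lines = summaries.map(s => s"${s.product}\t${s.count}\t${s.revenue}")
    Files.write(out, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  // Item co-occurrence recommender: rank other products by the number of
  // distinct customers who bought both them and `product`.
  def recommend(txs: Seq[Transaction], product: String, k: Int): Seq[String] = {
    val baskets: Iterable[Set[String]] =
      txs.groupBy(_.customerId).values.map(_.map(_.product).toSet)
    baskets
      .filter(_.contains(product))
      .flatMap(_ - product)
      .groupBy(identity)
      .map { case (other, hits) => (other, hits.size) }
      .toSeq
      .sortBy { case (other, n) => (-n, other) } // most co-purchases first, ties by name
      .take(k)
      .map(_._1)
  }
}
```

Both steps read the same input collection, which is the case where Spark's in-memory RDDs help most: the generated data can be computed once and reused by the summary and the recommender.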
> We want the jobs to be runnable either individually (modular) or chained as
> a single job, to leverage the RDD paradigm.
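One way to support both modes, sketched under assumptions (the stage names and the comma-separated serialization format are hypothetical): model each job as a function from its input collection to its output, compose the functions with `andThen` for a single in-memory run, or run each stage on its own with a serialize/deserialize step in between. With Spark the stages would take and return `RDD`s, so the single-job path keeps data in memory (optionally `cache()`d) instead of round-tripping through disk.

```scala
// Hypothetical record type for this sketch.
final case class Transaction(customerId: Int, product: String, price: Double)

object Pipeline {
  type Stage[A, B] = Seq[A] => Seq[B]

  // Hypothetical stage 1: stand-in for the generator job.
  val generate: Stage[Int, Transaction] =
    ids => ids.map(i => Transaction(i % 3, if (i % 2 == 0) "dog-food" else "cat-food", 5.0))

  // Hypothetical stage 2: per-product purchase counts, sorted by product name.
  val summarize: Stage[Transaction, (String, Int)] =
    txs => txs.groupBy(_.product).map { case (p, ts) => (p, ts.size) }.toSeq.sortBy(_._1)

  // Single-job mode: compose the stages so data never leaves memory.
  val singleJob: Stage[Int, (String, Int)] = generate andThen summarize

  // Modular mode: stage 1 output is serialized to text lines, then parsed by stage 2.
  def serialize(txs: Seq[Transaction]): Seq[String] =
    txs.map(t => s"${t.customerId},${t.product},${t.price}")

  def deserialize(lines: Seq[String]): Seq[Transaction] =
    lines.map { line =>
      val f = line.split(",")
      Transaction(f(0).toInt, f(1), f(2).toDouble)
    }

  def modular(ids: Seq[Int]): Seq[(String, Int)] =
    summarize(deserialize(serialize(generate(ids))))
}
```

Both paths should produce identical results; keeping the stages as plain functions is what lets the same code run either way.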
> It will be interesting to see how the code is architected. Let's start the
> planning in this JIRA. I have some code I've informally hacked together;
> maybe I can attach an initial patch just to start a dialog.