jay vyas created BIGTOP-1414:
--------------------------------

             Summary: Add Apache Spark implementation of BigPetStore
                 Key: BIGTOP-1414
                 URL: https://issues.apache.org/jira/browse/BIGTOP-1414
             Project: Bigtop
          Issue Type: Improvement
            Reporter: jay vyas


Currently we only process data with Hadoop.  Now it's time to add Spark to the 
BigPetStore application.  This will demonstrate the difference 
between a MapReduce-based Hadoop implementation of a big data app and a 
Spark one.

*We will need to*

- Update the graphviz arch.dot to diagram Spark as a new path.
- Add a Spark job to the existing code, in a new package, which uses the 
existing Scala-based generator.  We will use the generator inside a Spark job 
rather than in a Hadoop InputSplit.
- The job should output to an RDD, which can then be serialized to disk or 
fed into the next Spark job.
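
To make the generator step concrete, here is a rough sketch in plain Python, where a list stands in for the RDD so it runs without a Spark install. The record layout, the product catalog, and the function names are all made up for illustration; the real Scala generator will look different. The idea is only that each partition is generated deterministically from its own seed, which is the role the InputSplit plays in the MapReduce version.

```python
import random
from typing import Iterator, List, NamedTuple


class Transaction(NamedTuple):
    # Hypothetical record layout; the real generator's schema may differ.
    customer_id: int
    product: str
    price: float


# Made-up catalog, for illustration only.
PRODUCTS = {"dog-food": 20.0, "cat-litter": 12.5, "fish-flakes": 4.0}


def generate_partition(seed: int, count: int) -> Iterator[Transaction]:
    """Generate `count` transactions deterministically from `seed`.

    In Spark this function would be handed to flatMap over an RDD of seeds,
    replacing the per-InputSplit generation of the MapReduce version.
    """
    rng = random.Random(seed)
    names = sorted(PRODUCTS)
    for _ in range(count):
        product = rng.choice(names)
        yield Transaction(rng.randrange(100), product, PRODUCTS[product])


def generate_all(num_partitions: int, per_partition: int) -> List[Transaction]:
    # Stand-in for something like:
    #   sc.parallelize(range(num_partitions), num_partitions)
    #     .flatMap(lambda seed: generate_partition(seed, per_partition))
    return [
        record
        for seed in range(num_partitions)
        for record in generate_partition(seed, per_partition)
    ]


if __name__ == "__main__":
    records = generate_all(num_partitions=4, per_partition=5)
    print(len(records), "records; first:", records[0])
```

Seeding per partition keeps the output reproducible no matter how the partitions are scheduled, which should make the Hadoop and Spark outputs easier to compare.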

*So, the next Spark job should*

- group the data and write product summaries to a local file
- run a product recommender against the input data set.
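
A minimal sketch of those two steps, again in plain Python with a list in place of the RDD. The input tuples, the CSV layout, and the function names are assumptions. The recommender here is just a co-purchase count, used as a placeholder; it is not meant to be the algorithm we end up with.

```python
import csv
import os
import tempfile
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: (customer_id, product, price) tuples from the generator job.
TRANSACTIONS = [
    (1, "dog-food", 20.0),
    (1, "dog-leash", 15.0),
    (2, "dog-food", 20.0),
    (2, "dog-leash", 15.0),
    (2, "cat-litter", 12.5),
    (3, "cat-litter", 12.5),
]


def summarize(transactions):
    """Group by product and return {product: (units_sold, revenue)}.

    Roughly what map + reduceByKey on a pair RDD would compute in Spark.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for _customer, product, price in transactions:
        totals[product][0] += 1
        totals[product][1] += price
    return {product: (units, revenue) for product, (units, revenue) in totals.items()}


def write_summaries(summaries, path):
    """Write one CSV row per product to a local file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["product", "units_sold", "revenue"])
        for product in sorted(summaries):
            writer.writerow([product, *summaries[product]])


def recommend(transactions, product, top_n=3):
    """Return the products most often bought by the same customers as `product`.

    A plain co-purchase count, standing in for a real recommender.
    """
    baskets = defaultdict(set)
    for customer, item, _price in transactions:
        baskets[customer].add(item)
    co_counts = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1
            co_counts[(b, a)] += 1
    scored = [(other, n) for (p, other), n in co_counts.items() if p == product]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # most co-purchased first
    return [other for other, _n in scored[:top_n]]


if __name__ == "__main__":
    out_path = os.path.join(tempfile.gettempdir(), "product_summaries.csv")
    write_summaries(summarize(TRANSACTIONS), out_path)
    print("wrote", out_path)
    print("bought with dog-food:", recommend(TRANSACTIONS, "dog-food"))
```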

We want the jobs to be runnable either as separate modules or as a single job, 
to leverage the RDD paradigm.
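
One possible way to get both modes, sketched in plain Python: write each stage as a function from dataset to dataset, then either chain the functions in memory or run one stage at a time with files in between. The stage bodies, the JSON format, and all names are placeholders. In Spark the in-memory hand-off would be an RDD, and the on-disk hand-off could be something like saveAsTextFile and textFile.

```python
import json
import os
import tempfile


def generate(_unused=None):
    # Placeholder stage: hypothetical [product, price] records.
    return [["dog-food", 20.0], ["cat-litter", 12.5], ["dog-food", 20.0]]


def summarize(records):
    # Placeholder stage: revenue per product, sorted by product name.
    totals = {}
    for product, price in records:
        totals[product] = totals.get(product, 0.0) + price
    return [[product, total] for product, total in sorted(totals.items())]


def run_single_job(stages):
    """Chain every stage in one process, passing data in memory."""
    data = None
    for stage in stages:
        data = stage(data)
    return data


def run_stage(stage, in_path, out_path):
    """Run one stage on its own, reading and writing JSON files on disk."""
    data = None
    if in_path is not None:
        with open(in_path) as f:
            data = json.load(f)
    with open(out_path, "w") as f:
        json.dump(stage(data), f)


if __name__ == "__main__":
    # Single job: everything stays in memory.
    print(run_single_job([generate, summarize]))

    # Modular: each stage is a separate run with a file in between.
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.json")
        summary = os.path.join(workdir, "summary.json")
        run_stage(generate, None, raw)
        run_stage(summarize, raw, summary)
        with open(summary) as f:
            print(json.load(f))
```

Because the stage functions do not know where their input came from, the same code serves both modes and only the driver changes.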

So it will be interesting to see how the code is architected.  Let's start the 
planning in this JIRA.  I have some stuff I've informally hacked together; maybe 
I can attach an initial patch just to start a dialog.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
