[jira] [Commented] (BIGTOP-1414) Add Apache Spark implementation to BigPetStore

JIRA Sat, 23 Aug 2014 07:14:23 -0700

    [ 
https://issues.apache.org/jira/browse/BIGTOP-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14108011#comment-14108011
 ]


Jörn Franke commented on BIGTOP-1414:
-------------------------------------

I agree with you. Two questions:

"update graphviz arch.dot to diagram spark as a new path."
Where is this package?

"Adding a spark job to the existing code, in a new package, which uses 
existing scala based generator, however, we will use it inside a spark job, 
rather than in a hadoop inputsplit."
Where should the code go? Is it a new subproject with its own build.gradle? 
Where can I find the other jobs to use as examples?

We can create subtasks for the tasks you mentioned in the first post.
 
I can contribute to this issue.

> Add Apache Spark implementation to BigPetStore
> ----------------------------------------------
>
>                 Key: BIGTOP-1414
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1414
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: blueprints
>    Affects Versions: backlog
>            Reporter: jay vyas
>             Fix For: 0.9.0
>
>
> Currently we only process data with hadoop.  Now it's time to add spark to the 
> bigpetstore application.  This will basically demonstrate the difference 
> between a mapreduce-based hadoop implementation of a big data app and a 
> spark one.
> *We will need to*
> - update graphviz arch.dot to diagram spark as a new path.
> - add a spark job to the existing code, in a new package, which uses the 
> existing scala-based generator; however, we will use it inside a spark job 
> rather than in a hadoop inputsplit.
> - The job should output to an RDD, which can then be serialized to disk or 
> fed into the next spark job.
> *So, the next spark job should*
> - group the data and write product summaries to a local file
> - run a product recommender against the input data set.
> We want the jobs to be runnable either as separate modules or as a single 
> job, to leverage the RDD paradigm.
> So it will be interesting to see how the code is architected.  Let's start 
> the planning in this JIRA.  I have some stuff I've informally hacked together; 
> maybe I can attach an initial patch just to start a dialog.
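
To make the "modular, or as a single job" idea in the description concrete, here is a minimal sketch in plain Python. It does not use Spark, and nothing in it comes from the BigPetStore code base: the function names, the record layout, and the sample products are all hypothetical. In a Spark version the list would presumably be an RDD and the grouping step something like a reduceByKey, but that is an assumption, not the planned design.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (store, product, price).
Transaction = Tuple[str, str, float]


def generate_transactions(n: int, seed: int = 0) -> List[Transaction]:
    """Stage 1: stand-in for the data generator (all values are made up)."""
    rng = random.Random(seed)
    products = [("dog-food", 10.0), ("cat-litter", 7.5), ("fish-flakes", 3.0)]
    stores = ["store-a", "store-b"]
    records = []
    for _ in range(n):
        product, price = rng.choice(products)
        records.append((rng.choice(stores), product, price))
    return records


def summarize_products(
    transactions: Iterable[Transaction],
) -> Dict[str, Tuple[int, float]]:
    """Stage 2: group by product and return (sale count, total revenue)."""
    totals = defaultdict(lambda: [0, 0.0])
    for _store, product, price in transactions:
        totals[product][0] += 1
        totals[product][1] += price
    return {product: (count, total) for product, (count, total) in totals.items()}


def run_pipeline(n: int, seed: int = 0) -> Dict[str, Tuple[int, float]]:
    """Single-job mode: feed stage 1 directly into stage 2."""
    return summarize_products(generate_transactions(n, seed))
```

Each stage can be run on its own, with the intermediate records written to disk in between, or chained through run_pipeline without materializing anything. That is the property the description asks of the Spark jobs.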



--
This message was sent by Atlassian JIRA
(v6.2#6252)
