[jira] [Commented] (BIGTOP-1089) BigPetStore: A polyglot big data processing blueprint

Sean Mackrory (JIRA) Sat, 12 Apr 2014 06:31:27 -0700

    [ 
https://issues.apache.org/jira/browse/BIGTOP-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967502#comment-13967502
 ]


Sean Mackrory commented on BIGTOP-1089:
---------------------------------------

Thanks for posting all the notes from our review the other day. I think most of 
them are minor enough to fix in follow-up JIRAs, since this is a new module and 
doesn't have to be perfect to start letting other people collaborate on it. I 
just tried your latest patch, liked the POM changes, and was able to build, run 
the pig tests, etc.

A few notes I think you should address before we +1 and commit it:
* src/integration/java/org/bigtop/bigpetstore/integration/BigPetStorePigIT.java 
still has author information in the IntelliJ header
* It looks like StringUtils.java and log4j.properties came from other projects. 
While the licenses may be identical, the files are from a different project and 
I think we need a more formal declaration of where they came from. Other files 
are still missing the license boilerplate that Cos mentioned, although I'm not 
sure that's a hard requirement, since the license is distributed with the 
project as a whole and this is being added as part of the project. Either way, 
it would be good to add the headers.
* Changing the digraph name should be really easy, so it'd be nice to just get 
that done and not have anyone wondering where ethane comes from :)
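
For reference on the header point above, here is the standard ASF source 
header as it would appear at the top of a Java file. This is only a sketch of 
the boilerplate: it does not cover the separate attribution that files copied 
from other projects may need, and other file types (for example 
log4j.properties) would use their own comment syntax.

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
// The package declaration and the rest of the source file follow the header.
```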
Change digraph name
Regarding licensing, 

> BigPetStore: A polyglot big data processing blueprint
> -----------------------------------------------------
>
>                 Key: BIGTOP-1089
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1089
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: 0.7.0
>            Reporter: jay vyas
>            Assignee: jay vyas
>             Fix For: 0.8.0
>
>         Attachments: BIGTOP-1089.patch, BIGTOP-1089.patch, 
> BIGTOP-1089.pom.patch
>
>
> The need for templates for processing big data pipelines is obvious - and 
> also - given the increasing amount of overlap across different big data and 
> nosql projects, it will provide a ground truth in the future for comparing 
> the behaviour and approach of different tools to solve a common, easily 
> comprehended problem. 
> This ticket formalizes the conversation in mailing list archives regarding 
> the BigPetStore proposal. 
> At the moment (with the exception of word count), there are very few 
> examples of big data problems that have been solved by a variety of different 
> technologies.  And, even with word count, there aren't a lot of templates which 
> can be customized for applications. 
> Comparatively: Other application developer communities (i.e. the Rails folks, 
> those using maven archetypes, etc.) have a plethora of template 
> applications which can be used to kickstart their applications and use cases. 
>   
> This big pet store JIRA thus aims to do the following: 
> 0) Curate a single, central, standard input data set. (modified: generating 
> a large input data set on the fly).
> 1) Define a big data processing pipeline (using the pet store theme - except 
> morphing it to be analytics rather than transaction oriented), and implement 
> basic aggregations in hive, pig, etc...
> 2) Sink the results of 1 into some kind of NoSQL store or search engine.
>  
> Some implementation details (open to change; please comment/review):
> - initial data source will be raw text or (better yet) some kind of 
> automatically generated data.
> - the source will initially go in bigtop/blueprints
> - the application sources can be in any modern JVM language 
> (java,scala,groovy,clojure), since bigtop supports scala, java, groovy 
> natively already and clojure is easy to support with the right jars.  
> - each "job" will be named according to the corresponding DAG of the big data 
> pipeline. 
> - all jobs should (not sure if requirement?) be controlled by a global 
> program (maybe oozie?) which runs the tasks in order, and can easily be 
> customized to use different tools at different stages. 
> - for now, all outputs will be to files: so that users don't require servers 
> to run the app. 
> - final data sinks will be into a highly available transaction oriented store 
> (solr/hbase/...)
> This ticket will be completed once a first iteration of BigPetStore is 
> complete using 3 ecosystem components, along with a depiction of the pipeline 
> which can be used for development.
> I've assigned this to myself :) I hope that's okay? Seems like at the moment 
> I'm the only one working on it. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
