[ https://issues.apache.org/jira/browse/BIGTOP-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967502#comment-13967502 ]
Sean Mackrory commented on BIGTOP-1089:
---------------------------------------

Thanks for posting all the notes from our review the other day. I think most of them are minor enough to fix in follow-up JIRAs, since this is a new module and doesn't have to be perfect before other people start collaborating on it. I just tried your latest patch, liked the POM changes, and was able to build, run the Pig tests, etc.

A few notes I think you should address before we +1 and commit it:

* src/integration/java/org/bigtop/bigpetstore/integration/BigPetStorePigIT.java still has author information in the IntelliJ header.
* It looks like StringUtils.java and log4j.properties came from other projects. While the licenses may be identical, the files are from a different project, and I think we need a more formal declaration of where they came from. Other files are still missing the license boilerplate that Cos mentioned, although I'm not sure that's a hard requirement, since the license is distributed with the project as a whole and this is being added as part of the project. Either way, it's good to add the headers.
* Changing the digraph name should be really easy, so it'd be nice to just get that done and not have anyone wondering where ethane comes from :)

Change digraph name
Regarding licensing,

> BigPetStore: A polyglot big data processing blueprint
> -----------------------------------------------------
>
>                 Key: BIGTOP-1089
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1089
>             Project: Bigtop
>          Issue Type: New Feature
>          Components: Blueprints
>    Affects Versions: 0.7.0
>            Reporter: jay vyas
>            Assignee: jay vyas
>             Fix For: 0.8.0
>
>         Attachments: BIGTOP-1089.patch, BIGTOP-1089.patch,
>                      BIGTOP-1089.pom.patch
>
>
> The need for templates for processing big data pipelines is obvious. Also,
> given the increasing amount of overlap across different big data and NoSQL
> projects, it will provide a ground truth in the future for comparing the
> behaviour and approach of different tools to solve a common, easily
> comprehended problem.
> This ticket formalizes the conversation in mailing list archives regarding
> the BigPetStore proposal.
> At the moment (with the exception of word count), there are very few
> examples of big data problems that have been solved by a variety of
> different technologies. And, even with word count, there aren't a lot of
> templates which can be customized for applications.
> Comparatively: other application developer communities (i.e. the Rails
> folks, those using Maven archetypes, etc.) have a plethora of template
> applications which can be used to kickstart their applications and use cases.
>
> This big pet store JIRA thus aims to do the following:
> 0) Curate a single, central, standard input data set (modified: generating
> a large input data set on the fly).
> 1) Define a big data processing pipeline (using the pet store theme, except
> morphing it to be analytics rather than transaction oriented), and implement
> basic aggregations in hive, pig, etc.
> 2) Sink the results of 1 into some kind of NoSQL store or search engine.
>
> Some implementation details (open to change; please comment/review):
>
> - initial data source will be raw text or (better yet) some kind of
> automatically generated data.
> - the source will initially go in bigtop/blueprints
> - the application sources can be in any modern JVM language
> (java, scala, groovy, clojure), since bigtop supports scala, java, groovy
> natively already and clojure is easy to support with the right jars.
> - each "job" will be named according to the corresponding DAG of the big data
> pipeline.
> - all jobs should (not sure if requirement?) be controlled by a global
> program (maybe oozie?) which runs the tasks in order, and can easily be
> customized to use different tools at different stages.
> - for now, all outputs will be to files, so that users don't require servers
> to run the app.
> - final data sinks will be into a highly available transaction oriented store
> (solr/hbase/...)
>
> This ticket will be completed once a first iteration of BigPetStore is
> complete using 3 ecosystem components, along with a depiction of the pipeline
> which can be used for development.
> I've assigned this to myself :) I hope that's okay? Seems like at the moment
> I'm the only one working on it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)