[
https://issues.apache.org/jira/browse/BIGTOP-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217377#comment-14217377
]
jay vyas edited comment on BIGTOP-1366 at 11/19/14 4:11 AM:
------------------------------------------------------------
Great work RJ! Ran {{gradle test}} and indeed,
- it builds and runs the data gen unit tests.
- the original mapreduce code works as well, so the pom.xml stuff is fixed.
As far as I can tell, this is +1, and we now have a powerful, *spark*-based
data generator for BPS. *Next step*: I'll wait for others to chime in.
- There is one last step: we need to (1) move arch.dot one level up and (2)
update the arch.dot file with the description of the new architecture. You can
easily do that with graphviz (paste the contents of arch.dot into erdos and edit).
Just create a JIRA for that please, and assign it to yourself.
I'll commit this tomorrow unless others have any issues.
was (Author: jayunit100):
Great work RJ! Ran {{gradle test}} and indeed,
- it builds and runs the data gen unit tests.
- the original mapreduce code works as well, so the pom.xml stuff is fixed.
As far as I can tell, this is +1, and we now have a powerful, *spark*-based
data generator for BPS. *Next step*: I'll wait for others to chime in.
- There is one last step: we need to update the arch.dot file with the
description of the new architecture. You can easily do that with graphviz
(paste the contents of arch.dot into erdos and edit).
I'll commit this tomorrow unless others have any issues.
> Updated, Richer Model for Generating Data for BigPetStore
> ----------------------------------------------------------
>
> Key: BIGTOP-1366
> URL: https://issues.apache.org/jira/browse/BIGTOP-1366
> Project: Bigtop
> Issue Type: Improvement
> Components: blueprints
> Affects Versions: backlog
> Reporter: RJ Nowling
> Assignee: RJ Nowling
> Priority: Minor
> Labels: bigpetstore
> Fix For: 0.9.0
>
> Attachments: BIGTOP-1366.patch
>
> Original Estimate: 8,736h
> Remaining Estimate: 8,736h
>
> BigPetStore uses synthetic data as the basis for its workflow. BPS's current
> model for generating customer data is sufficient for basic testing of the
> Hadoop ecosystem, **but the model is very basic and lacks sufficient
> complexity for embedding interesting patterns into the data**.
> As a result, **more complex, scalable testing such as testing clustering
> algorithms in Mahout on non-trivial data or multidimensional data with
> factors influencing it** is not currently possible.
> Efforts are currently underway to incrementally improve the current model
> (see BIGTOP-1271 and BIGTOP-1272).
> Creating a model that can incorporate **realistic, non-hierarchical
> patterns** and input data to generate rich customer/transaction data with
> interesting correlations will require re-imagining the current model and
> its framework.
> To support the improvements to the model in BigPetStore, I have been working
> on an **alternative ab initio model, developed from scratch**. Since the
> development of a new model involves substantial R&D work with more
> specialized tools (mathematical and plotting libraries), I'm doing the
> current work outside of BPS using the IPython Notebook environment. Due to
> the long time frame, the model will be developed on a separate timeline to
> prevent slowing the development of BPS.
> Once the model has stabilized, I will begin incorporating the model into BPS
> itself. One option is to implement the model in Scala for clean
> integration with **Spark**, which is likely to play an increasingly important
> role in the Hadoop ecosystem and thus will be an important part of
> BigPetStore as a test/blueprint app.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)