[ https://issues.apache.org/jira/browse/BIGTOP-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jake Farrell updated BIGTOP-1366:
---------------------------------
Assignee: jay vyas (was: RJ Nowling)
> Updated, Richer Model for Generating Data for BigPetStore
> ----------------------------------------------------------
>
> Key: BIGTOP-1366
> URL: https://issues.apache.org/jira/browse/BIGTOP-1366
> Project: Bigtop
> Issue Type: Improvement
> Components: Blueprints
> Affects Versions: backlog
> Reporter: RJ Nowling
> Assignee: jay vyas
> Priority: Minor
> Original Estimate: 8,736h
> Remaining Estimate: 8,736h
>
> BigPetStore uses synthetic data as the basis for its workflow. BPS's current
> model for generating customer data is sufficient for basic testing of the
> Hadoop ecosystem, but it is too simple to embed interesting patterns in the
> data. As a result, more complex testing, such as running clustering
> algorithms in Mahout on non-trivial data, is not currently possible.
> Efforts are currently underway to incrementally improve the current model
> (see BIGTOP-1271 and BIGTOP-1272). However, creating a model that can
> incorporate realistic patterns and input data to generate rich
> customer/transaction data with interesting correlations will require a
> re-imagining of the current model and its framework.
> To support the improvements to the model in BigPetStore, I have been working
> on an alternative ab initio model, developed from scratch. Since the
> development of a new model involves substantial R&D work with more
> specialized tools (mathematical and plotting libraries), I'm doing the
> current work outside of BPS using the IPython Notebook environment. Due to
> the long time frame, the model will be developed on a separate timeline to
> prevent slowing the development of BPS.
> Once the model has stabilized, I will begin incorporating the model into BPS
> itself. One option is to implement the model in Spark using Scala as a
> foundation for Spark support in BPS.
--
This message was sent by Atlassian JIRA
(v6.2#6252)