[ https://issues.apache.org/jira/browse/BIGTOP-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
RJ Nowling updated BIGTOP-1366:
-------------------------------
Description:
BigPetStore uses synthetic data as the basis for its workflow. BPS's current
model for generating customer data is sufficient for basic testing of the
Hadoop ecosystem, but it is too simple to embed interesting patterns in the
data. As a result, more complex testing, such as testing clustering algorithms
in Mahout on non-trivial data, is not currently possible.
Efforts are currently underway to incrementally improve the current model (see
BIGTOP-1271 and BIGTOP-1272). However, creating a model that incorporates
realistic patterns and input data to generate rich customer/transaction data
with interesting correlations will require re-imagining the current model and
its framework.
To support the improvements to the model in BigPetStore, I have been working on
an alternative model, developed from scratch. Since the development of a new
model involves substantial R&D work with more specialized tools (mathematical
and plotting libraries), I'm doing the current work outside of BPS using the
IPython Notebook environment. Due to the long time frame, the model will be
developed on a separate timeline to avoid slowing the development of BPS.
Once the model has stabilized, I will begin incorporating the model into BPS
itself. One option is to implement the model in Spark using Scala as a
foundation for Spark support in BPS.
was:
BigPetStore uses synthetic data as the basis for its workflow. BPS's current
model for generating customer data is sufficient for basic testing of the
Hadoop ecosystem, but the model is very basic and lacks sufficient complexity
for embedding interesting patterns into the data. As a result, more complex
testing such as testing clustering algorithms in Mahout on non-trivial data are
not possible.
Efforts are currently underway to incrementally improve the current model (see
BIGTOP-1271 and BIGTOP-1272). However, to create a model that can that
incorporates realistic patterns and input data to generate rich
customer/transaction data with interesting correlations will require a
re-imagining of the current model and its model's framework.
To support the improvements to the model in BigPetStore, I have been working on
an alternative ab initio model, developed from scratch. Since the development
of a new model involves substantial R&D work with more specialized tools
(mathematical and plotting libraries), I'm doing the current work outside of
BPS using the iPython Notebook environment. Due to the long time frame, the
model will be developed on a separate timeline to prevent slowing the
development of BPS.
Once the model has stabilized, I will begin incorporating the model into BPS
itself. One option is to implement the model in Spark using Scala as a
foundation for Spark support in BPS.
> Updated, Richer Model for Generating Data for BigPetStore
> ---------------------------------------------------------
>
> Key: BIGTOP-1366
> URL: https://issues.apache.org/jira/browse/BIGTOP-1366
> Project: Bigtop
> Issue Type: Improvement
> Affects Versions: backlog
> Reporter: RJ Nowling
> Priority: Minor
> Original Estimate: 8,736h
> Remaining Estimate: 8,736h
>
> BigPetStore uses synthetic data as the basis for its workflow. BPS's current
> model for generating customer data is sufficient for basic testing of the
> Hadoop ecosystem, but it is too simple to embed interesting patterns in the
> data. As a result, more complex testing, such as testing clustering
> algorithms in Mahout on non-trivial data, is not currently possible.
> Efforts are currently underway to incrementally improve the current model
> (see BIGTOP-1271 and BIGTOP-1272). However, creating a model that
> incorporates realistic patterns and input data to generate rich
> customer/transaction data with interesting correlations will require
> re-imagining the current model and its framework.
> To support the improvements to the model in BigPetStore, I have been working
> on an alternative model, developed from scratch. Since the development of a
> new model involves substantial R&D work with more specialized tools
> (mathematical and plotting libraries), I'm doing the current work outside of
> BPS using the IPython Notebook environment. Due to the long time frame, the
> model will be developed on a separate timeline to avoid slowing the
> development of BPS.
> Once the model has stabilized, I will begin incorporating the model into BPS
> itself. One option is to implement the model in Spark using Scala as a
> foundation for Spark support in BPS.