[ 
https://issues.apache.org/jira/browse/BIGTOP-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122892#comment-14122892
 ] 

RJ Nowling edited comment on BIGTOP-1366 at 9/7/14 9:23 PM:
------------------------------------------------------------

Right now, the Java port is about half done.  It only supports generating 
stores and customers.  We still need to add support for generating 
transactions.

After many discussions, [~jayunit100] and I have realized that it may be 
preferable to have two implementations of the data generator: a Python sandbox 
for me to prototype ideas and a stable, JVM-based implementation for external 
users. As new ideas are proven successful in the Python sandbox, they will be 
migrated to the JVM port.  I'll help maintain the JVM port.

I've been refactoring and adding unit tests and documentation to the Python 
implementation for a v0.2 release.  Once complete, v0.2 will be the basis for 
finishing the JVM port. 

The JVM port is currently using Java.  I am, however, also open to using 
Clojure.  Scala is another option but less preferable for me.  

[~jayunit100], what is the current status of Clojure support / interest in 
BigTop?  Will BigTop accept Clojure code?



was (Author: rnowling):
Right now, it's about half done.  Only supports generating stores and customers.

> Updated, Richer Model for Generating Data for BigPetStore 
> ----------------------------------------------------------
>
>                 Key: BIGTOP-1366
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1366
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: blueprints
>    Affects Versions: backlog
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>            Priority: Minor
>   Original Estimate: 8,736h
>  Remaining Estimate: 8,736h
>
> BigPetStore uses synthetic data as the basis for its workflow.  BPS's current 
> model for generating customer data is sufficient for basic testing of the 
> Hadoop ecosystem, **but the model is very basic and lacks sufficient 
> complexity for embedding interesting patterns into the data**.  
> As a result, **more complex, scalable testing such as testing clustering 
> algorithms in Mahout on non-trivial data or multidimensional data with 
> factors influencing it** is not currently possible.
> Efforts are currently underway to incrementally improve the current model 
> (see BIGTOP-1271 and BIGTOP-1272).  
> To create a model that can incorporate **realistic, non-hierarchical 
> patterns** and input data to generate rich customer/transaction data with 
> interesting correlations will require a re-imagining of the current model and 
> its framework.
> To support the improvements to the model in BigPetStore, I have been working 
> on an **alternative ab initio model, developed from scratch**. Since the 
> development of a new model involves substantial R&D work with more 
> specialized tools (mathematical and plotting libraries), I'm doing the 
> current work outside of BPS using the IPython Notebook environment.  Due to 
> the long time frame, the model will be developed on a separate timeline to 
> prevent slowing the development of BPS.  
> Once the model has stabilized, I will begin incorporating the model into BPS 
> itself.  One option is to implement the model using Scala for clean 
> integration with **Spark**, which is likely to play an increasingly important 
> role in the Hadoop ecosystem and thus will be an important part of 
> BigPetStore as a test/blueprint app.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
