[ https://issues.apache.org/jira/browse/BIGTOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

RJ Nowling updated BIGTOP-1535:
-------------------------------
    Attachment: bps_spark_etl.patch

This patch:

* Adds case classes for a normalized, structured data model
* Adds I/O utility methods for reading and writing the structured data model
* Adds a Spark ETL component which parses the dirty CSV from the generator, 
normalizes the data, and writes it out in the form of the structured data model
* Adds tests for all of the above
* Updates the README to discuss the data model and new component
* Adds a GraphViz workflow diagram for current and future components
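To show what a "normalized, structured data model" with read/write utilities can look like, here is a minimal sketch. It is plain Python, not the patch's Scala/Spark code: frozen dataclasses stand in for Scala case classes, and JSON Lines is used as the on-disk format. The record names (Store, Customer), their fields, and the helpers write_records/read_records are all hypothetical and are not taken from the patch.

```python
import io
import json
from dataclasses import asdict, dataclass


# Hypothetical normalized record types (stand-ins for Scala case classes).
@dataclass(frozen=True)
class Store:
    store_id: int
    zipcode: str


@dataclass(frozen=True)
class Customer:
    customer_id: int
    first_name: str
    last_name: str
    zipcode: str


def write_records(records, out):
    """Write each record as one JSON object per line (JSON Lines)."""
    for record in records:
        out.write(json.dumps(asdict(record)) + "\n")


def read_records(cls, inp):
    """Rebuild records of type `cls` from JSON Lines text, skipping blank lines."""
    return [cls(**json.loads(line)) for line in inp if line.strip()]


# Round trip through an in-memory buffer instead of a real file.
stores = [Store(1, "02139"), Store(2, "94105")]
buf = io.StringIO()
write_records(stores, buf)
buf.seek(0)
restored = read_records(Store, buf)
```

Keeping zip codes as strings rather than integers preserves leading zeros, which is the kind of detail a typed data model is meant to pin down.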

[~jayunit100] I decided to create a separate arch diagram for now.  I suggest 
we create a separate JIRA to merge them since discussion may be in order.  
Also, I didn't fix trailing whitespace -- can you handle that on commit?

> Add Spark ETL script to BigPetStore
> -----------------------------------
>
>                 Key: BIGTOP-1535
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1535
>             Project: Bigtop
>          Issue Type: Improvement
>          Components: blueprints
>            Reporter: RJ Nowling
>            Assignee: RJ Nowling
>         Attachments: bps_spark_etl.patch
>
>
> We should add a script that reads the results from the data generator, 
> normalizes the data, and splits it into separate tables (ETL).  It would be 
> nice to use Spark SQL, but it is not required.
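The ETL step described above (read dirty generator output, normalize it, split it into tables) can be sketched without Spark. This is a plain-Python illustration, not the Spark component from the patch. The CSV column layout below is an assumption, since the generator's real format isn't given here, and split_tables is a hypothetical helper. In Spark, the same per-row cleanup would presumably run as transformations over an RDD or DataFrame.

```python
import csv
import io

# Hypothetical layout of one generator row (the real columns may differ):
# store_id, store_zip, customer_id, first_name, last_name,
# customer_zip, transaction_id, date, product
RAW = """1, 02139 ,10,  alice ,SMITH,02139,100,2014-11-20,dry dog food
1,02139,10,Alice,smith,02139,101,2014-11-21,cat litter
2,94105,11,bob,jones , 94105,102,2014-11-21,fish tank
"""


def split_tables(raw_text):
    """Normalize raw rows and split them into store, customer, and transaction tables."""
    stores, customers, transactions = {}, {}, []
    for row in csv.reader(io.StringIO(raw_text)):
        if not row:
            continue
        f = [field.strip() for field in row]  # drop stray whitespace
        store_id, customer_id = int(f[0]), int(f[2])
        # Deduplicate dimension rows by primary key; the first occurrence wins.
        stores.setdefault(store_id, {"store_id": store_id, "zipcode": f[1]})
        customers.setdefault(customer_id, {
            "customer_id": customer_id,
            "first_name": f[3].title(),  # normalize name casing
            "last_name": f[4].title(),
            "zipcode": f[5],
        })
        # Fact rows keep foreign keys plus transaction-specific fields only.
        transactions.append({
            "transaction_id": int(f[6]),
            "store_id": store_id,
            "customer_id": customer_id,
            "date": f[7],
            "product": f[8].lower(),
        })
    return list(stores.values()), list(customers.values()), transactions


stores, customers, transactions = split_tables(RAW)
```

Here the three input rows collapse into two stores, two customers, and three transactions. Repeated store and customer details are stored once and referenced by ID.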



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
