[ https://issues.apache.org/jira/browse/BIGTOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
RJ Nowling updated BIGTOP-1535:
-------------------------------
Attachment: bps_spark_etl.patch
This patch:
* Adds case classes for a normalized, structured data model
* Adds I/O utility methods for reading and writing the structured data model
* Adds a Spark ETL component which parses the dirty CSV from the generator,
normalizes the data, and writes it out in the form of the structured data model
* Adds tests for all of the above
* Updates the README to discuss the data model and new component
* Adds a GraphViz workflow diagram for current and future components
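For readers without the patch at hand, the normalization step described above can be sketched roughly as follows. This is a plain-Python illustration only, not the code in bps_spark_etl.patch and not Spark code. The record types (Store, Customer, Transaction), the six-column CSV layout, and the normalize function are all hypothetical names chosen for this example.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical record types standing in for the patch's case classes.
@dataclass(frozen=True)
class Store:
    store_id: int
    zipcode: str


@dataclass(frozen=True)
class Customer:
    customer_id: int
    first_name: str
    last_name: str


@dataclass(frozen=True)
class Transaction:
    transaction_id: int
    store_id: int
    customer_id: int
    product: str


def normalize(raw_csv):
    """Split denormalized CSV text into store, customer, and transaction tables.

    Assumed column order (hypothetical):
    store_id, zipcode, customer_id, customer_name, transaction_id, product
    """
    stores = {}
    customers = {}
    transactions = []
    for row in csv.reader(io.StringIO(raw_csv)):
        fields = [f.strip() for f in row]
        if len(fields) != 6:
            continue  # skip rows with the wrong number of columns
        try:
            store_id = int(fields[0])
            customer_id = int(fields[2])
            transaction_id = int(fields[4])
        except ValueError:
            continue  # skip rows whose IDs are not integers
        first_name, _, last_name = fields[3].partition(" ")
        # Keying by ID deduplicates stores and customers repeated across rows.
        stores[store_id] = Store(store_id, fields[1])
        customers[customer_id] = Customer(customer_id, first_name, last_name)
        transactions.append(
            Transaction(transaction_id, store_id, customer_id, fields[5].lower())
        )
    return list(stores.values()), list(customers.values()), transactions
```

Calling normalize on the raw text returns three lists, one per table. Each transaction refers to its store and customer by ID only, which is the sense of "normalized" assumed here.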
[~jayunit100] I decided to create a separate arch diagram for now. I suggest
we create a separate JIRA to merge them since discussion may be in order.
Also, I didn't fix trailing whitespace -- can you handle that on commit?
> Add Spark ETL script to BigPetStore
> -----------------------------------
>
> Key: BIGTOP-1535
> URL: https://issues.apache.org/jira/browse/BIGTOP-1535
> Project: Bigtop
> Issue Type: Improvement
> Components: blueprints
> Reporter: RJ Nowling
> Assignee: RJ Nowling
> Attachments: bps_spark_etl.patch
>
>
> We should add a script that reads the results from the data generator,
> normalizes the data, and splits it into separate tables (ETL). It would be
> nice to use Spark SQL, but it is not required.