Hi All, I tried to write down the use cases to start thinking about this, starting from what we discussed in the meeting. Please comment. (The doc is at https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit# — the same content is below.)
Thanks
Srinath

Batch, Interactive, and Predictive Story

1. Data is uploaded to the system, or sent as a data stream and collected for some time (in DAS).
2. A Data Scientist comes in, selects a data set, looks at the schema of the data, and computes standard descriptive statistics about the data, such as mean, max, percentiles, and standard deviation.
3. The Data Scientist cleans up the data using a series of transformations. This might include combining multiple data sets into one data set. [Notebooks]
4. He can play with the data interactively.
5. He visualizes the data in several ways. [Notebooks]
6. If he needs descriptive statistics, he can export the data mutations in the notebooks as a script and schedule it.
7. If what he needs is machine learning, he can initialize and run the ML Wizard from the Notebooks and create a model.
8. He can export the model he created and any data mutation operations he did as a script, and deploy both the model and the data mutation operations in the CEP (Realtime Pipeline). This is the actual transaction flow.
9. He can export the data mutation operations and the machine-learning model-building logic as a script and schedule it to run periodically.

This is the overall flow:
[image: NotebookPipeline.png]

Realtime Story

In the realtime story we can also start with a data set, write realtime queries, test them by replaying the data, and only then deploy the queries. (We do this even now.) We can do the same here:

1. The user starts with a dataset.
2. He writes a set of queries using the dataset as a stream. Streams and datasets share the same record format. For example, consider the following data set: we can treat it as a batch data set by taking it as a whole, or as a stream by taking it record by record. For example, if we run the query select * from CountryData where GDP > 35000, it will produce the following results.
3. Tables created by replaying data through CEP queries can be visualized like other data (except that time is special).
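The batch/stream duality described above can be sketched in plain Python with pandas: the same predicate (GDP > 35000) is applied once to the data set as a whole, and once record by record as if the rows arrived on a stream. The CountryData rows and GDP figures below are made-up illustrative values, not part of the original example.

```python
import pandas as pd

# A made-up stand-in for the CountryData data set referenced above.
country_data = pd.DataFrame([
    {"Country": "USA", "GDP": 53000},
    {"Country": "Sri Lanka", "GDP": 3600},
    {"Country": "Germany", "GDP": 46000},
    {"Country": "India", "GDP": 1500},
])

# Batch view: run the query over the whole data set at once,
# like "select * from CountryData where GDP > 35000".
batch_result = country_data[country_data["GDP"] > 35000]

# Stream view: the same records consumed one by one, with the
# same predicate applied to each event as it arrives.
def as_stream(df):
    for record in df.to_dict(orient="records"):
        yield record

stream_result = [r for r in as_stream(country_data) if r["GDP"] > 35000]

# Both views select the same rows, because streams and data sets
# share the same record format.
batch_countries = list(batch_result["Country"])
stream_countries = [r["Country"] for r in stream_result]
assert batch_countries == stream_countries
```

This is only a sketch of the idea; in the product the stream view would be a CEP query over live events rather than an in-memory generator.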
When the Data Scientist is happy, he can click a button to export the CEP queries as an execution plan and any charts as realtime gadgets. (One complication is that time is special, and we need to transform any visualization into a time-based visualization.)

--
============================
Blog: http://srinathsview.blogspot.com
twitter: @srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
