Hi All,

I tried to write down the use cases to start thinking about this, starting
from what we discussed in the meeting. Please comment. The doc is at
https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
(the same content is below).

Thanks
Srinath
Batch, interactive, and Predictive Story

   1. Data is uploaded to the system, or sent as a data stream and collected
   for some time (in DAS).

   2. A Data Scientist comes in, selects a data set, looks at the schema of
   the data, and computes standard descriptive statistics about the data, such
   as mean, max, percentiles, and standard deviation.

   3. The Data Scientist cleans up the data using a series of transformations.
   This might include combining multiple data sets into one. [Notebooks]

   4. He can play with the data interactively.

   5. He can visualize the data in several ways. [Notebooks]

   6. If he needs descriptive statistics, he can export the data mutations in
   the notebooks as a script and schedule it.

   7. If what he needs is machine learning, he can initialize and run the ML
   Wizard from the notebooks and create a model.

   8. He can export the model he created and any data mutation operations he
   did as a script, and deploy both the model and the data mutation operations
   in the CEP (Realtime Pipeline). This is the actual transaction flow.

   9. He can export the data mutation operations and the machine learning
   model-building logic as a script and schedule it to run periodically. This
   is the batch pipeline.
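As a rough sketch of steps 2 and 3 above, the descriptive statistics and a
simple cleanup transformation could look like the following in Python. The
records, column names, and cleanup rule here are illustrative assumptions,
not part of the platform:

```python
import statistics

# Hypothetical sample of a collected data set (names and values are
# illustrative only)
records = [
    {"country": "A", "gdp": 48000},
    {"country": "B", "gdp": 21000},
    {"country": "C", "gdp": 39000},
]

gdp = [r["gdp"] for r in records]

# Step 2: standard descriptive statistics over one column
summary = {
    "mean": statistics.mean(gdp),
    "max": max(gdp),
    "median": statistics.median(gdp),  # the 50th percentile
    "stdev": statistics.stdev(gdp),
}

# Step 3: a simple cleanup transformation (drop records below a threshold);
# a real notebook would chain several such transformations
cleaned = [r for r in records if r["gdp"] >= 25000]
```

In the story above, the same sequence of transformations would later be
exported as a script (step 6) or redeployed against the realtime pipeline
(step 8).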



[image: NotebookPipeline.png]



Realtime Story

For the realtime story, we can also start with a data set, write realtime
queries, test them by replaying the data, and only then deploy the queries.
(We do this even now.) We can do the same here.


   1. The user starts with a data set.

   2. He writes a set of queries using the data set as a stream. Streams and
   data sets share the same record format. For example, consider the following
   data set.


We can consider this as a batch data set by taking it as a whole, or as a
stream by taking it record by record.
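A minimal sketch of this dual view, using a filter like GDP > 35000 over
some illustrative country records (the data values are assumptions for the
example, not real figures):

```python
# The same records viewed as a batch (the whole data set) or as a stream
# (one record at a time). Country/GDP values are illustrative.
country_data = [
    ("USA", 54000),
    ("Sri Lanka", 11000),
    ("Switzerland", 58000),
]

# Batch view: filter the data set as a whole
batch_result = [row for row in country_data if row[1] > 35000]

# Stream view: apply the same filter record by record, as events arrive
stream_result = []
for row in country_data:
    if row[1] > 35000:
        stream_result.append(row)

# Because the record format is shared, both views agree
assert batch_result == stream_result
```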

For example, if we run the query

select * from CountryData where GDP>35000

it will provide the following results.




   3. Tables created by replaying data through CEP queries can be visualized
   like any other data (except that time is special).

   4. When the Data Scientist is happy, he can click a button and export the
   CEP queries as an execution plan, and any charts as realtime gadgets. (One
   complication is that time is special, and we need to transform from any
   visualization to a time-based visualization.)


-- 
============================
Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
Site: http://people.apache.org/~hemapani/
Photos: http://www.flickr.com/photos/hemapani/
Phone: 0772360902
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
