Hi Srinath,

The 'notebooks' that you talk about: are they similar to a staging DAS configuration where we try things out, and once we are happy we deploy that configuration to the respective pipelines (like Realtime)?
Cannot access doc. Can you share a 'view only' link?

seshi

On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <[email protected]> wrote:

> Hi All,
>
> I tried to write down the use cases, to start thinking about this, starting
> from what we discussed in the meeting. Please comment. (The doc is at
> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
> and the same content is below.)
>
> Thanks,
> Srinath
>
> Batch, Interactive, and Predictive Story
>
> 1. Data is uploaded to the system or sent as a data stream and collected
>    for some time (in DAS).
> 2. A Data Scientist comes in, selects a data set, looks at the schema of
>    the data, and computes standard descriptive statistics such as mean,
>    max, percentiles, and standard deviation for the data.
> 3. The Data Scientist cleans up the data using a series of transformations.
>    This might include combining multiple data sets into one. [Notebooks]
> 4. He can play with the data interactively.
> 5. He visualizes the data in several ways. [Notebooks]
> 6. If he needs descriptive statistics, he can export the data mutations in
>    the notebooks as a script and schedule it.
> 7. If what he needs is machine learning, he can initialize and run the ML
>    Wizard from the notebooks and create a model.
> 8. He can export the model he created and any data mutation operations he
>    did as a script, and deploy both the model and the data mutation
>    operations in the CEP (Realtime Pipeline). This is the actual
>    transaction flow.
> 9. He can export the data mutation operations and machine learning model
>    building logic as a script and schedule it to run periodically. This is the
>
> [image: NotebookPipeline.png]
>
> Realtime Story
>
> For the realtime story too, we can start with a data set, write realtime
> queries, test them by replaying the data, and only then deploy the queries.
> (We do this even now.) We can do the same.
>
> 1. The user starts with a dataset.
> 2. He writes a set of queries using the dataset as a stream. Streams and
>    datasets share the same record format. For example, consider the
>    following data set.
>
>    We can consider this as a batch data set by taking it as a whole, or as
>    a stream by taking it record by record.
>
>    For example, if we run the query
>
>    select * from CountryData where GDP>35000
>
>    it will provide the following results.
>
> 3. Tables created by replaying data through CEP queries can be visualized
>    like other data (except that time is special).
> 4. When the Data Scientist is happy, he can click a button and export the
>    CEP queries as an execution plan and any charts as realtime gadgets.
>    (One complication is that time is special, and we need to transform
>    from any visualization to a time-based visualization.)
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter: @srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
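To make the batch story above concrete, here is a minimal sketch of steps 2-3 (descriptive statistics plus a cleanup transformation) as a notebook cell might express them. The data set, column names, and values are entirely hypothetical, just for illustration:

```python
import statistics

# Hypothetical data set, as it might look once collected in DAS.
country_data = [
    {"country": "A", "gdp": 48000},
    {"country": "B", "gdp": 31000},
    {"country": "C", "gdp": None},   # missing value to be cleaned up
    {"country": "D", "gdp": 52000},
]

# Cleanup transformation (step 3): drop records with a missing GDP.
clean = [r for r in country_data if r["gdp"] is not None]

# Standard descriptive statistics over the cleaned column (step 2).
gdp = [r["gdp"] for r in clean]
print("mean:", statistics.mean(gdp))
print("max:", max(gdp))
print("stdev:", statistics.stdev(gdp))
```

In the story above, this kind of cell is exactly what would be exported as a script and scheduled (step 6), or fed into the ML Wizard (step 7).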
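The "same data, two views" point in the realtime story can also be sketched: the hypothetical CountryData set can be filtered as a whole batch, or record by record as a replayed stream, with identical results. Names and values are made up for illustration:

```python
from typing import Iterator

country_data = [
    {"country": "A", "gdp": 48000},
    {"country": "B", "gdp": 31000},
    {"country": "D", "gdp": 52000},
]

# Batch view: select * from CountryData where GDP>35000
batch_result = [r for r in country_data if r["gdp"] > 35000]

# Stream view: the same predicate applied record by record,
# as if the data set were being replayed as events.
def stream(records) -> Iterator[dict]:
    for event in records:
        if event["gdp"] > 35000:
            yield event

stream_result = list(stream(country_data))
assert batch_result == stream_result  # the two views agree
```

This is the property that lets the same queries be tested by replaying a data set and then deployed against the live stream.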
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
