Hi Srinath,

Are the 'notebooks' you talk about similar to a kind of staging DAS
configuration, where we try things out and, once we are happy, deploy that
configuration to the respective pipelines (such as Realtime)?

I cannot access the doc. Can you share a 'view only' link?

seshi

On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <[email protected]> wrote:

> Hi All,
>
> I tried to write down the use cases, starting from what we discussed in the
> meeting, so that we can start thinking about this. Please comment. The doc
> is at
> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
> (the same content is below).
>
> Thanks
> Srinath
> Batch, Interactive, and Predictive Story
>
>    1. Data is uploaded to the system, or sent as a data stream, and
>       collected for some time (in DAS).
>    2. A data scientist comes in, selects a data set, looks at the schema of
>       the data, and computes standard descriptive statistics such as mean,
>       max, percentiles, and standard deviation.
>    3. The data scientist cleans up the data using a series of
>       transformations. This might include combining multiple data sets into
>       one data set. [Notebooks]
>    4. He can play with the data interactively.
>    5. He visualizes the data in several ways. [Notebooks]
>    6. If he needs descriptive statistics, he can export the data mutations
>       in the notebooks as a script and schedule it.
>    7. If what he needs is machine learning, he can initialize and run the ML
>       Wizard from the Notebooks and create a model.
>    8. He can export the model he created, and any data mutation operations
>       he did, as a script and deploy both the model and the data mutation
>       operations in CEP (the Realtime Pipeline). This is the actual
>       transaction flow.
>    9. He can export the data mutation operations and the machine learning
>       model building logic as a script and schedule it to run periodically.
>       This is the
>
>
> [image: NotebookPipeline.png]
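Step 2 above mentions standard descriptive statistics (mean, max, percentiles, standard deviation). As a minimal sketch of what that step computes, assuming a single numeric column already loaded into memory, Python's standard `statistics` module covers all of them. The column name `gdp` and its values are hypothetical and are not taken from DAS.

```python
import statistics

# Hypothetical numeric column; in DAS this would come from a stored data set.
gdp = [12000.0, 35500.0, 41000.0, 28000.0, 52000.0, 19000.0]


def describe(values):
    """Return the descriptive statistics listed in step 2 for one column."""
    # quantiles(n=4) returns the three quartile cut points:
    # the 25th, 50th and 75th percentiles.
    p25, p50, p75 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "max": max(values),
        "p25": p25,
        "p50": p50,
        "p75": p75,
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    for name, value in describe(gdp).items():
        print(f"{name}: {value:.2f}")
```

`statistics.quantiles` needs Python 3.8 or later. A notebook would typically run the same summary per column before any cleanup is done in step 3.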
>
>
>
> Realtime Story
>
> For the realtime story, too, we can start with a data set, write realtime
> queries, test them by replaying the data, and only then deploy the queries.
> (We do this even now.) We can do the same here.
>
>
>    1. The user starts with a data set.
>    2. He writes a set of queries, using the data set as a stream. Streams
>       and data sets share the same record format. For example, consider the
>       following data set.
>
>
> We can treat this as a batch data set by taking it as a whole, or as a
> stream by taking it record by record.
>
> For example, if we run the query
>
> select * from CountryData where GDP>35000
>
> it will provide the following results.
>
>
>
>
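The batch-versus-stream idea above can be sketched in a few lines. This is a minimal sketch, not DAS or CEP code: the `Country` record, its fields, and the rows are hypothetical stand-ins. The point is only that one filter, the equivalent of `GDP>35000`, yields the same matches whether the records are taken as a whole or replayed one at a time.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Country(NamedTuple):
    # Hypothetical record format shared by the data set and the stream.
    name: str
    gdp: float


def gdp_filter(record: Country) -> bool:
    # Equivalent of: select * from CountryData where GDP>35000
    return record.gdp > 35000


def run_batch(dataset: List[Country]) -> List[Country]:
    # Batch view: evaluate the query over the whole data set at once.
    return [r for r in dataset if gdp_filter(r)]


def run_stream(events: Iterable[Country]) -> Iterator[Country]:
    # Stream view: replay the records one by one, emitting each match as it arrives.
    for event in events:
        if gdp_filter(event):
            yield event


# Made-up rows for illustration only.
country_data = [
    Country("A", 52000.0),
    Country("B", 18000.0),
    Country("C", 41000.0),
    Country("D", 35000.0),  # not strictly greater than 35000, so filtered out
]

if __name__ == "__main__":
    batch_result = run_batch(country_data)
    stream_result = list(run_stream(country_data))
    print(batch_result == stream_result)
```

This equivalence holds for a stateless filter like this one. Windowed or time-based queries depend on when each event arrives, which matches the caveat that time is special.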
>    3. Tables created by replaying the data through CEP queries can be
>       visualized like any other data (except that time is special).
>    4. When the data scientist is happy, he can click a button and export
>       the CEP queries as an execution plan, and any charts as realtime
>       gadgets. (One complication is that time is special, so we need to
>       transform any visualization into a time-based visualization.)
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>
> _______________________________________________
> Architecture mailing list
> [email protected]
> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>
>