Yes, that is the main idea. Please see the architecture thread "Databricks
Notebooks for Spark" and also read about IPython notebooks.

--Srinath

On Mon, Nov 23, 2015 at 5:07 PM, Seshika Fernando <[email protected]> wrote:

> Hi Srinath,
>
> The 'notebooks' that you talk about: are they similar to a sort of staging
> DAS configuration where we try things out and, once we are happy, deploy
> that configuration to the respective pipelines (like Realtime)?
>
> I cannot access the doc. Can you share a 'view only' link?
>
> seshi
>
> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <[email protected]> wrote:
>
>> Hi All,
>>
>> I tried to write down the use cases, to start thinking about this
>> starting from what we discussed in the meeting. Please comment. (The doc is at
>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
>> and the same content is below.)
>>
>> Thanks
>> Srinath
>>
>> Batch, Interactive, and Predictive Story
>>
>>    1. Data is uploaded to the system, or sent as a data stream and
>>       collected for some time (in DAS).
>>    2. The Data Scientist comes in, selects a data set, looks at the schema
>>       of the data, and computes standard descriptive statistics such as
>>       mean, max, percentiles, and standard deviation.
>>    3. The Data Scientist cleans up the data using a series of
>>       transformations. This might include combining multiple data sets
>>       into one data set. [Notebooks]
>>    4. He can play with the data interactively.
>>    5. He visualizes the data in several ways. [Notebooks]
>>    6. If he needs descriptive statistics, he can export the data mutations
>>       in the notebooks as a script and schedule it.
>>    7. If what he needs is machine learning, he can initialize and run the
>>       ML Wizard from the Notebooks and create a model.
>>    8. He can export the model he created and any data mutation operations
>>       he did as a script, and deploy both the model and the data mutation
>>       operations in the CEP (Realtime Pipeline). This is the actual
>>       transaction flow.
>>    9. He can export the data mutation operations and machine learning
>>       model building logic as a script and schedule it to run
>>       periodically. This is the
>>
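The batch story above can be sketched in a few lines of Python. This is only an illustration, not DAS or notebook code: the sample records, the field names, and the helpers `drop_missing_gdp`, `add_gdp_in_thousands`, `apply_pipeline` and `run_batch` are all hypothetical. The point is that the cleaning steps are kept as an ordered list, so the same list can later be "exported" and run either over a whole data set or over one record at a time.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"country": "A", "gdp": 42000.0},
    {"country": "B", "gdp": None},
    {"country": "C", "gdp": 18000.0},
    {"country": "D", "gdp": 36500.0},
]


def drop_missing_gdp(record):
    """Cleaning step: discard records that have no GDP value."""
    return record if record["gdp"] is not None else None


def add_gdp_in_thousands(record):
    """Transformation step: add a derived column."""
    return {**record, "gdp_k": record["gdp"] / 1000.0}


# The ordered list of steps plays the role of the exported "script".
PIPELINE = [drop_missing_gdp, add_gdp_in_thousands]


def apply_pipeline(record, steps=PIPELINE):
    """Run every step on one record; None means the record was dropped."""
    for step in steps:
        record = step(record)
        if record is None:
            return None
    return record


def run_batch(rows, steps=PIPELINE):
    """Run the same steps over a whole data set."""
    results = (apply_pipeline(row, steps) for row in rows)
    return [row for row in results if row is not None]


cleaned = run_batch(records)
gdp_values = [row["gdp"] for row in cleaned]

# Standard descriptive statistics over the cleaned data.
summary = {
    "mean": mean(gdp_values),
    "max": max(gdp_values),
    "stdev": pstdev(gdp_values),
    "quartiles": quantiles(gdp_values, n=4),
}
print(summary)
```

Because `apply_pipeline` works on a single record, the same steps could in principle be called once per event in a realtime pipeline, which is the reuse that steps 8 and 9 describe.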
>>
>>
>> [image: NotebookPipeline.png]
>>
>>
>>
>> Realtime Story
>>
>> For the realtime story too, we can start with a data set, write realtime
>> queries, test them by replaying the data, and only then deploy the queries.
>> (We do this even now.) We can do the same.
>>
>>
>>    1. The user starts with a dataset.
>>    2. He writes a set of queries using the dataset as a stream. Streams
>>       and datasets share the same record format. For example, consider
>>       the following data set.
>>
>>
>> We can consider this as a batch data set by taking it as a whole, or as a
>> stream by taking it record by record.
>>
>> For example, if we run the query
>>
>> select * from CountryData where GDP>35000
>>
>> it will provide the following results.
>>
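A minimal Python sketch of the batch-versus-stream idea, assuming a made-up `COUNTRY_DATA` list (the values are not the data set from the doc) and hypothetical helpers `gdp_above`, `query_batch` and `query_stream`. It applies the same `GDP > 35000` condition once to the whole data set and once record by record, to show that one predicate can serve both modes when the record format is shared.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

# Hypothetical rows; names and values are illustrative, not real data.
COUNTRY_DATA: List[Record] = [
    {"country": "A", "gdp": 42000},
    {"country": "B", "gdp": 18000},
    {"country": "C", "gdp": 36500},
]


def gdp_above(record: Record, threshold: int = 35000) -> bool:
    """The 'where GDP > 35000' condition, shared by both modes."""
    return record["gdp"] > threshold


def query_batch(rows: List[Record]) -> List[Record]:
    """Batch view: take the data set as a whole and filter it."""
    return [row for row in rows if gdp_above(row)]


def query_stream(events: Iterable[Record]) -> Iterator[Record]:
    """Stream view: consume one record at a time and emit matches."""
    for event in events:
        if gdp_above(event):
            yield event


batch_result = query_batch(COUNTRY_DATA)
# Replaying the data set as a stream, one record at a time.
stream_result = list(query_stream(iter(COUNTRY_DATA)))

print(batch_result == stream_result)
```

Replaying a stored data set through the stream function is the same move as testing realtime queries by replay before deploying them.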
>>
>>
>>
>>    3. Tables created by replaying data with CEP queries can be visualized
>>       like other data (except that time is special).
>>    4. When the Data Scientist is happy, he can click a button and export
>>       the CEP queries as an execution plan and any charts as realtime
>>       gadgets. (One complication is that time is special, and we need to
>>       transform from any visualization to a time-based visualization.)
>>
>>
>> --
>> ============================
>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>> Site: http://people.apache.org/~hemapani/
>> Photos: http://www.flickr.com/photos/hemapani/
>> Phone: 0772360902
>>
>> _______________________________________________
>> Architecture mailing list
>> [email protected]
>> https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
>>
>>
>


