Hi Srinath,

I'm afraid we couldn't do any work on this yet; at the moment, everyone
is occupied with the DAS 3.0.1 release and the Log Analyzer work. I just
had a chat with Miyuru, and he mentioned he is checking CEP-specific
functionality for notebooks. I guess the batch analytics integration with
the notebook approach is somewhat straightforward, since what we basically
have in the Spark Console is a subset of that approach. According to the
plan we made for next year, we intend to look at that for DAS 3.1.0, along
with the change to C5, where we would be redoing all the UIs, removing the
current functionality from the admin console and unifying the UIs. In that
effort, we can integrate this aspect too. Miyuru suggested that we have a
quick chat on Friday; let's talk more then.

Cheers,
Anjana.

On Tue, Dec 8, 2015 at 9:18 AM, Srinath Perera <[email protected]> wrote:

> Anjana, how is this thread progressing? Who is looking at/thinking about
> notebooks?
>
> On Thu, Nov 26, 2015 at 9:19 AM, Anjana Fernando <[email protected]> wrote:
>
>> Hi Srinath,
>>
>> On Thu, Nov 26, 2015 at 9:08 AM, Srinath Perera <[email protected]> wrote:
>>
>>> Hi Anjana,
>>>
>>> Great!! I think the next step is deciding whether we do this with
>>> Zeppelin or build it from scratch.
>>>
>>> Pros of Zeppelin
>>>
>>>    1. We get a lot of features OOB
>>>    2. The code is maintained by the community (patches, etc.)
>>>    3. New features will get added and it will evolve
>>>    4. We get to contribute to an Apache project and build recognition
>>>
>>> Cons
>>>
>>>    1. Real deep integration might be a lot of work (we get an initial
>>>    version very fast, but integrating the details, e.g. making our UIs
>>>    work in Zeppelin or getting Zeppelin to post to UES, might be tricky)
>>>    2. Zeppelin is still in the incubator
>>>    3. We need to assess the community
>>>
>>> I suggest you guys have a detailed chat with MiyuruD, who looked at it
>>> in detail; try things out, think about it, and report back.
>>>
>>
>> +1, we'll work with Miyuru also and see how to go forward.
>>
>>
>>>
>>>
>>> On Thu, Nov 26, 2015 at 3:12 AM, Anjana Fernando <[email protected]>
>>> wrote:
>>>
>>>> Hi Srinath,
>>>>
>>>> The story looks good. For the part about the "user can play with the
>>>> data interactively", to make it more functional, we should probably
>>>> consider adding Scala scripts to the mix, rather than only having
>>>> Spark SQL. Spark SQL may be limited in functionality for certain data
>>>> operations, whereas with Scala we should be able to use the full
>>>> functionality of Spark. For example, it would be easier to integrate ML
>>>> operations with other batch operations etc. to create a more natural
>>>> flow of operations. The implementation may be tricky though, considering
>>>> clustering, multi-tenancy, etc.
>>>>
>>> Let's keep the Scala version post-MVP.
>>>
>>
>> Sure.
>>
>>
>>>
>>>
>>>>
>>>> Also, I would like to bring up the question of whether most batch
>>>> jobs are actually meant to be scheduled repeatedly over a data set
>>>> that keeps growing, or whether it is mostly a case of executing
>>>> something once, getting the results, and that's it. Maybe this is a
>>>> different discussion though. But for scheduled batch jobs of that
>>>> kind, I guess incremental processing would be critical, though no one
>>>> seems to bother with it that much.
>>>>
>>> I think it is mostly scheduled batches, as we have now. Shall we take
>>> this up in a different thread?
>>>
>>
>> Yep, sure.
>>
>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Anjana.
>>>>
>>>> On Mon, Nov 23, 2015 at 2:57 PM, Srinath Perera <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I tried to write down the use cases, starting from what we discussed
>>>>> in the meeting. Please comment. (The doc is at
>>>>> https://docs.google.com/document/d/1355YEXbhcd2fvS-zG_CiMigT-iTncxYn3DTHlJRTYyo/edit#
>>>>> and the same content is below.)
>>>>>
>>>>> Thanks
>>>>> Srinath
>>>>> Batch, Interactive, and Predictive Story
>>>>>
>>>>>    1. Data is uploaded to the system, or sent as a data stream and
>>>>>    collected for some time (in DAS).
>>>>>    2. A Data Scientist comes in, selects a data set, looks at the
>>>>>    schema of the data, and computes standard descriptive statistics
>>>>>    such as mean, max, percentiles, and standard deviation on the data.
>>>>>    3. The Data Scientist cleans up the data using a series of
>>>>>    transformations. This might include combining multiple data sets
>>>>>    into one data set. [Notebooks]
>>>>>    4. He can play with the data interactively.
>>>>>    5. He visualizes the data in several ways. [Notebooks]
>>>>>    6. If he needs descriptive statistics, he can export the data
>>>>>    mutations in the notebooks as a script and schedule it.
>>>>>    7. If what he needs is machine learning, he can initialize and run
>>>>>    the ML Wizard from the Notebooks and create a model.
>>>>>    8. He can export the model he created and any data mutation
>>>>>    operations he did as a script, and deploy both the model and the
>>>>>    data mutation operations in the CEP (Realtime Pipeline). This is
>>>>>    the actual transaction flow.
>>>>>    9. He can export the data mutation operations and the machine
>>>>>    learning model building logic as a script and schedule it to run
>>>>>    periodically. This is the
>>>>>
>>>>>
>>>>> [image: NotebookPipeline.png]
>>>>>
>>>>>
>>>>>
>>>>> Realtime Story
>>>>>
>>>>> For the realtime story too, we can start with a data set, write
>>>>> realtime queries, test them by replaying the data, and only then
>>>>> deploy the queries (we do this even now). We can do the same here.
>>>>>
>>>>>
>>>>>    1. The user starts with a dataset.
>>>>>    2. He writes a set of queries using the dataset as a stream.
>>>>>    Streams and datasets share the same record format. For example,
>>>>>    consider the following data set.
>>>>>
>>>>>
>>>>> We can consider this as a batch data set by taking it as a whole, or
>>>>> as a stream by taking it record by record.
>>>>>
>>>>> For example, if we run query
>>>>>
>>>>> select * from CountryData where GDP>35000
>>>>>
>>>>> it will provide the following results.
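
The result rows didn't survive in the plain-text archive, but the
batch-vs-stream duality described above can be sketched in a few lines of
plain Python (the CountryData rows here are made-up illustrative values,
and in DAS the query itself would run via Spark SQL):

```python
# Illustrative sketch only: the CountryData rows are invented values.
# In DAS, "select * from CountryData where GDP > 35000" would run via
# Spark SQL; here we apply the same predicate in plain Python.

country_data = [
    {"Country": "A", "GDP": 50000},
    {"Country": "B", "GDP": 20000},
    {"Country": "C", "GDP": 40000},
]

# Batch view: treat the data set as a whole and filter it in one go.
batch_result = [row for row in country_data if row["GDP"] > 35000]

# Stream view: feed the same records through one by one, as events.
stream_result = []
for event in country_data:
    if event["GDP"] > 35000:
        stream_result.append(event)

# Both views select exactly the same rows.
assert batch_result == stream_result
```

The point of the sketch is just that the same predicate serves both the
batch table and the record-by-record stream, which is what allows the
queries to be tested by replaying a dataset and then deployed against the
live stream.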
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>    1. Tables created by replaying the data through the CEP queries
>>>>>    can be visualized like any other data (except that time is
>>>>>    special).
>>>>>    2. When the Data Scientist is happy, he can click a button and
>>>>>    export the CEP queries as an execution plan, and any charts as
>>>>>    realtime gadgets. (One complication is that time is special, and
>>>>>    we need to transform any visualization into a time-based
>>>>>    visualization.)
>>>>>
>>>>>
>>>>> --
>>>>> ============================
>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>> Site: http://people.apache.org/~hemapani/
>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>> Phone: 0772360902
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> *Anjana Fernando*
>>>> Senior Technical Lead
>>>> WSO2 Inc. | http://wso2.com
>>>> lean . enterprise . middleware
>>>>
>>>
>>>
>>>
>>> --
>>> ============================
>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>> Site: http://people.apache.org/~hemapani/
>>> Photos: http://www.flickr.com/photos/hemapani/
>>> Phone: 0772360902
>>>
>>
>>
>>
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>
>
>
> --
> ============================
> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
> Site: http://people.apache.org/~hemapani/
> Photos: http://www.flickr.com/photos/hemapani/
> Phone: 0772360902
>



-- 
*Anjana Fernando*
Senior Technical Lead
WSO2 Inc. | http://wso2.com
lean . enterprise . middleware
_______________________________________________
Architecture mailing list
[email protected]
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
