Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

Maheshakya Wijewardena Sun, 13 Mar 2016 05:47:36 -0700

Hi Mahesh,

You don't have to look into carbon-ml.


Best regards.

On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <[email protected]> wrote:

> Hi Maheshakya,
> I am working on some examples related to Spark and ML. Do I need to do
> anything with carbon-ml? I think I don't need to look into that one, do I?
> BR,
> Mahesh
>
> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
> [email protected]> wrote:
>
>> Hi Mahesh,
>>
>> Is that Scala API part of your current product or repo?
>>
>>
>> No, we don't have the Scala API included. What we want is to design
>> Java implementations of those algorithms that train with mini-batches of
>> streaming data, with the help of the aforementioned methods, so that we can
>> include them as a CEP extension.
>>
>> To clarify, please try to write a simple Java program using Spark
>> MLLib linear regression and k-means clustering with a sample data set (you
>> can find a lot of data sets in the UCI repo[1]). You need to break the
>> dataset into several pieces and train a model repeatedly with those.
>> After each training run, save the model information (such as weights and
>> intercepts for regression, and cluster centers for clustering; please check
>> the arguments of the methods I have mentioned and save the information
>> they require).
>> When training a model with a new piece of data, use those methods to
>> initialize it and pass the saved values as the arguments. This way you can
>> start from where you stopped in the previous run.
>>
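A minimal sketch of the loop described above, in plain Java with no Spark
dependency. It is a stand-in for the idea only: `WarmStartRegression`,
`Model`, `trainBatch` and `trainInPieces` are hypothetical names, the data is
synthetic (y = 2x + 1), and plain gradient descent replaces whatever MLLib
does internally. The point it shows is that each run is seeded with the
weights and intercept saved from the previous run.

```java
/** Plain-Java stand-in for warm-started mini-batch training (not Spark MLLib). */
final class WarmStartRegression {

    /** The state saved after each run: weights and intercept. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Gradient descent on one piece of data, starting from the previous model. */
    static Model trainBatch(Model start, double[][] xs, double[] ys,
                            double stepSize, int iterations) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[w.length];
            double gradB = 0.0;
            for (int i = 0; i < xs.length; i++) {
                // Prediction error for sample i under the current model.
                double err = b - ys[i];
                for (int j = 0; j < w.length; j++) err += w[j] * xs[i][j];
                for (int j = 0; j < w.length; j++) gradW[j] += err * xs[i][j];
                gradB += err;
            }
            // Step along the averaged squared-error gradient.
            for (int j = 0; j < w.length; j++) w[j] -= stepSize * gradW[j] / xs.length;
            b -= stepSize * gradB / xs.length;
        }
        return new Model(w, b);
    }

    /** Trains on three pieces of synthetic y = 2x + 1 data, one after another. */
    static Model trainInPieces() {
        double[][][] pieces = {
            {{0.0}, {0.5}, {1.0}},
            {{0.25}, {0.75}},
            {{0.1}, {0.9}}
        };
        Model model = new Model(new double[] {0.0}, 0.0); // cold start
        for (double[][] piece : pieces) {
            double[] ys = new double[piece.length];
            for (int i = 0; i < piece.length; i++) ys[i] = 2.0 * piece[i][0] + 1.0;
            // Warm start: the saved weights and intercept seed the next run.
            model = trainBatch(model, piece, ys, 0.5, 200);
        }
        return model;
    }

    public static void main(String[] args) {
        Model m = trainInPieces();
        System.out.println("weight=" + m.weights[0] + " intercept=" + m.intercept);
    }
}
```

In a real run the `Model` values would be written to disk or a table between
pieces; here they are simply carried in a variable.
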
>> Let us know your observations and feel free to ask if you need to know
>> anything more on this.
>>
>> We'll let you know what needs to be done to include this in CEP.
>>
>> Best regards.
>>
>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>> [email protected]> wrote:
>>
>>> Hi Maheshakya,
>>> Great, thank you. I already have ML and CEP and am working more towards it.
>>> Is that Scala API part of your current product or repo? Thank you.
>>> BR,
>>> Mahesh.
>>>
>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>> [email protected]> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> Please find the comments inline.
>>>>
>>>>> Is the data stream passed to ML in the event publisher's format, through
>>>>> an event publisher? Or can we use the direct traffic that comes to the
>>>>> event receiver, or else as streams?
>>>>>
>>>> We intend to use the direct data as event streams.
>>>>
>>>>> 1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>
>>>> No, WSO2 ML doesn't use any event streams. The data stored in tables in
>>>> DAS is loaded into ML.
>>>>
>>>>> 2.) Are there any incremental learning algorithms currently active in
>>>>> ML? You mentioned that there are, and that they come with the Scala API,
>>>>> so there is streaming support with that Scala API. In that API, in which
>>>>> format is the data acquired by ML?
>>>>>
>>>> No, there are no incremental learning algorithms in ML. The Scala API
>>>> belongs to Spark MLLib. MLLib supports streaming k-means and
>>>> generalized linear models (linear regression variants and logistic
>>>> regression) through its Scala API. What those implementations basically do
>>>> is retrain the trained models with mini-batches as data arrives
>>>> sequentially. There, the streaming data is broken into mini-batches
>>>> with the help of Spark Streaming. But we do not intend to use Spark
>>>> Streaming in our implementation. What we need to do is implement similar
>>>> behavior for event streams using the Java API. The Java API has the
>>>> following methods:
>>>>
>>>>    - createModel(Vector weights, double intercept) - for GLMs
>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>      Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>    - setInitialModel(KMeansModel model) - for k-means
>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>      KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>
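To illustrate what seeding k-means with a previous model amounts to, here is
a minimal plain-Java sketch. It does not call Spark; `WarmStartKMeans`,
`trainBatch` and `nearest` are hypothetical names, and ordinary Lloyd
iterations stand in for MLLib's k-means. Note that seeding only decides where
the search starts: after a run, the centers are the means of the newest piece
of data, so older data influences the result only through the starting
point.

```java
/** Plain-Java stand-in for k-means seeded with previously learned centers. */
final class WarmStartKMeans {

    /** Lloyd iterations on one piece of data, seeded with the previous centers. */
    static double[][] trainBatch(double[][] initialCenters, double[][] points,
                                 int iterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = initialCenters[c].clone();

        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: attach each point to its nearest center.
            for (double[] p : points) {
                int best = nearest(centers, p);
                counts[best]++;
                for (int d = 0; d < dim; d++) sums[best][d] += p[d];
            }
            // Update step: move each center to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old center
                for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / counts[c];
            }
        }
        return centers;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = centers[c][d] - p[d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0, 0}, {10, 10}};
        double[][] piece1 = {{1, 1}, {1, 2}, {9, 9}, {9, 10}};
        double[][] piece2 = {{2, 1}, {2, 2}, {8, 9}, {8, 10}};
        centers = trainBatch(centers, piece1, 5);
        centers = trainBatch(centers, piece2, 5); // seeded with piece1's centers
        System.out.println(java.util.Arrays.deepToString(centers));
    }
}
```
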
>>>> With the help of these methods, we can train models again with newly
>>>> arriving data, keeping the characteristics learned with the previous data.
>>>> When implementing this, we need to pay attention to other parameters of
>>>> incremental learning such as data horizon and data obsolescence (indicated
>>>> in the project ideas page).
>>>> We need to discuss how to integrate these with CEP event streams. I have
>>>> added Suho to the thread for more clarification.
>>>>
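The thread does not define data horizon or data obsolescence in detail. One
common way to let older data lose influence is to discount the accumulated
weight by a decay factor before merging each new mini-batch. The sketch below
is an assumption about how that could look for a single cluster center;
`DecayingCenter`, its `decay` parameter and the update rule are hypothetical
and are not taken from the project ideas page or from the MLLib API.

```java
/** One cluster center whose history is discounted before each update. */
final class DecayingCenter {
    private final double[] center;
    private double weight;      // effective number of points behind the center
    private final double decay; // 1.0 keeps all history, 0.0 keeps only the newest batch

    DecayingCenter(double[] initialCenter, double initialWeight, double decay) {
        if (decay < 0.0 || decay > 1.0) {
            throw new IllegalArgumentException("decay must be in [0, 1]");
        }
        this.center = initialCenter.clone();
        this.weight = initialWeight;
        this.decay = decay;
    }

    /** Merges the mean of a new mini-batch into the center. */
    void update(double[] batchMean, int batchCount) {
        double oldWeight = weight * decay;      // discount what was learned before
        double total = oldWeight + batchCount;
        if (total == 0.0) return;               // nothing to average
        for (int d = 0; d < center.length; d++) {
            center[d] = (center[d] * oldWeight + batchMean[d] * batchCount) / total;
        }
        weight = total;
    }

    double[] center() {
        return center.clone();
    }

    public static void main(String[] args) {
        DecayingCenter c = new DecayingCenter(new double[] {0.0}, 10.0, 0.5);
        c.update(new double[] {10.0}, 10);
        System.out.println(c.center()[0]);
    }
}
```

With `decay` at 1.0 the center is the running mean of everything seen so far;
lower values shift it towards recent batches more quickly.
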
>>>> Best regards.
>>>>
>>>>
>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> Since we plan to use WSO2 CEP to handle streaming data and implement
>>>>> the machine learning algorithms with Spark MLLib, is the data stream
>>>>> passed to ML in the event publisher's format, through an event
>>>>> publisher? Or can we use the direct traffic that comes to the event
>>>>> receiver, or else as streams? I am referring to
>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>     1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>     2.) Are there any incremental learning algorithms currently active
>>>>> in ML? You mentioned that there are, and that they come with the Scala
>>>>> API, so there is streaming support with that Scala API. In that API, in
>>>>> which format is the data acquired by ML?
>>>>>
>>>>> thank you.
>>>>> BR,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> We had to modify the project scope a little to best suit the
>>>>>> requirements. We will update the project idea with those changes soon
>>>>>> and let you know.
>>>>>>
>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>> moment. The new plan is to use WSO2 CEP to handle streaming data and
>>>>>> implement the machine learning algorithms with Spark MLLib. You can look
>>>>>> at the streaming k-means and streaming linear regression implementations
>>>>>> in MLLib. Currently, that API is only available for Scala. Our need is
>>>>>> to get the Java APIs of k-means and generalized linear models to support
>>>>>> incremental learning with streaming data. This has to be done as
>>>>>> mini-batch learning, since these algorithms operate as stochastic
>>>>>> gradient descent, so that any learning with new data can be done on top
>>>>>> of the previously learned models.
>>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>>> Also, please try to understand how event streams work in WSO2 CEP
>>>>>> [4][5].
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> [1]
>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>> [2]
>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>> [3]
>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>
>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Maheshakya,
>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>> product have any streaming data support? I did some university projects
>>>>>>> related to machine learning, covering regression, modelling, factor
>>>>>>> analysis, cluster analysis and classification problems (Discriminant
>>>>>>> Analysis) with SVM (Support Vector Machines), neural networks, LS
>>>>>>> classification and ML (Maximum Likelihood). Give me some time to see
>>>>>>> how the WSO2 architecture works; then I can come up with a good
>>>>>>> architecture. Thank you.
>>>>>>> BR,
>>>>>>> Mahesh.
>>>>>>>
>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Thank you for the resources. I will go through these and am looking
>>>>>>>> forward to this proposed project. Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>
>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>> several learning algorithms at the moment[1]. This project intends to
>>>>>>>>> leverage the existing algorithms in WSO2 Machine Learner to support
>>>>>>>>> streaming data. As a first step, you can get an idea of what WSO2
>>>>>>>>> Machine Learner does and how it operates. You can download WSO2
>>>>>>>>> Machine Learner from the product page[2] and get the source code[3].
>>>>>>>>> ML uses Apache Spark MLLib[4] for its algorithms, so it's better to
>>>>>>>>> read and understand what it does as well.
>>>>>>>>>
>>>>>>>>> In order to get an idea about the deliverables and the scope of
>>>>>>>>> this project, try to understand how Spark Streaming[5] (see examples)
>>>>>>>>> handles streaming data. Also, have a look at the streaming
>>>>>>>>> algorithms[6][7] supported by MLLib. Two approaches to employing
>>>>>>>>> incremental learning in ML are discussed on the project proposals
>>>>>>>>> page. These streaming algorithms can be used directly in the first
>>>>>>>>> approach. For the other approach, your implementation should contain
>>>>>>>>> a procedure to create mini-batches of relevant sizes from streaming
>>>>>>>>> data (i.e. a moving window) and periodically retrain the same
>>>>>>>>> algorithm.
>>>>>>>>>
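A minimal sketch of the mini-batch procedure mentioned above: events are
buffered in a moving window, and a copy of the window is handed to a
retraining callback every few events. The class name `MovingWindowBatcher`
and the parameters `windowSize` and `slide` are hypothetical, and the
callback is a placeholder for whatever retraining step is used; nothing here
is a WSO2 CEP or Spark API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Buffers streaming events and emits a mini-batch each time the window slides. */
final class MovingWindowBatcher<T> {
    private final int windowSize;            // number of events kept in the window
    private final int slide;                 // new events required between two batches
    private final Consumer<List<T>> onBatch; // e.g. a periodic retraining step
    private final Deque<T> window = new ArrayDeque<>();
    private int sinceLastEmit = 0;

    MovingWindowBatcher(int windowSize, int slide, Consumer<List<T>> onBatch) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        this.windowSize = windowSize;
        this.slide = slide;
        this.onBatch = onBatch;
    }

    /** Adds one event; emits the current window once it is full and has slid far enough. */
    void add(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // the oldest event leaves the window
        }
        sinceLastEmit++;
        if (window.size() == windowSize && sinceLastEmit >= slide) {
            sinceLastEmit = 0;
            onBatch.accept(new ArrayList<>(window)); // hand over a snapshot copy
        }
    }

    public static void main(String[] args) {
        MovingWindowBatcher<Integer> batcher =
                new MovingWindowBatcher<>(3, 2, batch -> System.out.println(batch));
        for (int i = 1; i <= 7; i++) {
            batcher.add(i);
        }
    }
}
```

Setting `slide` equal to `windowSize` gives non-overlapping batches; a smaller
`slide` makes consecutive batches share events.
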
>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>
>>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>>> It will help you get a better idea of machine learning algorithms
>>>>>>>>> with streaming data.
>>>>>>>>>
>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>> [3]
>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>> [5]
>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>> [6]
>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>> [7]
>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>
>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I am interested in contributing to Proposal 6: "Predictive
>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2016
>>>>>>>>>> this time. Since I have been involved in some similar projects, I
>>>>>>>>>> think it will be a great experience for me. Please let me know what
>>>>>>>>>> you think and what you suggest. I have been going through your
>>>>>>>>>> documents. Thank you.
>>>>>>>>>> regards,
>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Dev mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>> [email protected]
>>>>>>>>> +94711228855


-- 
Pruthuvi Maheshakya Wijewardena
[email protected]
+94711228855

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev
