Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

Mahesh Dananjaya Mon, 14 Mar 2016 01:00:01 -0700

Hi Maheshakya,
I am writing some Java programs that break the dataset into several
pieces and train a model repeatedly on those pieces using Spark MLlib.
Do I have to do anything with Hadoop at this stage? I am working in
standalone mode. Thank you.
BR,
Mahesh.
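
The chunked retraining described above can be sketched without Spark. The
following is only an illustration in plain Java, not the Spark MLlib API: the
class and method names are hypothetical, the data is synthetic, and plain
per-sample gradient descent stands in for whatever optimizer the real
library uses. It shows the one idea that matters here: the weights and
intercept saved after one chunk become the starting point for the next.

```java
import java.util.Arrays;

/** Plain-Java sketch (no Spark): linear regression by SGD, resumed chunk by chunk. */
final class ChunkedLinearRegression {

    /** The state saved after each run: feature weights plus the intercept. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Runs per-sample gradient steps over one chunk, starting from a saved model. */
    static Model train(double[][] x, double[] y, Model start, double stepSize, int epochs) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < x.length; i++) {
                double error = b - y[i];
                for (int j = 0; j < w.length; j++) {
                    error += w[j] * x[i][j];
                }
                for (int j = 0; j < w.length; j++) {
                    w[j] -= stepSize * error * x[i][j];
                }
                b -= stepSize * error;
            }
        }
        return new Model(w, b);
    }

    /** Splits the data into chunks and feeds each run's model into the next run. */
    static Model trainInChunks(double[][] x, double[] y, int chunkSize,
                               double stepSize, int epochs) {
        Model model = new Model(new double[x[0].length], 0.0); // cold start: all zeros
        for (int from = 0; from < x.length; from += chunkSize) {
            int to = Math.min(from + chunkSize, x.length);
            model = train(Arrays.copyOfRange(x, from, to), Arrays.copyOfRange(y, from, to),
                    model, stepSize, epochs);
        }
        return model;
    }

    public static void main(String[] args) {
        // Synthetic, noise-free data following y = 2x + 1 (an assumption for the demo).
        int n = 200;
        double[][] x = new double[n][1];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i][0] = ((i * 37) % n) / (double) n; // spread values across [0, 1)
            y[i] = 2.0 * x[i][0] + 1.0;
        }
        Model model = trainInChunks(x, y, 50, 0.1, 50);
        System.out.println("weight=" + model.weights[0] + " intercept=" + model.intercept);
    }
}
```

Nothing in this sketch touches Hadoop or any cluster; it runs as a single
local Java program.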


On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <[email protected]
> wrote:

> Hi Mahesh,
>
> You don't have to look into carbon-ml.
>
> Best regards.
>
> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
> [email protected]> wrote:
>
>> Hi Maheshakya,
>> I am working on some examples related to Spark and ML. Do I need to do
>> anything with carbon-ml? I think I don't need to look into that one. Do I?
>> BR,
>> Mahesh
>>
>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>> [email protected]> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Is that Scala API included in your current product or repo?
>>>
>>>
>>> No, we don't have the Scala API included. What we want is to design
>>> Java implementations of those algorithms that train with mini-batches of
>>> streaming data, with the help of the aforementioned methods, so that we
>>> can include them as a CEP extension.
>>>
>>> To clarify, please try to write a simple Java program using Spark
>>> MLlib linear regression and k-means clustering with a sample data set (you
>>> can find a lot of data sets in the UCI repo[1]). You need to break the
>>> dataset into several pieces and train a model repeatedly with those.
>>> After each training run, save the model information (such as weights and
>>> intercepts for regression, and cluster centers for clustering). Please
>>> check the arguments of the methods I have mentioned and save the model
>>> information they require.
>>> When training a model with a new piece of data, use those methods to
>>> initialize it, passing the saved values as the arguments. This way you can
>>> start from where you stopped in the previous run.
>>>
>>> Let us know your observations and feel free to ask if you need to know
>>> anything more on this.
>>>
>>> We'll let you know what needs to be done to include this in CEP.
>>>
>>> Best regards.
>>>
>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>> [email protected]> wrote:
>>>
>>>> Hi Maheshakya,
>>>> Great, thank you. I already have ML and CEP and am working more with them.
>>>> Is that Scala API included in your current product or repo? Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Please find the comments inline.
>>>>>
>>>>>> Is the data stream taken to ML in the event publisher's format, through
>>>>>> the event publisher? Or can we use the direct traffic that comes to the
>>>>>> event receiver, or else take it as streams?
>>>>>>
>>>>> We intend to use the direct data as event streams.
>>>>>
>>>>>> 1.) Is the data coming from WSO2 DAS to ML coming as streams?
>>>>>>
>>>>> No, WSO2 ML doesn't use any event streams. The data stored in tables in
>>>>> DAS is loaded into ML.
>>>>>
>>>>>> 2.) Are there any incremental learning algorithms currently active in
>>>>>> ML? You mentioned that there are, and that they come with the Scala API.
>>>>>> So there is streaming support with that Scala API. In which format does
>>>>>> that API acquire the data into ML?
>>>>>>
>>>>> No, there are no incremental learning algorithms in ML. The Scala API
>>>>> I mentioned belongs to Spark MLlib. MLlib supports streaming k-means and
>>>>> streaming generalized linear models (linear regression variants and
>>>>> logistic regression) with the Scala API. What those implementations
>>>>> basically do is retrain the trained models with mini-batches as data
>>>>> arrives sequentially. There, the breaking of streaming data into
>>>>> mini-batches is done with the help of Spark Streaming. But we do not
>>>>> intend to use Spark Streaming in our implementation. What we need to do
>>>>> is implement similar behavior for event streams using the Java API. The
>>>>> Java API has the following methods:
>>>>>
>>>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>>>      createModel:
>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>      Vector:
>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>>    - *setInitialModel*(KMeansModel model) - for k-means
>>>>>      setInitialModel:
>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>      KMeansModel:
>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>
>>>>> With the help of these methods, we can train models again with newly
>>>>> arriving data, keeping the characteristics learned with the previous data.
>>>>> When implementing this, we need to pay attention to other parameters of
>>>>> incremental learning such as data horizon and data obsolescence (indicated
>>>>> in the project ideas page).
>>>>> We need to discuss how to integrate these with CEP event streams. I have
>>>>> added Suho to the thread for more clarification.
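
For the clustering side, resuming from a previously learned model can be
sketched as below. This is plain Java written only for illustration: it does
not call Spark, the class and method names are hypothetical, and it uses
ordinary Lloyd iterations. The centers learned on one batch are simply passed
in as the initial centers for the next batch. Data horizon and data
obsolescence are not modeled, so each batch fully re-estimates the centers
from that warm start.

```java
/** Plain-Java sketch (no Spark): k-means resumed from previously learned centers. */
final class WarmStartKMeans {

    /** Returns the index of the center nearest to the point (squared Euclidean distance). */
    static int nearest(double[][] centers, double[] point) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = centers[c][j] - point[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Runs Lloyd iterations on one batch, starting from the given initial centers. */
    static double[][] train(double[][] batch, double[][] initialCenters, int iterations) {
        int k = initialCenters.length;
        int dim = initialCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] point : batch) {
                int c = nearest(centers, point);
                counts[c]++;
                for (int j = 0; j < dim; j++) {
                    sums[c][j] += point[j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int j = 0; j < dim; j++) {
                    centers[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] firstBatch = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] secondBatch = {{1, 1}, {1, 3}, {11, 11}, {11, 13}};
        double[][] centers = train(firstBatch, new double[][] {{1, 1}, {9, 9}}, 5);
        centers = train(secondBatch, centers, 5); // resume from the saved centers
        System.out.println(java.util.Arrays.deepToString(centers));
    }
}
```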
>>>>>
>>>>> Best regards.
>>>>>
>>>>>
>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> As we plan to use WSO2 CEP to handle streaming data and implement the
>>>>>> machine learning algorithms with Spark MLlib, is the data stream taken
>>>>>> to ML in the event publisher's format, through the event publisher? Or
>>>>>> can we use the direct traffic that comes to the event receiver, or else
>>>>>> take it as streams? I am referring to
>>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>>     1.) Is the data coming from WSO2 DAS to ML coming as streams?
>>>>>>     2.) Are there any incremental learning algorithms currently
>>>>>> active in ML? You mentioned that there are, and that they come with the
>>>>>> Scala API. So there is streaming support with that Scala API. In which
>>>>>> format does that API acquire the data into ML?
>>>>>>
>>>>>> Thank you.
>>>>>> BR,
>>>>>> Mahesh.
>>>>>>
>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> We had to modify the project scope a little to best suit the
>>>>>>> requirements. We will update the project idea with those concerns soon
>>>>>>> and let you know.
>>>>>>>
>>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>>> moment. The new focus is to use WSO2 CEP to handle streaming data and
>>>>>>> implement the machine learning algorithms with Spark MLlib. You can look
>>>>>>> at the streaming k-means and streaming linear regression implementations
>>>>>>> in MLlib. Currently, that API is only for Scala. Our need is to get the
>>>>>>> Java APIs of k-means and generalized linear models to support incremental
>>>>>>> learning with streaming data. This has to be done as mini-batch learning,
>>>>>>> since these algorithms operate as stochastic gradient descent, so that
>>>>>>> any learning with new data can be done on top of the previously learned
>>>>>>> models. So please go through those APIs[1][2][3] and try to get an idea.
>>>>>>> Also, please try to understand how event streams work in WSO2 CEP
>>>>>>> [4][5].
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> [1]
>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>> [2]
>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>> [3]
>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>> [5]
>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>
>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>>> product have any stream data support? I did some university projects
>>>>>>>> related to machine learning, covering regression, modelling, factor
>>>>>>>> analysis, cluster analysis and classification problems (Discriminant
>>>>>>>> Analysis) with SVMs (Support Vector Machines), neural networks, LS
>>>>>>>> classification and ML (Maximum Likelihood). Give me some time to see
>>>>>>>> how the WSO2 architecture works; then I can come up with a good
>>>>>>>> architecture. Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Maheshakya,
>>>>>>>>> Thank you for the resources. I will go through these and am looking
>>>>>>>>> forward to this proposed project. Thank you.
>>>>>>>>> BR,
>>>>>>>>> Mahesh.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>
>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>
>>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>>> several learning algorithms at the moment[1]. This project intends to
>>>>>>>>>> leverage the existing algorithms in WSO2 Machine Learner to support
>>>>>>>>>> streaming data. As a first step, you can get an idea of what WSO2
>>>>>>>>>> Machine Learner does and how it operates. You can download WSO2
>>>>>>>>>> Machine Learner from the product page[2] and get the source code[3].
>>>>>>>>>> ML uses Apache Spark MLlib[4] for its algorithms, so it's better to
>>>>>>>>>> read and understand what it does as well.
>>>>>>>>>>
>>>>>>>>>> In order to get an idea about the deliverables and the scope of
>>>>>>>>>> this project, try to understand how Spark Streaming[5] (see examples)
>>>>>>>>>> handles streaming data. Also, have a look at the streaming
>>>>>>>>>> algorithms[6][7] supported by MLlib. Two approaches to employing
>>>>>>>>>> incremental learning in ML are discussed on the project proposals
>>>>>>>>>> page. These streaming algorithms can be used directly in the first
>>>>>>>>>> approach. For the other approach, your implementation should contain
>>>>>>>>>> a procedure to create mini-batches of relevant sizes from streaming
>>>>>>>>>> data (i.e. a moving window) and periodically retrain the same
>>>>>>>>>> algorithm.
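
One way to picture the second approach is a count-based moving window that
hands out a mini-batch every few events. The sketch below is plain Java and
purely illustrative: the class name, the `windowSize` and `slide` parameters,
and the choice of a count-based (rather than time-based) window are all
assumptions, not something taken from WSO2 CEP, Spark Streaming, or the
project proposal.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Sketch of a count-based moving window that cuts a stream into mini-batches. */
final class MovingWindowBatcher<T> {
    private final int windowSize;
    private final int slide;
    private final Deque<T> window = new ArrayDeque<>();
    private int eventsSinceLastBatch = 0;

    MovingWindowBatcher(int windowSize, int slide) {
        if (windowSize <= 0 || slide <= 0 || slide > windowSize) {
            throw new IllegalArgumentException("need 0 < slide <= windowSize");
        }
        this.windowSize = windowSize;
        this.slide = slide;
    }

    /** Adds one event; returns a copy of the window whenever a mini-batch is due. */
    Optional<List<T>> add(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest event
        }
        eventsSinceLastBatch++;
        if (window.size() == windowSize && eventsSinceLastBatch >= slide) {
            eventsSinceLastBatch = 0;
            List<T> batch = new ArrayList<>(window);
            return Optional.of(batch);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        MovingWindowBatcher<Integer> batcher = new MovingWindowBatcher<>(4, 2);
        for (int event = 1; event <= 8; event++) {
            // A real consumer would retrain the model here instead of printing.
            batcher.add(event).ifPresent(batch -> System.out.println("retrain on " + batch));
        }
    }
}
```

With a window of 4 and a slide of 2, consecutive batches overlap by two
events; setting the slide equal to the window size gives non-overlapping
batches instead.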
>>>>>>>>>>
>>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>>
>>>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>>>> It will help you get a better idea about machine learning
>>>>>>>>>> algorithms with streaming data.
>>>>>>>>>>
>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>> [3]
>>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>> [5]
>>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>> [6]
>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>> [7]
>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive
>>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2 this
>>>>>>>>>>> time. Since I have been engaged in some similar projects, I think it
>>>>>>>>>>> will be a great experience for me. Please let me know what you think
>>>>>>>>>>> and what you suggest. I have been going through your documents.
>>>>>>>>>>> Thank you.
>>>>>>>>>>> Regards,
>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Dev mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>> [email protected]
>>>>>>>>>> +94711228855

