Hi Mahesh,

You don't have to look into carbon-ml.

Best regards.

On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <[email protected]> wrote:

> Hi Maheshakya,
> I am working on some examples related to Spark and ML. Is there anything
> to do with carbon-ml? I think I don't need to look into that one. Do I?
> BR,
> Mahesh
>
> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <[email protected]> wrote:
>
>> Hi Mahesh,
>>
>>> Is that Scala API part of your current product or repo?
>>
>> No, we don't have the Scala API included. What we want is to design the
>> Java implementations of those algorithms to train with mini-batches of
>> streaming data with the help of the aforementioned methods, so that we
>> can include it as a CEP extension.
>>
>> To clarify, please try to write a simple Java program using Spark MLLib
>> linear regression and k-means clustering with a sample data set (you can
>> find a lot of data sets in the UCI repo[1]). You need to break the
>> dataset into several pieces and train a model repeatedly with those.
>> After each training run, save the model information (such as weights and
>> intercepts for regression and cluster centers for clustering; please
>> check the arguments of the methods I have mentioned and save the
>> information they require).
>> When training a model with a new piece of data, use those methods to
>> initialize it, passing the saved values as the arguments. This way you
>> can start from where you stopped in the previous run.
>>
>> Let us know your observations and feel free to ask if you need to know
>> anything more on this.
>>
>> We'll let you know what needs to be done to include this in CEP.
>>
>> Best regards.
>>
>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <[email protected]> wrote:
>>
>>> Hi Maheshakya,
>>> Great, thank you. I already have ML and CEP and am working more towards
>>> it. Is that Scala API part of your current product or repo? Thank you.
>>> BR,
>>> Mahesh.
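
The exercise described above (split a dataset into pieces, train on each
piece, save the weights and intercept, and resume from them) can be sketched
in plain Java. This is only an illustration of the warm-start idea and does
not use Spark MLLib: the class WarmStartLinearRegression, its Model holder,
the trainOnBatch method, the step size, and the synthetic data y = 2x + 1 are
all assumptions made for this example. With Spark, the saved values would
instead be passed to the MLLib methods mentioned in the thread.

```java
/**
 * Plain-Java sketch (no Spark): linear regression trained by SGD on one
 * piece of data at a time, resuming from the previously saved parameters.
 * All names here are hypothetical and exist only for this illustration.
 */
final class WarmStartLinearRegression {

    /** Saved model state: weights and intercept, as named in the thread. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Runs SGD over one piece of data, starting from the given model. */
    static Model trainOnBatch(Model start, double[][] x, double[] y,
                              double stepSize, int epochs) {
        double[] w = start.weights.clone();
        double b = start.intercept;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                double prediction = b;
                for (int j = 0; j < w.length; j++) {
                    prediction += w[j] * x[i][j];
                }
                double error = prediction - y[i];
                // Gradient step for the squared error of a single sample.
                for (int j = 0; j < w.length; j++) {
                    w[j] -= stepSize * error * x[i][j];
                }
                b -= stepSize * error;
            }
        }
        return new Model(w, b);
    }

    /** Trains on two pieces of synthetic data generated from y = 2x + 1. */
    static Model demo() {
        double[][] piece1 = {{0.0}, {0.1}, {0.2}, {0.3}, {0.4}};
        double[][] piece2 = {{0.5}, {0.6}, {0.7}, {0.8}, {0.9}};
        Model model = new Model(new double[] {0.0}, 0.0);
        for (double[][] piece : new double[][][] {piece1, piece2}) {
            double[] y = new double[piece.length];
            for (int i = 0; i < piece.length; i++) {
                y[i] = 2.0 * piece[i][0] + 1.0;
            }
            // The model returned by the previous run seeds the next one.
            model = trainOnBatch(model, piece, y, 0.5, 1000);
        }
        return model;
    }

    public static void main(String[] args) {
        Model m = demo();
        System.out.println("weight=" + m.weights[0] + " intercept=" + m.intercept);
    }
}
```

Because the second call starts from the parameters saved after the first
piece, it refines the earlier model instead of starting again from zero.
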
>>>
>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> Please find the comments inline.
>>>>
>>>>> Is the data stream taken into ML in the event publisher's format,
>>>>> through the event publisher? Or can we use the direct traffic that
>>>>> comes to the event receiver, or else as streams?
>>>>
>>>> We intend to use the direct data as event streams.
>>>>
>>>>> 1.) Is the data coming from WSO2 DAS to ML coming as streams?
>>>>
>>>> No, WSO2 ML doesn't use any event stream. The data stored in tables in
>>>> DAS is loaded into ML.
>>>>
>>>>> 2.) Are there any incremental learning algorithms currently active in
>>>>> ML? You mentioned that there are and that they are in the Scala API.
>>>>> So there is streaming support with that Scala API. In that API, in
>>>>> which format is the data acquired by ML?
>>>>
>>>> No, there are no incremental learning algorithms in ML. The Scala API
>>>> is about Spark MLLib. MLLib supports streaming k-means and
>>>> generalized linear models (linear regression variants and logistic
>>>> regression) with the Scala API. What those implementations basically do
>>>> is retrain the trained models with mini-batches as data sequentially
>>>> arrives. There, the breaking of streaming data into mini-batches is
>>>> done with the help of Spark Streaming. But we do not intend to use
>>>> Spark Streaming in our implementation. What we need to do is implement
>>>> a similar behavior for event streams using the Java API.
>>>> The Java API has the following methods:
>>>>
>>>> - *createModel*(Vector weights, double intercept) - for GLMs
>>>>   <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>   Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>> - *setInitialModel*(KMeansModel model) - for k-means
>>>>   <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>   KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>
>>>> With the help of these methods, we can train models again with newly
>>>> arriving data, keeping the characteristics learned with the previous
>>>> data. When implementing this, we need to pay attention to other
>>>> parameters of incremental learning such as data horizon and data
>>>> obsolescence (indicated in the project ideas page).
>>>> We need to discuss how to add these to CEP event streams. I have
>>>> added Suho to the thread for more clarification.
>>>>
>>>> Best regards.
>>>>
>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> Since the concern is to use WSO2 CEP to handle streaming data and
>>>>> implement the machine learning algorithms with Spark MLLib, is the
>>>>> data stream taken into ML in the event publisher's format, through
>>>>> the event publisher? Or can we use the direct traffic that comes to
>>>>> the event receiver, or else as streams? I am referring to
>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>> 1.) Is the data coming from WSO2 DAS to ML coming as streams?
>>>>> 2.) Are there any incremental learning algorithms currently active
>>>>> in ML? You mentioned that there are and that they are in the Scala API.
>>>>> So there is streaming support with that Scala API. In that API, in
>>>>> which format is the data acquired by ML?
>>>>>
>>>>> Thank you.
>>>>> BR,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> We had to modify the project scope a little to best suit the
>>>>>> requirements. We will update the project idea with those concerns
>>>>>> soon and let you know.
>>>>>>
>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>> moment. The new concern is to use WSO2 CEP to handle streaming data
>>>>>> and implement the machine learning algorithms with Spark MLLib. You
>>>>>> can look at the streaming k-means and streaming linear regression
>>>>>> implementations in MLLib. Currently, the API is only for Scala. Our
>>>>>> need is to get the Java APIs of k-means and generalized linear models
>>>>>> to support incremental learning with streaming data. This has to be
>>>>>> done as mini-batch learning, since these algorithms operate as
>>>>>> stochastic gradient descent, so that any learning with new data can
>>>>>> be done on top of the previously learned models.
>>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>>> Also please try to understand how event streams work in WSO2 CEP
>>>>>> [4][5].
>>>>>>
>>>>>> Best regards.
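
The same warm-start idea applies to k-means: the cluster centers saved after
one mini-batch can seed training on the next. The sketch below is plain Java
and does not use Spark MLLib; the class WarmStartKMeans, the refine method,
and the toy data are assumptions made for this illustration. It simply
re-runs Lloyd iterations on the new batch from the saved centers, so it gives
no extra weight to older data.

```java
/**
 * Plain-Java sketch (no Spark): k-means refinement that starts from
 * previously saved cluster centers instead of a fresh initialization.
 * All names here are hypothetical and exist only for this illustration.
 */
final class WarmStartKMeans {

    /** Runs Lloyd iterations on one mini-batch, starting from the given centers. */
    static double[][] refine(double[][] startCenters, double[][] batch, int iterations) {
        int k = startCenters.length;
        int dim = startCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = startCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] point : batch) {
                int best = nearestCenter(centers, point);
                counts[best]++;
                for (int d = 0; d < dim; d++) {
                    sums[best][d] += point[d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep the previous center
                }
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return centers;
    }

    /** Returns the index of the center with the smallest squared distance. */
    static int nearestCenter(double[][] centers, double[] point) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double distance = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = centers[c][d] - point[d];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    /** Trains on two toy batches; the second run starts from the saved centers. */
    static double[][] demo() {
        double[][] batch1 = {{0, 0}, {0, 2}, {2, 0}, {2, 2},
                             {10, 10}, {10, 12}, {12, 10}, {12, 12}};
        double[][] batch2 = {{1, 1}, {1, 3}, {3, 1}, {3, 3},
                             {9, 9}, {9, 11}, {11, 9}, {11, 11}};
        double[][] centers = {{3, 3}, {8, 8}}; // arbitrary first guess
        centers = refine(centers, batch1, 10); // saved after the first run
        centers = refine(centers, batch2, 10); // resumed from the saved centers
        return centers;
    }

    public static void main(String[] args) {
        for (double[] center : demo()) {
            System.out.println(center[0] + ", " + center[1]);
        }
    }
}
```
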
>>>>>>
>>>>>> [1] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>> [2] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>> [3] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>
>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Maheshakya,
>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>> product have any stream data support? I did some university projects
>>>>>>> related to machine learning with regression, modelling, factor
>>>>>>> analysis, cluster analysis and classification problems (Discriminant
>>>>>>> Analysis) with SVM (Support Vector Machines), neural networks, LS
>>>>>>> classification and ML (Maximum Likelihood). Give me some time to see
>>>>>>> how the WSO2 architecture works. Then I can come up with a good
>>>>>>> architecture. Thank you.
>>>>>>> BR,
>>>>>>> Mahesh.
>>>>>>>
>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Thank you for the resources. I will go through these and am looking
>>>>>>>> forward to this proposed project. Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>
>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>> several learning algorithms at the moment[1].
>>>>>>>>> This project intends to leverage the existing algorithms in WSO2
>>>>>>>>> Machine Learner to support streaming data. As a first step, you
>>>>>>>>> can get an idea about what WSO2 Machine Learner does and how it
>>>>>>>>> operates. You can download WSO2 Machine Learner from the product
>>>>>>>>> page[2] and get the source code[3]. ML uses Apache Spark MLLib[4]
>>>>>>>>> for its algorithms, so it's better to read and understand what it
>>>>>>>>> does as well.
>>>>>>>>>
>>>>>>>>> In order to get an idea about the deliverables and the scope of
>>>>>>>>> this project, try to understand how Spark Streaming[5] (see
>>>>>>>>> examples) handles streaming data. Also, have a look at the
>>>>>>>>> streaming algorithms[6][7] supported by MLLib. There are two
>>>>>>>>> approaches discussed to employ incremental learning in ML in the
>>>>>>>>> project proposals page. These streaming algorithms can be directly
>>>>>>>>> used in the first approach. For the other approach, your
>>>>>>>>> implementation should contain a procedure to create mini-batches
>>>>>>>>> from streaming data with relevant sizes (i.e. a moving window) and
>>>>>>>>> do periodic retraining of the same algorithm.
>>>>>>>>>
>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>
>>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>>> It will help you get a better idea about machine learning
>>>>>>>>> algorithms with streaming data.
>>>>>>>>>
>>>>>>>>> Let us know if you need any help with these.
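
The second approach mentioned above (cutting mini-batches out of a stream
with a moving window and retraining periodically) could look roughly like the
following plain-Java sketch. The class SlidingWindowBatcher and its
windowSize and slide parameters are assumptions made for this illustration;
the thread does not say how the window should be sized or how often
retraining should happen.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Plain-Java sketch: keeps the most recent events in a moving window and
 * hands out a copy of the window every `slide` events once it is full.
 * All names here are hypothetical and exist only for this illustration.
 */
final class SlidingWindowBatcher {
    private final int windowSize;
    private final int slide;
    private final Deque<double[]> window = new ArrayDeque<>();
    private int eventsSinceLastBatch = 0;

    SlidingWindowBatcher(int windowSize, int slide) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        this.windowSize = windowSize;
        this.slide = slide;
    }

    /** Adds one event; returns a mini-batch when retraining is due, else null. */
    List<double[]> add(double[] event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest event
        }
        eventsSinceLastBatch++;
        if (window.size() == windowSize && eventsSinceLastBatch >= slide) {
            eventsSinceLastBatch = 0;
            return new ArrayList<>(window); // snapshot, oldest event first
        }
        return null;
    }

    /** Feeds events 1..8 through a window of 4 that slides by 2. */
    static List<List<double[]>> demo() {
        SlidingWindowBatcher batcher = new SlidingWindowBatcher(4, 2);
        List<List<double[]>> batches = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            List<double[]> batch = batcher.add(new double[] {i});
            if (batch != null) {
                batches.add(batch); // a real system would retrain the model here
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        for (List<double[]> batch : demo()) {
            System.out.println("batch from " + batch.get(0)[0]
                    + " to " + batch.get(batch.size() - 1)[0]);
        }
    }
}
```

Each returned batch could then be used to retrain a model that was
initialized from the previously saved parameters.
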
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>> [6] https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>> [7] https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>
>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive
>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2
>>>>>>>>>> this time. Since I have been engaged in some similar projects, I
>>>>>>>>>> think it will be a great experience for me. Please let me know
>>>>>>>>>> what you think and what you suggest. I have been going through
>>>>>>>>>> your documents. Thank you.
>>>>>>>>>> Regards,
>>>>>>>>>> Mahesh Dananjaya.

--
Pruthuvi Maheshakya Wijewardena
[email protected]
+94711228855

_______________________________________________ Dev mailing list [email protected] http://wso2.org/cgi-bin/mailman/listinfo/dev
