Hi Maheshakya,
I am writing some Java programs that break the dataset into several pieces and train a model repeatedly on those pieces using Spark MLlib. Do I have to do anything with Hadoop at this stage, given that I am working in standalone mode? Thank you.
BR,
Mahesh.
On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <[email protected]> wrote:
> Hi Mahesh,
>
> You don't have to look into carbon-ml.
>
> Best regards.
>
> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <[email protected]> wrote:
>
>> Hi Maheshakya,
>> I am working on some examples related to Spark and ML. Is there anything
>> I need to do with carbon-ml? I think I don't need to look into that one, do I?
>> BR,
>> Mahesh
>>
>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <[email protected]> wrote:
>>
>>> Hi Mahesh,
>>>
>>>> Does that Scala API come with your current product or repo?
>>>
>>> No, we don't have the Scala API included. What we want is to design
>>> Java implementations of those algorithms that train with mini-batches of
>>> streaming data, with the help of the aforementioned methods, so that we can
>>> include them as a CEP extension.
>>>
>>> To clarify, please try to write a simple Java program using Spark
>>> MLlib linear regression and k-means clustering with a sample data set (you
>>> can find a lot of data sets in the UCI repo[1]). You need to break the
>>> dataset into several pieces and train a model repeatedly with those.
>>> After each training run, save the model information (such as weights and
>>> intercepts for regression, and cluster centers for clustering; please check
>>> the arguments of the methods I have mentioned and save the required
>>> information of the model).
>>> When training a model with a new piece of data, use those methods to
>>> initialize it and pass the saved values as the arguments. This way you can
>>> start from where you stopped in the previous run.
>>>
>>> Let us know your observations and feel free to ask if you need to know
>>> anything more on this.
>>>
>>> We'll let you know what needs to be done to include this in CEP.
>>>
>>> Best regards.
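The train, save, resume loop described above can be sketched without Spark at all. What follows is a minimal plain-Java sketch, assuming a hand-written per-sample SGD for linear regression. The class `WarmStartRegression`, its `Model` holder, the method `trainOnBatch`, and the data are hypothetical names made up for illustration and are not part of Spark MLlib. In a real program the saved weights and intercept would presumably be handed to the MLlib methods mentioned in the thread instead.

```java
import java.util.Arrays;

/** Plain-Java stand-in for the train, save, resume loop. Not the Spark MLlib API. */
final class WarmStartRegression {

    /** The state saved after each run: weights and intercept. */
    static final class Model {
        final double[] weights;
        final double intercept;

        Model(double[] weights, double intercept) {
            this.weights = weights.clone();
            this.intercept = intercept;
        }
    }

    /** Runs per-sample SGD on one piece of data, starting from the saved model. */
    static Model trainOnBatch(Model saved, double[][] x, double[] y,
                              double stepSize, int epochs) {
        double[] w = saved.weights.clone();
        double b = saved.intercept;
        for (int e = 0; e < epochs; e++) {
            for (int i = 0; i < x.length; i++) {
                // Prediction error for sample i under the current model.
                double error = b - y[i];
                for (int j = 0; j < w.length; j++) {
                    error += w[j] * x[i][j];
                }
                // Gradient step for squared error on this sample.
                for (int j = 0; j < w.length; j++) {
                    w[j] -= stepSize * error * x[i][j];
                }
                b -= stepSize * error;
            }
        }
        return new Model(w, b);
    }

    public static void main(String[] args) {
        // Made-up data for y = 2x + 1, already split into three pieces.
        double[][][] pieces = {
            {{0.0}, {0.5}, {1.0}},
            {{0.1}, {0.6}, {0.9}},
            {{0.2}, {0.4}, {0.8}}
        };
        Model model = new Model(new double[] {0.0}, 0.0); // untrained start
        for (double[][] x : pieces) {
            double[] y = new double[x.length];
            for (int i = 0; i < x.length; i++) {
                y[i] = 2.0 * x[i][0] + 1.0;
            }
            // Resume from the model saved after the previous piece.
            model = trainOnBatch(model, x, y, 0.1, 1000);
            System.out.println(Arrays.toString(model.weights) + " " + model.intercept);
        }
    }
}
```

The only state carried between pieces is the `Model` object, which is the same information (weights and intercept) the message above asks to save after each run.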
>>>
>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <[email protected]> wrote:
>>>
>>>> Hi Maheshakya,
>>>> Great, thank you. I already have ML and CEP and am working more with them.
>>>> Does that Scala API come with your current product or repo? Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>>
>>>>> Please find the comments inline.
>>>>>
>>>>>> Is the data stream taken into ML in the event publisher's format,
>>>>>> through an event publisher? Or can we use the direct traffic that comes
>>>>>> to the event receiver, or else take it as streams?
>>>>>>
>>>>> We intend to use the direct data as event streams.
>>>>>
>>>>>> 1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>>
>>>>> No, WSO2 ML doesn't use any event stream. The data stored in tables in
>>>>> DAS is loaded into ML.
>>>>>
>>>>>> 2.) Are there any incremental learning algorithms currently active in
>>>>>> ML? You mentioned that there are and that they come with the Scala API.
>>>>>> So there is streaming support with that Scala API. In that API, in which
>>>>>> format is the data acquired by ML?
>>>>>>
>>>>> No, there are no incremental learning algorithms in ML. The Scala API
>>>>> is about Spark MLlib. MLlib supports streaming k-means and
>>>>> generalized linear models (linear regression variants and logistic
>>>>> regression) with the Scala API. What those implementations basically do
>>>>> is retrain the trained models with mini-batches as data arrives
>>>>> sequentially. There, the breaking of streaming data into mini-batches is
>>>>> done with the help of Spark Streaming. But we do not intend to use Spark
>>>>> Streaming in our implementation. What we need to do is implement similar
>>>>> behavior for event streams using the Java API.
>>>>> The Java API has the following methods:
>>>>>
>>>>> - *createModel*(Vector weights, double intercept) - for GLMs
>>>>>   <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>   Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>> - *setInitialModel*(KMeansModel model) - for k-means
>>>>>   <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>   KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>
>>>>> With the help of these methods, we can train models again with newly
>>>>> arriving data, keeping the characteristics learned from the previous data.
>>>>> When implementing this, we need to pay attention to other parameters of
>>>>> incremental learning such as data horizon and data obsolescence (indicated
>>>>> in the project ideas page).
>>>>> We need to discuss how to add these with CEP event streams. I have
>>>>> added Suho to the thread for more clarification.
>>>>>
>>>>> Best regards.
>>>>>
>>>>>
>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> Since we intend to use WSO2 CEP to handle streaming data and
>>>>>> implement the machine learning algorithms with Spark MLlib, is the data
>>>>>> stream taken into ML in the event publisher's format, through an event
>>>>>> publisher? Or can we use the direct traffic that comes to the event
>>>>>> receiver, or else take it as streams? I am referring to
>>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>> 1.) Does the data coming from WSO2 DAS to ML arrive as streams?
>>>>>> 2.)
>>>>>> Are there any incremental learning algorithms currently
>>>>>> active in ML? You mentioned that there are and that they come with the
>>>>>> Scala API. So there is streaming support with that Scala API. In that
>>>>>> API, in which format is the data acquired by ML?
>>>>>>
>>>>>> Thank you.
>>>>>> BR,
>>>>>> Mahesh.
>>>>>>
>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> We had to modify the project scope a little to best suit the
>>>>>>> requirements. We will update the project idea with those concerns soon
>>>>>>> and let you know.
>>>>>>>
>>>>>>> We do not support streaming data in WSO2 Machine Learner at the
>>>>>>> moment. The new concern is to use WSO2 CEP to handle streaming data and
>>>>>>> implement the machine learning algorithms with Spark MLlib. You can
>>>>>>> look at the streaming k-means and streaming linear regression
>>>>>>> implementations in MLlib. Currently, the API is only for Scala. Our need
>>>>>>> is to get the Java APIs of k-means and generalized linear models to
>>>>>>> support incremental learning with streaming data. This has to be done as
>>>>>>> mini-batch learning, since these algorithms operate as stochastic
>>>>>>> gradient descent, so that any learning with new data can be done on top
>>>>>>> of the previously learned models.
>>>>>>> So please go through those APIs[1][2][3] and try to get an idea.
>>>>>>> Also please try to understand how event streams work in WSO2 CEP
>>>>>>> [4][5].
>>>>>>>
>>>>>>> Best regards.
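For the k-means side of "learning on top of the previously learned models", a rough plain-Java sketch follows. It assumes ordinary Lloyd iterations that simply start from the centers saved after the previous piece of data. The class `WarmStartKMeans`, the methods `refine` and `nearest`, and the points are hypothetical and are not Spark MLlib code. Note that in this sketch the saved centers only decide where the iterations begin, so older data is not weighted in any other way.

```java
/** Plain-Java stand-in for warm-started k-means. Not the Spark MLlib API. */
final class WarmStartKMeans {

    /** Runs Lloyd iterations on one piece of data, starting from saved centers. */
    static double[][] refine(double[][] savedCenters, double[][] points, int iterations) {
        int k = savedCenters.length;
        int dim = savedCenters[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = savedCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: each point goes to its nearest center.
            for (double[] p : points) {
                int best = nearest(centers, p);
                counts[best]++;
                for (int j = 0; j < dim; j++) {
                    sums[best][j] += p[j];
                }
            }
            // Update step: each center moves to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // no points assigned: keep the previous center
                }
                for (int j = 0; j < dim; j++) {
                    centers[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return centers;
    }

    /** Index of the center closest to p by squared Euclidean distance. */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = centers[c][j] - p[j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}}; // made-up starting centers
        double[][] piece1 = {{0, 1}, {2, 1}, {8, 9}, {10, 9}};
        double[][] piece2 = {{1, 2}, {3, 2}, {9, 8}, {11, 8}};
        centers = refine(centers, piece1, 5); // first run
        centers = refine(centers, piece2, 5); // resumes from the saved centers
        System.out.println(java.util.Arrays.deepToString(centers));
    }
}
```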
>>>>>>>
>>>>>>> [1] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>> [2] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>> [3] http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>> [4] https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>> [5] https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>
>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Maheshakya,
>>>>>>>> Give me some time to go through your ML package. Does the current
>>>>>>>> product have any stream data support? I did some university projects
>>>>>>>> related to machine learning with regression, modelling, factor
>>>>>>>> analysis, cluster analysis and classification problems (Discriminant
>>>>>>>> Analysis) with SVM (Support Vector Machines), neural networks, LS
>>>>>>>> classification and ML (Maximum Likelihood). Give me some time to see
>>>>>>>> how the WSO2 architecture works. Then I can come up with a good
>>>>>>>> architecture. Thank you.
>>>>>>>> BR,
>>>>>>>> Mahesh.
>>>>>>>>
>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Maheshakya,
>>>>>>>>> Thank you for the resources. I will go through them and am looking
>>>>>>>>> forward to this proposed project. Thank you.
>>>>>>>>> BR,
>>>>>>>>> Mahesh.
>>>>>>>>>
>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya Wijewardena <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>
>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>
>>>>>>>>>> We would like to know what type of similar projects you have
>>>>>>>>>> worked on. You may have seen that WSO2 Machine Learner supports
>>>>>>>>>> several learning algorithms at the moment[1].
>>>>>>>>>> This project intends to leverage the existing algorithms in WSO2
>>>>>>>>>> Machine Learner to support streaming data. As a first step, you can
>>>>>>>>>> get an idea about what WSO2 Machine Learner does and how it operates.
>>>>>>>>>> You can download WSO2 Machine Learner from the product page[2] and
>>>>>>>>>> the source code from [3]. ML uses Apache Spark MLlib[4] for its
>>>>>>>>>> algorithms, so it's better to read and understand what it does as
>>>>>>>>>> well.
>>>>>>>>>>
>>>>>>>>>> In order to get an idea about the deliverables and the scope of
>>>>>>>>>> this project, try to understand how Spark Streaming[5] (see examples)
>>>>>>>>>> handles streaming data. Also, have a look at the streaming
>>>>>>>>>> algorithms[6][7] supported by MLlib. There are two approaches
>>>>>>>>>> discussed to employ incremental learning in ML in the project
>>>>>>>>>> proposals page. These streaming algorithms can be directly used in the
>>>>>>>>>> first approach. For the other approach, your implementation should
>>>>>>>>>> contain a procedure to create mini-batches from streaming data with
>>>>>>>>>> relevant sizes (i.e. a moving window) and do periodic retraining of
>>>>>>>>>> the same algorithm.
>>>>>>>>>>
>>>>>>>>>> To start with the project, you will need to come up with a
>>>>>>>>>> suitable plan and an architecture first.
>>>>>>>>>>
>>>>>>>>>> Please watch the video referenced in the proposal (reference: 5).
>>>>>>>>>> It will help you get a better idea about machine learning algorithms
>>>>>>>>>> with streaming data.
>>>>>>>>>>
>>>>>>>>>> Let us know if you need any help with these.
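The "moving window" procedure for cutting a stream into mini-batches could look roughly like the plain-Java sketch below. It is only an illustration under assumptions: the class `MovingWindowBatcher`, its parameters `windowSize` and `slide`, and the callback are hypothetical names, and nothing here is WSO2 CEP or Spark Streaming code. Each emitted batch would then be passed to whatever retraining step is chosen.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Buffers events and hands out the latest window every `slide` events. */
final class MovingWindowBatcher<T> {
    private final int windowSize;
    private final int slide;
    private final Consumer<List<T>> onBatch;
    private final Deque<T> window = new ArrayDeque<>();
    private int sinceLastBatch = 0;

    MovingWindowBatcher(int windowSize, int slide, Consumer<List<T>> onBatch) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        this.windowSize = windowSize;
        this.slide = slide;
        this.onBatch = onBatch;
    }

    /** Adds one event; emits a copy of the window once it is full and due. */
    void add(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest event
        }
        sinceLastBatch++;
        if (window.size() == windowSize && sinceLastBatch >= slide) {
            sinceLastBatch = 0;
            onBatch.accept(new ArrayList<>(window)); // retraining would go here
        }
    }

    public static void main(String[] args) {
        MovingWindowBatcher<Integer> batcher =
                new MovingWindowBatcher<>(3, 2, batch -> System.out.println(batch));
        for (int i = 1; i <= 7; i++) {
            batcher.add(i); // made-up events
        }
    }
}
```

With `slide` smaller than `windowSize` consecutive batches overlap, and with the two equal the stream is cut into disjoint pieces. Which setting suits retraining is a design choice the thread leaves open.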
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>> [1] https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>> [3] https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>> [4] https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>> [5] https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>> [6] https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>> [7] https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I am interested in contributing to proposal 6: "Predictive
>>>>>>>>>>> analytic with online data for WSO2 Machine Learner" for GSOC2 this
>>>>>>>>>>> time. Since I have been engaged in some similar projects, I think it
>>>>>>>>>>> will be a great experience for me. Please let me know what you think
>>>>>>>>>>> and what you suggest. I have been going through your documents.
>>>>>>>>>>> Thank you.
>>>>>>>>>>> Regards,
>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Dev mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>> [email protected]
>>>>>>>>>> +94711228855
