Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

Nirmal Fernando Sun, 03 Apr 2016 22:13:18 -0700

Hi Mahesh,

We are in the process of evaluating proposals. Until then, you can start
doing some project-related tasks and update us on your progress. Also, feel
free to ask any questions you may have.


On Thu, Mar 31, 2016 at 2:48 PM, Mahesh Dananjaya <[email protected]
> wrote:

> Hi Maheshakya,
> Google has accepted my proof of enrollment. Do I need to proceed
> further with the project? I have been working with Spark MLLib and
> trying to implement those two algorithms. Can you please tell me what
> the next step is? Do I need to wait? Thank you.
> regards,
> Mahesh.
>
> On Fri, Mar 25, 2016 at 10:40 PM, Mahesh Dananjaya <
> [email protected]> wrote:
>
>> Hi Maheshakya,
>> Thank you very much for the support given during the last couple of
>> weeks. I have finally submitted the proposal to the site, and I am looking
>> forward to contributing to WSO2 ML. Thank you.
>> regards,
>> Mahesh.
>>
>> On Fri, Mar 25, 2016 at 7:49 PM, Mahesh Dananjaya <
>> [email protected]> wrote:
>>
>>> Hi maheshakya,
>>> I added the timeline according to my knowledge and uploaded it. Please
>>> check. Thank you.
>>> regards,
>>> Mahesh.
>>>
>>> On Fri, Mar 25, 2016 at 7:09 PM, Maheshakya Wijewardena <
>>> [email protected]> wrote:
>>>
>>>> Hi Mahesh,
>>>>
>>>> Can you add the timeline of the project, as I mentioned? It is one of
>>>> the crucial parts of the proposal, since it lets us evaluate the feasibility
>>>> of the project within the time period given by Google.
>>>>
>>>> Best regards.
>>>>
>>>> On Fri, Mar 25, 2016 at 6:53 PM, Mahesh Dananjaya <
>>>> [email protected]> wrote:
>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Mahesh Dananjaya <[email protected]>
>>>>> Date: Fri, Mar 25, 2016 at 7:02 PM
>>>>> Subject: Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]
>>>>> To: Maheshakya Wijewardena <[email protected]>
>>>>>
>>>>>
>>>>> Hi maheshakya,
>>>>> I have uploaded my final submission. Here it is. Please check it and
>>>>> let me know if there is anything I need to change. Thank you.
>>>>> BR,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 6:28 PM, Mahesh Dananjaya <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Maheshakya,
>>>>>> Thank you very much. I will update the proposal with those
>>>>>> changes and submit it now. Thank you.
>>>>>> regards,
>>>>>> Mahesh.
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 6:07 PM, Maheshakya Wijewardena <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Mahesh,
>>>>>>>
>>>>>>> In the title, please include both tags [ML] and [CEP]
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Also, please include an introduction to yourself (university,
>>>>>>>> department), past experience in machine learning, language
>>>>>>>> proficiency, etc. at the beginning of the proposal.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Mahesh,
>>>>>>>>>
>>>>>>>>> Thank you for sending the draft. Please submit it as soon as
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> A few high-level comments:
>>>>>>>>>
>>>>>>>>> In the proposal, you must specifically mention that this will be
>>>>>>>>> implemented as a Siddhi extension that can operate directly on
>>>>>>>>> incoming streams.
>>>>>>>>>
>>>>>>>>> You also need a timeline for the project. A sample looks
>>>>>>>>> like:
>>>>>>>>>
>>>>>>>>> May 1- May 20 - Community bonding period - Getting familiar with
>>>>>>>>> the platform and discussing implementation methods.
>>>>>>>>> May 20 - May 30 - Implementing streaming k-means,
>>>>>>>>> -----
>>>>>>>>> -----
>>>>>>>>> July 20-24 - Writing examples
>>>>>>>>> July 24-28 - Documentation
>>>>>>>>>
>>>>>>>>> This should end before the pencils-down date. Refer to the correct
>>>>>>>>> timeline given on the GSoC site.
>>>>>>>>>
>>>>>>>>> The implementation details of the streaming algorithms look
>>>>>>>>> fine.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Mar 25, 2016 at 5:23 PM, Mahesh Dananjaya <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>> This is my draft proposal:
>>>>>>>>>>
>>>>>>>>>> https://docs.google.com/document/d/1apZfEXZXEH5GwSwS7hARINbGw5_zinxWdZjEmyqfKu4/edit?usp=sharing
>>>>>>>>>>
>>>>>>>>>> Can you please check this and see whether it is correct? Thank you.
>>>>>>>>>> BR,
>>>>>>>>>> Mahesh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 21, 2016 at 1:15 PM, Maheshakya Wijewardena <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>
>>>>>>>>>>> The deadline for submitting proposals is March 25th,
>>>>>>>>>>> 2016, so please start writing your proposal and get feedback.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>> OK. I have been trying some examples, splitting the data sets and
>>>>>>>>>>>> training incrementally. I am still working on that. I have been
>>>>>>>>>>>> adding them to my GitHub repo too:
>>>>>>>>>>>> https://github.com/dananjayamahesh/GSOC2016 . I saw
>>>>>>>>>>>> that there is only Scala API support for those streaming
>>>>>>>>>>>> algorithms in Spark, so my task is to develop a Java API. I will
>>>>>>>>>>>> let you know my progress. Thank you very much.
>>>>>>>>>>>> BR,
>>>>>>>>>>>> Mahesh
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, you don't need to use Hadoop at any stage in this project.
>>>>>>>>>>>>> Everything you need (regarding ML algorithms) is in Spark.
>>>>>>>>>>>>> You can also use Spark MLLib's methods to randomly split
>>>>>>>>>>>>> datasets.
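For experimenting with the idea outside a Spark context, the random split mentioned above can be sketched in plain Java. This sketch does not use Spark (where an RDD's own `randomSplit` would normally do this job); the class `RandomSplitSketch`, its helper method, and the seed value are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Spark: splits a list into random pieces.
class RandomSplitSketch {

    // Assigns each element to one of weights.length pieces with probability
    // proportional to that piece's weight. A fixed seed makes the split repeatable.
    static <T> List<List<T>> randomSplit(List<T> data, double[] weights, long seed) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        List<List<T>> pieces = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            pieces.add(new ArrayList<>());
        }
        Random random = new Random(seed);
        for (T item : data) {
            double r = random.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = weights.length - 1; // fallback guards against rounding
            for (int i = 0; i < weights.length; i++) {
                cumulative += weights[i];
                if (r < cumulative) {
                    chosen = i;
                    break;
                }
            }
            pieces.get(chosen).add(item);
        }
        return pieces;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            rows.add(i);
        }
        // Four pieces of roughly equal expected size.
        List<List<Integer>> pieces = randomSplit(rows, new double[] {1.0, 1.0, 1.0, 1.0}, 42L);
        for (int i = 0; i < pieces.size(); i++) {
            System.out.println("piece " + i + ": " + pieces.get(i).size() + " rows");
        }
    }
}
```

Piece sizes are only equal in expectation, so individual pieces will usually differ by a few rows.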
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>> I am writing some Java programs that break the data set
>>>>>>>>>>>>>> into several pieces and train a model repeatedly with those
>>>>>>>>>>>>>> pieces using Spark MLLib. Do I have to do anything with Hadoop
>>>>>>>>>>>>>> at this stage? I ask because I am working in standalone mode.
>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You don't have to look into carbon-ml.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi maheshakya,
>>>>>>>>>>>>>>>> I am working on some examples related to Spark and ML. Is
>>>>>>>>>>>>>>>> there anything I need to do with carbon-ml? I think I don't
>>>>>>>>>>>>>>>> need to look into that one. Do I?
>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>> Mahesh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> does that Scala API is with your current product or repo?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> No, we don't have the Scala API included. What we want is
>>>>>>>>>>>>>>>>> to design Java implementations of those algorithms that train
>>>>>>>>>>>>>>>>> with mini-batches of streaming data, with the help of the
>>>>>>>>>>>>>>>>> aforementioned methods, so that we can include them as a CEP
>>>>>>>>>>>>>>>>> extension.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> To clarify, please try to write a simple Java program
>>>>>>>>>>>>>>>>> using Spark MLLib linear regression and k-means clustering
>>>>>>>>>>>>>>>>> with a sample data set (you can find a lot of data sets in
>>>>>>>>>>>>>>>>> the UCI repo[1]). You need to break the data set into several
>>>>>>>>>>>>>>>>> pieces and train a model repeatedly with those.
>>>>>>>>>>>>>>>>> After each training run, save the model information (such
>>>>>>>>>>>>>>>>> as weights and intercepts for regression, and cluster centers
>>>>>>>>>>>>>>>>> for clustering; please check the arguments of the methods I
>>>>>>>>>>>>>>>>> have mentioned and save the required information of the model).
>>>>>>>>>>>>>>>>> When training a model with a new piece of data, use those
>>>>>>>>>>>>>>>>> methods to initialize it, passing the saved values as the
>>>>>>>>>>>>>>>>> arguments. This way you can start from where you stopped in
>>>>>>>>>>>>>>>>> the previous run.
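The save-and-resume loop described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Spark MLLib API: `WarmStartRegressionSketch`, `trainOnBatch`, and `trainInPieces` are hypothetical names, the optimizer is simple gradient descent on squared error, and the step size and iteration count are arbitrary. Each run rebuilds the model from the weights and intercept saved by the previous run, then continues training on the next piece.

```java
// Plain-Java sketch (no Spark). All names here are hypothetical.
class WarmStartRegressionSketch {
    final double[] weights;
    double intercept;

    // Rebuild a model from values saved after an earlier training run.
    WarmStartRegressionSketch(double[] savedWeights, double savedIntercept) {
        this.weights = savedWeights.clone();
        this.intercept = savedIntercept;
    }

    double predict(double[] x) {
        double y = intercept;
        for (int j = 0; j < weights.length; j++) {
            y += weights[j] * x[j];
        }
        return y;
    }

    // Gradient descent on squared error over one piece of the data,
    // starting from whatever weights the model currently holds.
    void trainOnBatch(double[][] xs, double[] ys, int iterations, double stepSize) {
        int n = xs.length;
        if (n == 0) {
            return;
        }
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double error = predict(xs[i]) - ys[i];
                for (int j = 0; j < weights.length; j++) {
                    gradW[j] += error * xs[i][j];
                }
                gradB += error;
            }
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= stepSize * gradW[j] / n;
            }
            intercept -= stepSize * gradB / n;
        }
    }

    // Splits the rows round-robin into `pieces` parts and trains on them one
    // after another. Assumes 1 <= pieces <= xs.length.
    static WarmStartRegressionSketch trainInPieces(double[][] xs, double[] ys, int pieces) {
        double[] savedWeights = new double[xs[0].length]; // first run starts at zero
        double savedIntercept = 0.0;
        for (int p = 0; p < pieces; p++) {
            int count = (xs.length - p + pieces - 1) / pieces;
            double[][] pieceX = new double[count][];
            double[] pieceY = new double[count];
            for (int k = 0; k < count; k++) {
                pieceX[k] = xs[p + k * pieces];
                pieceY[k] = ys[p + k * pieces];
            }
            // Resume from the saved state instead of starting from scratch.
            WarmStartRegressionSketch model =
                    new WarmStartRegressionSketch(savedWeights, savedIntercept);
            model.trainOnBatch(pieceX, pieceY, 500, 0.1);
            // "Save" the model information for the next run.
            savedWeights = model.weights.clone();
            savedIntercept = model.intercept;
        }
        return new WarmStartRegressionSketch(savedWeights, savedIntercept);
    }

    public static void main(String[] args) {
        // Synthetic data following y = 2x + 1.
        double[][] xs = new double[30][1];
        double[] ys = new double[30];
        for (int i = 0; i < 30; i++) {
            xs[i][0] = i / 10.0;
            ys[i] = 2.0 * xs[i][0] + 1.0;
        }
        WarmStartRegressionSketch model = trainInPieces(xs, ys, 3);
        System.out.println("weight=" + model.weights[0] + " intercept=" + model.intercept);
    }
}
```

In a real run the saved values would be written to storage between pieces; here they are simply held in local variables.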
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let us know your observations and feel free to ask if you
>>>>>>>>>>>>>>>>> need to know anything more on this.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We'll let you know what needs to be done to include this
>>>>>>>>>>>>>>>>> in CEP.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>>> Great, thank you. I already have ML and CEP and am working
>>>>>>>>>>>>>>>>>> more with them. Is that Scala API included in your current
>>>>>>>>>>>>>>>>>> product or repo? Thank you.
>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please find the comments inline.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> does data stream is taken to ML as the event publisher's
>>>>>>>>>>>>>>>>>>>> format through event publisher. Or  we can use direct 
>>>>>>>>>>>>>>>>>>>> traffic that comes to
>>>>>>>>>>>>>>>>>>>> event receiver, or else as streams
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1.) Those data coming from wso2 DAS to ML are coming as
>>>>>>>>>>>>>>>>>>>> streams?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No, WSO2 ML doesn't use any event streams. The data stored
>>>>>>>>>>>>>>>>>>> in tables in DAS is loaded into ML.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2.) Are there any incremental learning algorithms
>>>>>>>>>>>>>>>>>>>> currently active in ML?you mentioned that there are and 
>>>>>>>>>>>>>>>>>>>> they are with scala
>>>>>>>>>>>>>>>>>>>> API. So there is a streaming support with that Scala API. 
>>>>>>>>>>>>>>>>>>>> In that API which
>>>>>>>>>>>>>>>>>>>> format the data is aquired to ML?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> No, there are no incremental learning algorithms in ML.
>>>>>>>>>>>>>>>>>>> The Scala API belongs to Spark MLLib. MLLib supports
>>>>>>>>>>>>>>>>>>> streaming k-means and generalized linear models (linear
>>>>>>>>>>>>>>>>>>> regression variants and logistic regression) through its
>>>>>>>>>>>>>>>>>>> Scala API. What those implementations basically do is
>>>>>>>>>>>>>>>>>>> retrain the trained models with mini-batches as data
>>>>>>>>>>>>>>>>>>> arrives sequentially. There, the streaming data is broken
>>>>>>>>>>>>>>>>>>> into mini-batches with the help of Spark Streaming. But we
>>>>>>>>>>>>>>>>>>> do not intend to use Spark Streaming in our implementation.
>>>>>>>>>>>>>>>>>>> What we need to do is implement similar behavior for event
>>>>>>>>>>>>>>>>>>> streams using the Java API. The Java API has the
>>>>>>>>>>>>>>>>>>> following methods:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>>>>>>>>>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>>>>>>>>>>>>>>>      Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>>>>>>>>>>>>>>>>    - *setInitialModel*(KMeansModel model) - for k-means
>>>>>>>>>>>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>>>>>>>>>>>>>>>      KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> With the help of these methods, we can train models
>>>>>>>>>>>>>>>>>>> again with newly arriving data, keeping the characteristics
>>>>>>>>>>>>>>>>>>> learned from the previous data. When implementing this, we
>>>>>>>>>>>>>>>>>>> need to pay attention to other parameters of incremental
>>>>>>>>>>>>>>>>>>> learning, such as data horizon and data obsolescence
>>>>>>>>>>>>>>>>>>> (indicated in the project ideas page).
>>>>>>>>>>>>>>>>>>> We need to discuss how to add these to CEP event
>>>>>>>>>>>>>>>>>>> streams. I have added Suho to the thread for more
>>>>>>>>>>>>>>>>>>> clarification.
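One way to keep what was learned from earlier data while letting old data fade is to warm-start from the previous cluster centers and down-weight their accumulated counts before mixing in each new batch. The plain-Java sketch below illustrates this idea. It is not Spark's implementation: the `decay` parameter (standing in for data obsolescence) and all class and method names are assumptions made for illustration.

```java
// Plain-Java sketch (no Spark). All names and the decay scheme are hypothetical.
class DecayedKMeansSketch {
    final double[][] centers;
    final double[] counts; // effective number of points behind each center

    // Start from cluster centers saved after an earlier run.
    DecayedKMeansSketch(double[][] initialCenters) {
        centers = new double[initialCenters.length][];
        for (int k = 0; k < initialCenters.length; k++) {
            centers[k] = initialCenters[k].clone();
        }
        counts = new double[initialCenters.length];
    }

    // Index of the center closest to the point (squared Euclidean distance).
    int nearest(double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[k][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    // Folds one mini-batch into the model. decay = 1.0 weighs all past data
    // equally; decay = 0.0 forgets the past for clusters that receive points.
    void update(double[][] batch, double decay) {
        int dim = centers[0].length;
        double[][] sums = new double[centers.length][dim];
        int[] batchCounts = new int[centers.length];
        for (double[] point : batch) {
            int k = nearest(point);
            batchCounts[k]++;
            for (int j = 0; j < dim; j++) {
                sums[k][j] += point[j];
            }
        }
        for (int k = 0; k < centers.length; k++) {
            double oldWeight = counts[k] * decay;
            if (batchCounts[k] == 0) {
                counts[k] = oldWeight; // no new points: keep center, age its count
                continue;
            }
            double newWeight = oldWeight + batchCounts[k];
            for (int j = 0; j < dim; j++) {
                centers[k][j] = (centers[k][j] * oldWeight + sums[k][j]) / newWeight;
            }
            counts[k] = newWeight;
        }
    }

    public static void main(String[] args) {
        DecayedKMeansSketch model =
                new DecayedKMeansSketch(new double[][] {{0.0, 0.0}, {10.0, 10.0}});
        model.update(new double[][] {{1, 1}, {1, 1}, {9, 9}, {9, 9}}, 1.0);
        model.update(new double[][] {{3, 3}, {3, 3}}, 1.0);
        System.out.println(java.util.Arrays.deepToString(model.centers));
    }
}
```

A data horizon could be layered on top by choosing the decay so that batches older than the horizon contribute negligibly; how that maps onto CEP event streams is left open, as in the thread.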
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi maheshakya,
>>>>>>>>>>>>>>>>>>>> Since we intend to use WSO2 CEP to handle streaming
>>>>>>>>>>>>>>>>>>>> data and implement the machine learning algorithms with
>>>>>>>>>>>>>>>>>>>> Spark MLLib, is the data stream passed to ML in the event
>>>>>>>>>>>>>>>>>>>> publisher's format through the event publisher? Or can we
>>>>>>>>>>>>>>>>>>>> use the direct traffic that comes to the event receiver,
>>>>>>>>>>>>>>>>>>>> or else as streams? I am referring to
>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>>>>>>>>>>>>>>>>     1.) Is the data coming from WSO2 DAS to ML
>>>>>>>>>>>>>>>>>>>> coming as streams?
>>>>>>>>>>>>>>>>>>>>     2.) Are there any incremental learning algorithms
>>>>>>>>>>>>>>>>>>>> currently active in ML? You mentioned that there are, and
>>>>>>>>>>>>>>>>>>>> that they are in the Scala API. So there is streaming
>>>>>>>>>>>>>>>>>>>> support in that Scala API. In which format does that API
>>>>>>>>>>>>>>>>>>>> acquire the data for ML?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> thank you.
>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena
>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We had to modify the project scope a little to best
>>>>>>>>>>>>>>>>>>>>> suit the requirements. We will update the project
>>>>>>>>>>>>>>>>>>>>> idea with those concerns soon and let you know.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> We do not support streaming data in WSO2 Machine
>>>>>>>>>>>>>>>>>>>>> Learner at the moment. The new plan is to use WSO2 CEP
>>>>>>>>>>>>>>>>>>>>> to handle streaming data and implement the machine
>>>>>>>>>>>>>>>>>>>>> learning algorithms with Spark MLLib. You can look at
>>>>>>>>>>>>>>>>>>>>> the streaming k-means and streaming linear regression
>>>>>>>>>>>>>>>>>>>>> implementations in MLLib. Currently, the API is only
>>>>>>>>>>>>>>>>>>>>> for Scala. Our need is to get the Java APIs of k-means
>>>>>>>>>>>>>>>>>>>>> and generalized linear models to support incremental
>>>>>>>>>>>>>>>>>>>>> learning with streaming data. This has to be done as
>>>>>>>>>>>>>>>>>>>>> mini-batch learning, since these algorithms operate as
>>>>>>>>>>>>>>>>>>>>> stochastic gradient descent, so that any learning with
>>>>>>>>>>>>>>>>>>>>> new data can be done on top of the previously learned
>>>>>>>>>>>>>>>>>>>>> models. So please go through those APIs[1][2][3] and
>>>>>>>>>>>>>>>>>>>>> try to get an idea.
>>>>>>>>>>>>>>>>>>>>> Also, please try to understand how event streams work
>>>>>>>>>>>>>>>>>>>>> in WSO2 CEP [4][5].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi maheshakya,
>>>>>>>>>>>>>>>>>>>>>> Give me some time to go through your ML package. Does
>>>>>>>>>>>>>>>>>>>>>> the current product have any stream data support? I
>>>>>>>>>>>>>>>>>>>>>> did some university projects related to machine
>>>>>>>>>>>>>>>>>>>>>> learning with regression, modelling, factor
>>>>>>>>>>>>>>>>>>>>>> analysis, cluster analysis and classification problems
>>>>>>>>>>>>>>>>>>>>>> (discriminant analysis) with SVM (support vector
>>>>>>>>>>>>>>>>>>>>>> machines), neural networks, LS classification and ML
>>>>>>>>>>>>>>>>>>>>>> (maximum likelihood). Give me some time to see how the
>>>>>>>>>>>>>>>>>>>>>> WSO2 architecture works; then I can come up with a good
>>>>>>>>>>>>>>>>>>>>>> architecture. Thank you.
>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>>>>>>>> Thank you for the resources. I will go through these
>>>>>>>>>>>>>>>>>>>>>>> and am looking forward to this proposed project. Thank you.
>>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya
>>>>>>>>>>>>>>>>>>>>>>> Wijewardena <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> We would like to know what type of similar projects
>>>>>>>>>>>>>>>>>>>>>>>> you have worked on. You may have seen that WSO2
>>>>>>>>>>>>>>>>>>>>>>>> Machine Learner supports several learning algorithms
>>>>>>>>>>>>>>>>>>>>>>>> at the moment[1]. This project intends to leverage
>>>>>>>>>>>>>>>>>>>>>>>> the existing algorithms in WSO2 Machine Learner to
>>>>>>>>>>>>>>>>>>>>>>>> support streaming data. As a first step, you can get
>>>>>>>>>>>>>>>>>>>>>>>> an idea of what WSO2 Machine Learner does and how it
>>>>>>>>>>>>>>>>>>>>>>>> operates. You can download WSO2 Machine Learner from
>>>>>>>>>>>>>>>>>>>>>>>> the product page[2] and get the source code[3]. ML
>>>>>>>>>>>>>>>>>>>>>>>> uses Apache Spark MLLib[4] for its algorithms, so it's
>>>>>>>>>>>>>>>>>>>>>>>> better to read and understand what it does as well.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> In order to get an idea about the deliverables and
>>>>>>>>>>>>>>>>>>>>>>>> the scope of this project, try to understand how
>>>>>>>>>>>>>>>>>>>>>>>> Spark Streaming[5] (see examples) handles streaming
>>>>>>>>>>>>>>>>>>>>>>>> data. Also, have a look at the streaming
>>>>>>>>>>>>>>>>>>>>>>>> algorithms[6][7] supported by MLLib. There are two
>>>>>>>>>>>>>>>>>>>>>>>> approaches discussed to employ incremental learning
>>>>>>>>>>>>>>>>>>>>>>>> in ML in the project proposals page. These streaming
>>>>>>>>>>>>>>>>>>>>>>>> algorithms can be used directly in the first
>>>>>>>>>>>>>>>>>>>>>>>> approach. For the other approach, your implementation
>>>>>>>>>>>>>>>>>>>>>>>> should contain a procedure to create mini-batches
>>>>>>>>>>>>>>>>>>>>>>>> from streaming data with relevant sizes (i.e. a
>>>>>>>>>>>>>>>>>>>>>>>> moving window) and do periodic retraining of the same
>>>>>>>>>>>>>>>>>>>>>>>> algorithm.
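The moving-window procedure described above can be sketched in plain Java. This is a minimal illustration with hypothetical names (`SlidingWindowBatcherSketch`, `windowSize`, `slide`), not part of Spark, WSO2 ML, or CEP: events are buffered, and once the window is full a mini-batch is handed out every `slide` events, which is the point where a retraining step would run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: turns a stream of events into overlapping mini-batches.
class SlidingWindowBatcherSketch<T> {
    private final int windowSize; // number of most recent events kept
    private final int slide;      // new events required between two batches
    private final Deque<T> window = new ArrayDeque<>();
    private int sinceLastBatch = 0;

    SlidingWindowBatcherSketch(int windowSize, int slide) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        this.windowSize = windowSize;
        this.slide = slide;
    }

    // Adds one event. Returns a copy of the current window when it is full and
    // at least `slide` events have arrived since the last batch; otherwise empty.
    Optional<List<T>> add(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst(); // oldest event falls out of the window
        }
        sinceLastBatch++;
        if (window.size() == windowSize && sinceLastBatch >= slide) {
            sinceLastBatch = 0;
            List<T> batch = new ArrayList<>(window);
            return Optional.of(batch);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SlidingWindowBatcherSketch<Integer> batcher = new SlidingWindowBatcherSketch<>(4, 2);
        for (int event = 1; event <= 8; event++) {
            // In a real pipeline the batch would be passed to a retraining step.
            batcher.add(event).ifPresent(batch -> System.out.println("retrain on " + batch));
        }
    }
}
```

Setting `slide` equal to `windowSize` gives non-overlapping batches. In an actual deployment the windowing would more likely be delegated to the stream engine's own window constructs than hand-written like this.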
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> To start with the project, you will need to come up
>>>>>>>>>>>>>>>>>>>>>>>> with a suitable plan and an architecture first.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please watch the video referenced in the proposal
>>>>>>>>>>>>>>>>>>>>>>>> (reference: 5). It will help you get a better idea
>>>>>>>>>>>>>>>>>>>>>>>> of machine learning algorithms with streaming data.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>>>>>>>>>>>>> [6]
>>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>>>>>>>>>>>>> [7]
>>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to proposal 6:
>>>>>>>>>>>>>>>>>>>>>>>>> "Predictive analytic with online data for WSO2
>>>>>>>>>>>>>>>>>>>>>>>>> Machine Learner" for GSOC2
>>>>>>>>>>>>>>>>>>>>>>>>> this time. Since I have been engaged in some
>>>>>>>>>>>>>>>>>>>>>>>>> similar projects, I think it will be a great
>>>>>>>>>>>>>>>>>>>>>>>>> experience for me. Please let me know what you think
>>>>>>>>>>>>>>>>>>>>>>>>> and what you suggest. I have been going through your
>>>>>>>>>>>>>>>>>>>>>>>>> documents. Thank you.
>>>>>>>>>>>>>>>>>>>>>>>>> regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>> +94711228855
>>>>>>>>>>>>>>>>>>>>>>>>


-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
