Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]

Mahesh Dananjaya Thu, 31 Mar 2016 02:20:08 -0700

Hi Maheshakya,
Google has accepted my proof of enrollment. Do I need to proceed
further with the project? I have been working with Spark MLLib and
trying to implement those two algorithms. Can you please tell me what
the next step is, or do I need to wait? Thank you.
Regards,
Mahesh.


On Fri, Mar 25, 2016 at 10:40 PM, Mahesh Dananjaya <
[email protected]> wrote:

> Hi Maheshakya,
> Thank you very much for the support given during the last couple of
> weeks. I have finally submitted the proposal to the site, and I am looking
> forward to contributing to WSO2 ML. Thank you.
> Regards,
> Mahesh.
>
> On Fri, Mar 25, 2016 at 7:49 PM, Mahesh Dananjaya <
> [email protected]> wrote:
>
>> Hi Maheshakya,
>> I added the timeline to the best of my knowledge and uploaded it. Please
>> check. Thank you.
>> Regards,
>> Mahesh.
>>
>> On Fri, Mar 25, 2016 at 7:09 PM, Maheshakya Wijewardena <
>> [email protected]> wrote:
>>
>>> Hi Mahesh,
>>>
>>> Can you add the timeline of the project as I mentioned? It's one of
>>> the crucial parts of the proposal, since it lets us evaluate the feasibility
>>> of the project within the time period given by Google.
>>>
>>> Best regards.
>>>
>>> On Fri, Mar 25, 2016 at 6:53 PM, Mahesh Dananjaya <
>>> [email protected]> wrote:
>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Mahesh Dananjaya <[email protected]>
>>>> Date: Fri, Mar 25, 2016 at 7:02 PM
>>>> Subject: Re: [Dev] Fwd: GSOC2016: Proposal 6: [ML]
>>>> To: Maheshakya Wijewardena <[email protected]>
>>>>
>>>>
>>>> Hi Maheshakya,
>>>> I have uploaded my final submission. Here it is. Please check it and let
>>>> me know if I need to change anything. Thank you.
>>>> BR,
>>>> Mahesh.
>>>>
>>>> On Fri, Mar 25, 2016 at 6:28 PM, Mahesh Dananjaya <
>>>> [email protected]> wrote:
>>>>
>>>>> Hi Maheshakya,
>>>>> Thank you very much. I will update the proposal with those
>>>>> changes and submit it right away. Thank you.
>>>>> Regards,
>>>>> Mahesh.
>>>>>
>>>>> On Fri, Mar 25, 2016 at 6:07 PM, Maheshakya Wijewardena <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Mahesh,
>>>>>>
>>>>>> In the title, please include both tags, [ML] and [CEP].
>>>>>>
>>>>>> Best regards.
>>>>>>
>>>>>> On Fri, Mar 25, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Also, please include an introduction to yourself (university,
>>>>>>> department), past experience in machine learning, language proficiency,
>>>>>>> etc. at the beginning of the proposal.
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 5:47 PM, Maheshakya Wijewardena <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Mahesh,
>>>>>>>>
>>>>>>>> Thank you for sending the draft. Please submit it as soon as
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> A few high-level comments:
>>>>>>>>
>>>>>>>> In the proposal, you must specifically mention that this will be
>>>>>>>> implemented as a Siddhi extension that can operate directly on incoming
>>>>>>>> streams.
>>>>>>>>
>>>>>>>> Also, you need to have a timeline for the project. A sample looks
>>>>>>>> like:
>>>>>>>>
>>>>>>>> May 1 - May 20 - Community bonding period - Getting familiar with
>>>>>>>> the platform and discussing implementation methods.
>>>>>>>> May 20 - May 30 - Implementing streaming k-means,
>>>>>>>> -----
>>>>>>>> -----
>>>>>>>> July 20-24 - Writing examples
>>>>>>>> July 24-28 - Documentation
>>>>>>>>
>>>>>>>> This should end before the pencils-down date. Refer to the correct
>>>>>>>> timeline given on the GSoC site.
>>>>>>>>
>>>>>>>> The implementation details of the streaming algorithms look
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> Best regards.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 25, 2016 at 5:23 PM, Mahesh Dananjaya <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Maheshakya,
>>>>>>>>> This is my draft proposal:
>>>>>>>>>
>>>>>>>>> https://docs.google.com/document/d/1apZfEXZXEH5GwSwS7hARINbGw5_zinxWdZjEmyqfKu4/edit?usp=sharing
>>>>>>>>>
>>>>>>>>> Can you please check this and see whether it is correct? Thank you.
>>>>>>>>> BR,
>>>>>>>>> Mahesh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 21, 2016 at 1:15 PM, Maheshakya Wijewardena <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>
>>>>>>>>>> The deadline for submitting your proposal is March 25th, 2016,
>>>>>>>>>> so please start writing the proposal and get feedback.
>>>>>>>>>>
>>>>>>>>>> Best regards.
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 15, 2016 at 4:14 PM, Mahesh Dananjaya <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>> OK. I have been trying some examples, splitting the data sets and
>>>>>>>>>>> training incrementally. I am still working on that, and I have been
>>>>>>>>>>> adding the examples to my GitHub repo too:
>>>>>>>>>>> https://github.com/dananjayamahesh/GSOC2016 . I saw that Spark has
>>>>>>>>>>> only Scala API support for those streaming algorithms, so my task is
>>>>>>>>>>> to develop a Java API. I will let you know my progress. Thank you
>>>>>>>>>>> very much.
>>>>>>>>>>> BR,
>>>>>>>>>>> Mahesh
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Mar 15, 2016 at 3:21 PM, Maheshakya Wijewardena <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>
>>>>>>>>>>>> No, you don't need to use Hadoop at any stage in this project.
>>>>>>>>>>>> Everything you need is in Spark (regarding ML algorithms).
>>>>>>>>>>>> You can also use Spark MLLib's methods to randomly split
>>>>>>>>>>>> datasets.
>>>>>>>>>>>>
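To picture the random split suggested above without depending on Spark, here is a minimal plain-Java sketch. `DatasetSplitter` and its `randomSplit` method are hypothetical names used only for illustration: the sketch shuffles a copy of the rows with a fixed seed and deals them into nearly equal pieces. In Spark itself the equivalent step would typically be `randomSplit` on an RDD, which takes an array of weights rather than a piece count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Spark): splits a data set into random pieces. */
final class DatasetSplitter {
    private DatasetSplitter() {}

    /** Shuffles a copy of rows with a fixed seed and deals it into nearly equal pieces. */
    static <T> List<List<T>> randomSplit(List<T> rows, int pieces, long seed) {
        if (pieces <= 0) {
            throw new IllegalArgumentException("pieces must be positive");
        }
        List<T> shuffled = new ArrayList<>(rows);          // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed));   // same seed gives the same split
        List<List<T>> parts = new ArrayList<>();
        for (int p = 0; p < pieces; p++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < shuffled.size(); i++) {
            parts.get(i % pieces).add(shuffled.get(i));    // round-robin keeps sizes balanced
        }
        return parts;
    }
}
```

Each returned piece can then be fed to one training run in turn, which is the incremental set-up discussed in this thread.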
>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 14, 2016 at 1:28 PM, Mahesh Dananjaya <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>> I am writing some Java programs that break the data set
>>>>>>>>>>>>> into several pieces and train a model repeatedly with those pieces
>>>>>>>>>>>>> using Spark MLLib. Do I have to do anything with Hadoop at this
>>>>>>>>>>>>> stage? I am working in standalone mode. Thank you.
>>>>>>>>>>>>> BR,
>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Mar 13, 2016 at 6:30 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You don't have to look into carbon-ml.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Mar 13, 2016 at 5:49 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>> I am working on some examples related to Spark and ML. Is
>>>>>>>>>>>>>>> there anything I need to do with carbon-ml? I think I don't need
>>>>>>>>>>>>>>> to look into that one. Do I?
>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>> Mahesh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Mar 8, 2016 at 11:55 AM, Maheshakya Wijewardena <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is that Scala API part of your current product or repo?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No, we don't have the Scala API included. What we want is
>>>>>>>>>>>>>>>> to design Java implementations of those algorithms that train
>>>>>>>>>>>>>>>> with mini-batches of streaming data, with the help of the
>>>>>>>>>>>>>>>> aforementioned methods, so that we can include them as a CEP
>>>>>>>>>>>>>>>> extension.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To clarify, please try to write a simple Java program
>>>>>>>>>>>>>>>> using Spark MLLib linear regression and k-means clustering
>>>>>>>>>>>>>>>> with a sample data set (you can find a lot of data sets in the
>>>>>>>>>>>>>>>> UCI repo[1]). You need to break the data set into several pieces
>>>>>>>>>>>>>>>> and train a model repeatedly with those.
>>>>>>>>>>>>>>>> After each training run, save the model information (such
>>>>>>>>>>>>>>>> as weights and intercepts for regression, and cluster centers for
>>>>>>>>>>>>>>>> clustering; please check the arguments of the methods I have
>>>>>>>>>>>>>>>> mentioned and save the information they require).
>>>>>>>>>>>>>>>> When training a model with a new piece of data, use those
>>>>>>>>>>>>>>>> methods to initialize it, passing the saved values as the
>>>>>>>>>>>>>>>> arguments. This way you can start from where you stopped in the
>>>>>>>>>>>>>>>> previous run.
>>>>>>>>>>>>>>>>
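The save-and-resume loop described above can be sketched in plain Java with no Spark dependency. `WarmStartLinearRegression` is a hypothetical stand-in, not an MLLib class: it keeps only the weights and the intercept, and a new instance built from saved values continues training from where the previous run stopped. The step size, iteration count and full-batch gradient update are illustrative assumptions, not values taken from this thread.

```java
/** Hypothetical stand-in for a linear model: only weights and intercept survive between runs. */
final class WarmStartLinearRegression {
    double[] weights;
    double intercept;

    /** Starts from saved values; pass zeros for the very first run. */
    WarmStartLinearRegression(double[] initialWeights, double initialIntercept) {
        this.weights = initialWeights.clone();
        this.intercept = initialIntercept;
    }

    /** Runs gradient-descent passes over one mini-batch, starting from the current state. */
    void trainOnBatch(double[][] x, double[] y, double stepSize, int iterations) {
        int n = x.length;
        for (int it = 0; it < iterations; it++) {
            double[] gradW = new double[weights.length];
            double gradB = 0.0;
            for (int i = 0; i < n; i++) {
                double err = predict(x[i]) - y[i];        // residual of the squared-error loss
                for (int j = 0; j < weights.length; j++) {
                    gradW[j] += err * x[i][j];
                }
                gradB += err;
            }
            for (int j = 0; j < weights.length; j++) {
                weights[j] -= stepSize * gradW[j] / n;    // averaged gradient step
            }
            intercept -= stepSize * gradB / n;
        }
    }

    double predict(double[] features) {
        double sum = intercept;
        for (int j = 0; j < weights.length; j++) {
            sum += weights[j] * features[j];
        }
        return sum;
    }
}
```

A typical use would be to train one instance on the first piece, store `weights` and `intercept`, and later construct a fresh instance from those stored values before calling `trainOnBatch` with the next piece.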
>>>>>>>>>>>>>>>> Let us know your observations and feel free to ask if you
>>>>>>>>>>>>>>>> need to know anything more on this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We'll let you know what needs to be done to include this in
>>>>>>>>>>>>>>>> CEP.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Mar 8, 2016 at 10:59 AM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>> Great, thank you. I already have ML and CEP and am working more
>>>>>>>>>>>>>>>>> with them. Is that Scala API part of your current product or repo?
>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sun, Mar 6, 2016 at 5:49 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please find the comments inline.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is the data stream taken into ML in the event publisher's
>>>>>>>>>>>>>>>>>>> format, through the event publisher? Or can we use the direct
>>>>>>>>>>>>>>>>>>> traffic that comes to the event receiver, or else as streams?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We intend to use the direct data as event streams.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1.) Is the data coming from WSO2 DAS to ML coming as
>>>>>>>>>>>>>>>>>>> streams?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No, WSO2 ML doesn't use any event streams. The data stored
>>>>>>>>>>>>>>>>>> in tables in DAS is loaded into ML.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2.) Are there any incremental learning algorithms
>>>>>>>>>>>>>>>>>>> currently active in ML? You mentioned that there are, and that
>>>>>>>>>>>>>>>>>>> they are available through the Scala API, so there is streaming
>>>>>>>>>>>>>>>>>>> support with that Scala API. In which format does that API
>>>>>>>>>>>>>>>>>>> acquire the data for ML?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> No, there are no incremental learning algorithms in ML.
>>>>>>>>>>>>>>>>>> The Scala API is about Spark MLLib. MLLib supports streaming
>>>>>>>>>>>>>>>>>> k-means and generalized linear models (linear regression variants
>>>>>>>>>>>>>>>>>> and logistic regression) with the Scala API. What those
>>>>>>>>>>>>>>>>>> implementations basically do is retrain the trained models with
>>>>>>>>>>>>>>>>>> mini-batches as data arrives sequentially. There, the breaking of
>>>>>>>>>>>>>>>>>> streaming data into mini-batches is done with the help of Spark
>>>>>>>>>>>>>>>>>> Streaming. But we do not intend to use Spark Streaming in our
>>>>>>>>>>>>>>>>>> implementation. What we need to do is implement similar behavior
>>>>>>>>>>>>>>>>>> for event streams using the Java API. The Java API has the
>>>>>>>>>>>>>>>>>> following methods:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    - *createModel*(Vector weights, double intercept) - for GLMs
>>>>>>>>>>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html#createModel%28org.apache.spark.mllib.linalg.Vector,%20double%29>
>>>>>>>>>>>>>>>>>>      Vector: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/linalg/Vector.html>
>>>>>>>>>>>>>>>>>>    - *setInitialModel*(KMeansModel model) - for k-means
>>>>>>>>>>>>>>>>>>      <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html#setInitialModel%28org.apache.spark.mllib.clustering.KMeansModel%29>
>>>>>>>>>>>>>>>>>>      KMeansModel: <http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeansModel.html>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> With the help of these methods, we can train models again
>>>>>>>>>>>>>>>>>> with newly arriving data while keeping the characteristics learned
>>>>>>>>>>>>>>>>>> from the previous data. When implementing this, we need to pay
>>>>>>>>>>>>>>>>>> attention to other parameters of incremental learning, such as data
>>>>>>>>>>>>>>>>>> horizon and data obsolescence (indicated in the project ideas page).
>>>>>>>>>>>>>>>>>> We need to discuss how to add these to CEP event
>>>>>>>>>>>>>>>>>> streams. I have added Suho to the thread for more clarification.
>>>>>>>>>>>>>>>>>>
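As a rough illustration of retraining k-means from a saved model while letting old data lose influence, here is a plain-Java sketch. `IncrementalKMeans` is a hypothetical class, not the MLLib `KMeans` API, and the `decay` parameter is an assumption introduced here as one simple way to model data obsolescence: each cluster keeps a discounted count of the points it has absorbed, and a new mini-batch is blended with the saved center in proportion to those counts.

```java
/** Hypothetical sketch: cluster centers and per-cluster counts are the only saved state. */
final class IncrementalKMeans {
    final double[][] centers;   // one row per cluster
    final double[] counts;      // discounted number of points absorbed by each cluster
    final double decay;         // 1.0 = never forget, 0.0 = only the newest batch counts

    IncrementalKMeans(double[][] initialCenters, double[] initialCounts, double decay) {
        this.centers = new double[initialCenters.length][];
        for (int c = 0; c < initialCenters.length; c++) {
            this.centers[c] = initialCenters[c].clone();
        }
        this.counts = initialCounts.clone();
        this.decay = decay;
    }

    /** Index of the center closest to p (squared Euclidean distance). */
    int nearest(double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centers[c][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Assigns the batch to the current centers, then blends batch means into them. */
    void update(double[][] batch) {
        int k = centers.length;
        int d = centers[0].length;
        double[][] sums = new double[k][d];
        int[] assigned = new int[k];
        for (double[] p : batch) {
            int c = nearest(p);
            assigned[c]++;
            for (int j = 0; j < d; j++) {
                sums[c][j] += p[j];
            }
        }
        for (int c = 0; c < k; c++) {
            if (assigned[c] == 0) {
                continue;                           // no new points: keep center and count
            }
            double oldWeight = counts[c] * decay;   // discount what was learned earlier
            for (int j = 0; j < d; j++) {
                centers[c][j] = (centers[c][j] * oldWeight + sums[c][j]) / (oldWeight + assigned[c]);
            }
            counts[c] = oldWeight + assigned[c];
        }
    }
}
```

With `decay` set to 1.0 every past point keeps its full weight; smaller values shorten the effective data horizon.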
>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sat, Mar 5, 2016 at 5:15 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>>>> As the plan is to use WSO2 CEP to handle streaming data
>>>>>>>>>>>>>>>>>>> and implement the machine learning algorithms with Spark MLLib, is
>>>>>>>>>>>>>>>>>>> the data stream taken into ML in the event publisher's format,
>>>>>>>>>>>>>>>>>>> through the event publisher? Or can we use the direct traffic that
>>>>>>>>>>>>>>>>>>> comes to the event receiver, or else as streams? I am referring to
>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP410/User+Guide
>>>>>>>>>>>>>>>>>>>     1.) Is the data coming from WSO2 DAS to ML coming
>>>>>>>>>>>>>>>>>>> as streams?
>>>>>>>>>>>>>>>>>>>     2.) Are there any incremental learning algorithms
>>>>>>>>>>>>>>>>>>> currently active in ML? You mentioned that there are, and that
>>>>>>>>>>>>>>>>>>> they are available through the Scala API, so there is streaming
>>>>>>>>>>>>>>>>>>> support with that Scala API. In which format does that API
>>>>>>>>>>>>>>>>>>> acquire the data for ML?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> thank you.
>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Mar 4, 2016 at 2:03 PM, Maheshakya Wijewardena <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We had to modify the project scope a little to best suit
>>>>>>>>>>>>>>>>>>>> the requirements. We will update the project idea with those
>>>>>>>>>>>>>>>>>>>> concerns soon and let you know.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> We do not support streaming data in WSO2 Machine
>>>>>>>>>>>>>>>>>>>> Learner at the moment. The new plan is to use WSO2 CEP to handle
>>>>>>>>>>>>>>>>>>>> streaming data and implement the machine learning algorithms with
>>>>>>>>>>>>>>>>>>>> Spark MLLib. You can look at the streaming k-means and streaming
>>>>>>>>>>>>>>>>>>>> linear regression implementations in MLLib. Currently, that API is
>>>>>>>>>>>>>>>>>>>> only for Scala. Our need is to get the Java APIs of k-means and
>>>>>>>>>>>>>>>>>>>> generalized linear models to support incremental learning with
>>>>>>>>>>>>>>>>>>>> streaming data. This has to be done as mini-batch learning: since
>>>>>>>>>>>>>>>>>>>> these algorithms operate as stochastic gradient descent, any
>>>>>>>>>>>>>>>>>>>> learning with new data can be done on top of the previously
>>>>>>>>>>>>>>>>>>>> learned models. So please go through those APIs[1][2][3] and try
>>>>>>>>>>>>>>>>>>>> to get an idea.
>>>>>>>>>>>>>>>>>>>> Also, please try to understand how event streams work in
>>>>>>>>>>>>>>>>>>>> WSO2 CEP [4][5].
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/clustering/KMeans.html
>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/classification/LogisticRegressionWithSGD.html
>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Event+Streams
>>>>>>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/CEP310/Working+with+Execution+Plans
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Mar 4, 2016 at 11:26 AM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>>>>>> Give me some time to go through your ML package. Does the
>>>>>>>>>>>>>>>>>>>>> current product have any stream data support? I did some
>>>>>>>>>>>>>>>>>>>>> university projects related to machine learning, covering
>>>>>>>>>>>>>>>>>>>>> regression, modelling, factor analysis, cluster analysis and
>>>>>>>>>>>>>>>>>>>>> classification problems (discriminant analysis) with SVMs (support
>>>>>>>>>>>>>>>>>>>>> vector machines), neural networks, LS classification and ML
>>>>>>>>>>>>>>>>>>>>> (maximum likelihood). Give me some time to see how the WSO2
>>>>>>>>>>>>>>>>>>>>> architecture works; then I can come up with a good architecture.
>>>>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 2:41 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Maheshakya,
>>>>>>>>>>>>>>>>>>>>>> Thank you for the resources. I will go through them, and I
>>>>>>>>>>>>>>>>>>>>>> am looking forward to this proposed project. Thank you.
>>>>>>>>>>>>>>>>>>>>>> BR,
>>>>>>>>>>>>>>>>>>>>>> Mahesh.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:52 PM, Maheshakya
>>>>>>>>>>>>>>>>>>>>>> Wijewardena <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Mahesh,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thank you for your interest in this project.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We would like to know what type of similar projects
>>>>>>>>>>>>>>>>>>>>>>> you have worked on. You may have seen that WSO2 Machine Learner
>>>>>>>>>>>>>>>>>>>>>>> supports several learning algorithms at the moment[1]. This
>>>>>>>>>>>>>>>>>>>>>>> project intends to leverage the existing algorithms in WSO2
>>>>>>>>>>>>>>>>>>>>>>> Machine Learner to support streaming data. As a first step, you
>>>>>>>>>>>>>>>>>>>>>>> can get an idea of what WSO2 Machine Learner does and how it
>>>>>>>>>>>>>>>>>>>>>>> operates. You can download WSO2 Machine Learner from the product
>>>>>>>>>>>>>>>>>>>>>>> page[2] and get the source code[3]. ML uses Apache Spark MLLib[4]
>>>>>>>>>>>>>>>>>>>>>>> for its algorithms, so it's better to read and understand what it
>>>>>>>>>>>>>>>>>>>>>>> does as well.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> In order to get an idea about the deliverables and
>>>>>>>>>>>>>>>>>>>>>>> the scope of this project, try to understand how Spark
>>>>>>>>>>>>>>>>>>>>>>> Streaming[5] (see examples) handles streaming data. Also, have a
>>>>>>>>>>>>>>>>>>>>>>> look at the streaming algorithms[6][7] supported by MLLib. Two
>>>>>>>>>>>>>>>>>>>>>>> approaches to employing incremental learning in ML are discussed
>>>>>>>>>>>>>>>>>>>>>>> on the project proposals page. These streaming algorithms can be
>>>>>>>>>>>>>>>>>>>>>>> used directly in the first approach. For the other approach, your
>>>>>>>>>>>>>>>>>>>>>>> implementation should contain a procedure to create mini-batches
>>>>>>>>>>>>>>>>>>>>>>> of relevant sizes from streaming data (i.e. a moving window) and
>>>>>>>>>>>>>>>>>>>>>>> do periodic retraining of the same algorithm.
>>>>>>>>>>>>>>>>>>>>>>>
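The moving-window procedure mentioned above could look roughly like the following plain-Java sketch. `SlidingWindowBatcher`, `windowSize` and `slide` are hypothetical names chosen for illustration and are not taken from CEP, Siddhi or Spark: the class keeps the most recent events and hands back a snapshot of the window every `slide` events once the window is full, which is the point where a periodic retraining step would run.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: turns a stream of events into overlapping mini-batches. */
final class SlidingWindowBatcher<T> {
    private final int windowSize;
    private final int slide;
    private final Deque<T> window = new ArrayDeque<>();
    private int eventsSinceLastBatch = 0;

    SlidingWindowBatcher(int windowSize, int slide) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        this.windowSize = windowSize;
        this.slide = slide;
    }

    /** Adds one event; returns a copy of the window when a mini-batch is due. */
    Optional<List<T>> add(T event) {
        window.addLast(event);
        if (window.size() > windowSize) {
            window.removeFirst();                     // oldest event leaves the window
        }
        eventsSinceLastBatch++;
        if (window.size() == windowSize && eventsSinceLastBatch >= slide) {
            eventsSinceLastBatch = 0;
            List<T> batch = new ArrayList<>(window);  // snapshot, safe to hand to a trainer
            return Optional.of(batch);
        }
        return Optional.empty();
    }
}
```

For example, with a window of 3 and a slide of 2, the events 1 to 7 yield the batches [1, 2, 3], [3, 4, 5] and [5, 6, 7].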
>>>>>>>>>>>>>>>>>>>>>>> To start with the project, you will need to come up
>>>>>>>>>>>>>>>>>>>>>>> with a suitable plan and an architecture first.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Please watch the video referenced in the proposal
>>>>>>>>>>>>>>>>>>>>>>> (reference: 5). It will help you get a better idea about machine
>>>>>>>>>>>>>>>>>>>>>>> learning algorithms with streaming data.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Let us know if you need any help with these.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Best regards
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Machine+Learner+Algorithms
>>>>>>>>>>>>>>>>>>>>>>> [2] http://wso2.com/products/machine-learner/
>>>>>>>>>>>>>>>>>>>>>>> [3]
>>>>>>>>>>>>>>>>>>>>>>> https://docs.wso2.com/display/ML110/Building+from+Source#BuildingfromSource-Downloadingthesourcecheckout
>>>>>>>>>>>>>>>>>>>>>>> [4]
>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-guide.html
>>>>>>>>>>>>>>>>>>>>>>> [5]
>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/streaming-programming-guide.html
>>>>>>>>>>>>>>>>>>>>>>> [6]
>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-linear-methods.html#streaming-linear-regression
>>>>>>>>>>>>>>>>>>>>>>> [7]
>>>>>>>>>>>>>>>>>>>>>>> https://spark.apache.org/docs/1.4.1/mllib-clustering.html#streaming-k-means
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Mar 2, 2016 at 1:19 PM, Mahesh Dananjaya <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>> I am interested in contributing to proposal 6:
>>>>>>>>>>>>>>>>>>>>>>>> "Predictive analytic with online data for WSO2 Machine Learner"
>>>>>>>>>>>>>>>>>>>>>>>> for GSOC2 this time. Since I have been engaged in some similar
>>>>>>>>>>>>>>>>>>>>>>>> projects, I think it will be a great experience for me. Please
>>>>>>>>>>>>>>>>>>>>>>>> let me know what you think and what you suggest. I have been
>>>>>>>>>>>>>>>>>>>>>>>> going through your documents. Thank you.
>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Mahesh Dananjaya.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>>> Dev mailing list
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>> http://wso2.org/cgi-bin/mailman/listinfo/dev
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Pruthuvi Maheshakya Wijewardena
>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>> +94711228855

