[ https://issues.apache.org/jira/browse/FLINK-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900541#comment-14900541 ]

ASF GitHub Bot commented on FLINK-1719:
---------------------------------------

GitHub user JonathanH5 opened a pull request:

    https://github.com/apache/flink/pull/1156

    Pull Request

    This pull request is related to [FLINK-1719](https://issues.apache.org/jira/browse/FLINK-1719).
    @tillrohrmann Multinomial Naive Bayes has been implemented, and several ideas proposed by other authors have been incorporated.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JonathanH5/flink pullrequest

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1156
    
----
commit aec4cf0b378247e479991f5356f169703ab8ee45
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-05-10T21:06:27Z

    Added a first version of the Naive Bayes Classifier, it works.

commit d216a71edcf525b70ce76310ec122b1dcd72c6c6
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-08T16:11:43Z

    First steps to convert to new MLL done

commit ab43bb2686afeed31793a4018b40e26a52c8d4c4
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-09T15:14:36Z

    Small changes for talk with Till

commit b5952dbf222ccbb18c96d6ab626236fe1505e203
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-14T09:35:57Z

    NaiveB now working with new MLL Layout

commit 1e8dfdf7494f5f57289551a434b400a92f35edb3
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-15T15:52:25Z

    Removal of old code

commit 2bcee72bf185fde3e32dbb8e3fb0bc3c8fa73c05
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-16T13:53:29Z

    Renamed to MultinomialNaiveBayes, improved code comments, created class for automatic benchmarking: MultinomalNaiveBayesRuns

commit ddadbb0299f9116bf0c2acb6b11c5f26a4bd9e10
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-22T14:55:41Z

    Added the first two possibilities to choose from, enhanced code comments and code structure

commit e8f5b7dd1496edd235b6b40d5462077059002ebd
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-23T10:45:10Z

    Added Possibility 3 and improved code comments by a lot (kind of done)

commit f7af3e06c95d58b38edae827974b182082b3d22a
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-23T12:44:41Z

    Added tests for all possibilities that use data provided by the Collection class

commit 42921e9c911fae996bac2aee4dd32a6b0ee7d3e7
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-24T09:51:22Z

    Duplicated MultinomialNaiveBayes class and renamed it to MultinomialNaiveBayesJoinedModel. Both classes (and tests) now do exactly the same thing (also same line numbers); only the name differs.

commit a5e9c80c2214c178f3ca7c87b6e9e763409f90e0
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-24T11:36:09Z

    MultinomialNaiveBayes now stores its data in two separate models (class-related and word-related); results are the same, but it seems to be faster than MultinomialNaiveBayesJoinedModel. Tests already work.

commit a8e62cfc46eae87aece575896ea494c02bc48a11
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-07-27T12:04:35Z

    Resolved 404 Scala style errors

commit c90b25d1f8de933400a6a69c307f28cbec317bb5
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-11T22:52:49Z

    First incorporation of SR1 (only Schneider so far); works, but tests show that accuracy for webkb is 10 percent worse

commit 8c15e7c0bc8a014840baa866b51b62edce2846ae
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-14T15:53:00Z

    Added SR1 = 2, results seem weird. Also added first code for a Transformer that applies feature selection

commit bb46a2951501d9ecbdb3161c306177fde751e770
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-19T14:09:10Z

    Improved CRQ and some other things

commit 32c05d8d8860205e4e81f05a193980958a69d1b8
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-19T14:12:48Z

    Removed changes for SR1 = 2 from the Fit Operation because nothing needs to be changed there

commit ed843c6a95429a8522d71436c97f1ee0a7c8b159
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-20T11:21:02Z

    Added SR=1

commit 509184692f4352e5d228897bfde8564a35163d39
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-21T15:28:49Z

    Added R1

commit 98d1dee42c77d73e5da32246c6d7bbf9c8ac6f2e
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-25T12:44:26Z

    Resolved systematic error when calculating SR1=1, SR1=2 and R1=1

commit dd4acacb18e01aa44e708d84724a51c96a705872
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-27T11:50:25Z

    Version I used for testing the theory improvements

commit 06534f0d517219981577f678f8668c90be81bdab
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-08-28T10:55:03Z

    Small changes

commit bab1ecb076e56b8227af84303e22e4beb6751e5c
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-09-21T11:15:49Z

    Join with Huge

commit 2c70bf41f5d5f2e9536ce52995cae1e294776347
Author: Jonathan Hasenburg <[email protected]>
Date:   2015-09-21T11:22:47Z

    Cleanup for pull request

----


> Add naive Bayes classification algorithm to machine learning library
> --------------------------------------------------------------------
>
>                 Key: FLINK-1719
>                 URL: https://issues.apache.org/jira/browse/FLINK-1719
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Jonathan Hasenburg
>              Labels: ML
>
> Add naive Bayes algorithm to Flink's machine learning library as a basic 
> classification algorithm. Maybe we can incorporate some of the improvements 
> developed by [Karl-Michael 
> Schneider|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.2085&rep=rep1&type=pdf],
>  [Sang-Bum Kim et 
> al.|http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1704799] or 
> [Jason Rennie et 
> al.|http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf] into the 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)