[
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cohan Sujay Carlos updated OPENNLP-777:
---------------------------------------
Attachment: NaiveBayesCorrectnessTest.java
Yes, that's right Joern. That test is deterministic so the results have to be
the same every single time.
I found the error. It was a side effect of the previous test case
(NaiveBayesCorrectnessTest).
In the correctness test, in order to mathematically validate the classifier, I
was deliberately hobbling it (turning off the smoothing and instead using
Maximum Likelihood estimators of probability).
I had forgotten to re-enable smoothing at the end of the correctness test, so
the output of the other tests came to depend upon the order in which the tests
were run.
I have now bracketed each of the correctness tests with calls that re-enable
the smoothing.
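To illustrate the kind of bracketing described above, here is a minimal sketch.
It is not the code in the attached test: ToyModel, SmoothingBracket and
unsmoothedProbability are hypothetical names, the add-one formula is assumed to
be the form of Laplace smoothing in use, and a try/finally block stands in for
whatever bracketing the real test performs.

```java
// Hypothetical toy model, for illustration only (not the OpenNLP classes).
class ToyModel {
    // Shared static state: this is what lets one test leak into the next.
    private static boolean smoothed = true;

    static void setSmoothed(boolean value) { smoothed = value; }

    static boolean isSmoothed() { return smoothed; }

    // Estimate P(word | class) from raw counts.
    // Smoothed: add-one (Laplace) estimate. Unsmoothed: maximum likelihood.
    static double probability(int count, int total, int vocabularySize) {
        if (smoothed) {
            return (count + 1.0) / (total + vocabularySize);
        }
        return total == 0 ? 0.0 : (double) count / total;
    }
}

class SmoothingBracket {
    // Computes an estimate with smoothing off, then always restores the
    // default, even if the computation throws.
    static double unsmoothedProbability(int count, int total, int vocabularySize) {
        ToyModel.setSmoothed(false);
        try {
            return ToyModel.probability(count, total, vocabularySize);
        } finally {
            ToyModel.setSmoothed(true);
        }
    }
}
```

With the flag restored in the finally block, a later test that relies on the
smoothed estimate sees the same value regardless of which test ran before it.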
I also wanted to let you know that the function that hobbles the classifier
(ml.naivebayes.NaiveBayesMode.setSmoothed(boolean)) has package-level
visibility.
I did that deliberately to ensure that it can only be invoked from code that is
in the same package. The only use of the hobbling function is
testing/validation (no user would really want to hobble the classifier and lose
a few percentage points of accuracy).
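For readers less familiar with Java access levels, the following sketch shows
what package-level visibility means for such a setter. The class HobbleDemo and
its helper isPackagePrivate are hypothetical and exist only for this
illustration; they are not part of the patch.

```java
import java.lang.reflect.Modifier;

// Hypothetical class, for illustration only.
class HobbleDemo {
    private static boolean smoothed = true;

    // No access modifier: package-private. Only code in the same package
    // (for example, a test class placed there) can call this method.
    static void setSmoothed(boolean value) { smoothed = value; }

    // Public read access can stay available to all callers.
    public static boolean isSmoothed() { return smoothed; }

    // Uses reflection to check that a declared method has no
    // public, protected or private modifier.
    static boolean isPackagePrivate(String methodName, Class<?>... parameterTypes) {
        try {
            int modifiers = HobbleDemo.class
                    .getDeclaredMethod(methodName, parameterTypes)
                    .getModifiers();
            return !Modifier.isPublic(modifiers)
                    && !Modifier.isProtected(modifiers)
                    && !Modifier.isPrivate(modifiers);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }
}
```

A call to HobbleDemo.setSmoothed(false) from a class in a different package
would be rejected by the compiler, which is the effect described above.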
The corrected 'correctness test' is attached.
> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java,
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java,
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch,
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
> topics.train
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it
> lacks one at present).
> Implementation details: We have a production-hardened piece of Java code for
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that
> we'd like to contribute. The code is Java 1.5 compatible. I'd have to write
> an adapter to make the interface compatible with the ME classifier in
> OpenNLP. I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this
> dated May 19th, 2015.
> <snip>
> Tommaso Teofili via opennlp.apache.org
> to dev
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> </snip>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and I'd love to know whom I could coordinate with if there is.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India