[
https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738878#comment-14738878
]
Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------
[~cohan.sujay] I am writing some tests around model I/O (persisting to and reading
from a file), but I am not sure whether I am doing something wrong or there's a bug
there.
If you run the two tests below, both fail when reading back the model written
to the file:
{code}
@Test
public void testBinaryModelPersistence() throws Exception {
  // Train a Naive Bayes model on the sample events from
  // NaiveBayesCorrectnessTest, using a cutoff of 1
  NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
      new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

  // Create the model file under the test-classes root; the ".bin" suffix
  // makes GenericModelWriter write the binary format
  Path path = Paths.get(getClass().getResource("/").getFile());
  Path tempFile = Files.createTempFile(path, "bnb-", ".bin");
  File file = tempFile.toFile();

  // Write the model to the file
  GenericModelWriter modelWriter = new GenericModelWriter(model, file);
  modelWriter.persist();

  // Read it back (the reader also picks binary vs plain text from the suffix)
  NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
  reader.checkModelType();
  AbstractModel abstractModel = reader.getModel();
  assertNotNull(abstractModel);
}

@Test
public void testTextModelPersistence() throws Exception {
  // Same training setup as the binary test
  NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
      new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

  // The ".txt" suffix makes GenericModelWriter write the plain-text format
  Path path = Paths.get(getClass().getResource("/").getFile());
  Path tempFile = Files.createTempFile(path, "ptnb-", ".txt");
  File file = tempFile.toFile();

  GenericModelWriter modelWriter = new GenericModelWriter(model, file);
  modelWriter.persist();

  NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
  reader.checkModelType();
  AbstractModel abstractModel = reader.getModel();
  assertNotNull(abstractModel);
}
{code}
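A self-contained sketch of the same write-then-read round trip may help narrow this down. It does not use OpenNLP: ModelRoundTrip, persist, Reader and the file layout (a type header followed by the model body, binary only) are hypothetical, chosen to mimic a reader whose getModel() performs its own model-type check before reading the body. Under that assumption, calling checkModelType() first, as both tests above do, consumes the header, so the second check reads the start of the body and loading fails.

{code:java}
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical sketch, not OpenNLP code: a binary model file holds a type
// header ("NaiveBayes") followed by the model body (here just outcome labels).
public class ModelRoundTrip {

  static final String MODEL_TYPE = "NaiveBayes";

  // Writer side: header first, then the body.
  static void persist(String[] outcomes, File file) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file)))) {
      out.writeUTF(MODEL_TYPE);
      out.writeInt(outcomes.length);
      for (String outcome : outcomes) {
        out.writeUTF(outcome);
      }
    }
  }

  // Reader side: assumed to mirror a reader whose getModel() checks the header itself.
  static class Reader implements Closeable {
    private final DataInputStream in;

    Reader(File file) throws IOException {
      in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
    }

    // Reads (and therefore consumes) the next string in the stream as the header.
    void checkModelType() throws IOException {
      String type = in.readUTF();
      if (!MODEL_TYPE.equals(type)) {
        throw new IOException("unexpected model type: '" + type + "'");
      }
    }

    String[] getModel() throws IOException {
      checkModelType(); // the header check happens here as well
      int count = in.readInt();
      String[] outcomes = new String[count];
      for (int i = 0; i < count; i++) {
        outcomes[i] = in.readUTF();
      }
      return outcomes;
    }

    @Override
    public void close() throws IOException {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    File file = Files.createTempFile("nb-", ".bin").toFile();
    file.deleteOnExit();
    persist(new String[] {"sports", "politics"}, file);

    // Works: getModel() checks the header exactly once.
    try (Reader reader = new Reader(file)) {
      System.out.println(Arrays.toString(reader.getModel()));
    }

    // Fails: the explicit check consumes the header, so getModel()'s own
    // check reads the beginning of the body instead.
    try (Reader reader = new Reader(file)) {
      reader.checkModelType();
      reader.getModel();
      System.out.println("loaded (unexpected)");
    } catch (IOException e) {
      System.out.println("load failed: " + e.getMessage());
    }
  }
}
{code}

If OpenNLP's AbstractModelReader.getModel() likewise calls checkModelType() itself (worth confirming in the source), removing the explicit reader.checkModelType() line from both tests would be the first thing to try.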
> Naive Bayesian Classifier
> -------------------------
>
> Key: OPENNLP-777
> URL: https://issues.apache.org/jira/browse/OPENNLP-777
> Project: OpenNLP
> Issue Type: New Feature
> Components: Machine Learning
> Environment: J2SE 1.5 and above
> Reporter: Cohan Sujay Carlos
> Assignee: Tommaso Teofili
> Priority: Minor
> Labels: NBClassifier, bayes, bayesian, classifier, multinomial,
> naive, patch
> Attachments: D1TopicClassifierTrainingDemoNB.java,
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java,
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch,
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
> topics.train
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it
> lacks one at present).
> Implementation details: We have a production-hardened piece of Java code for
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that
> we'd like to contribute. The code is Java 1.5 compatible. I'd have to write
> an adapter to make the interface compatible with the ME classifier in
> OpenNLP. I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion about this on the dev mailing list,
> dated May 19th, 2015.
> <snip>
> Tommaso Teofili via opennlp.apache.org
> to dev
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundational algorithms, often used as a basis for comparison.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> </snip>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos wrote:
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped with partially labelled training
> > data, as explained in the 2000 paper by Nigam, McCallum, et al., "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering whether there is any interest in adding an NB implementation
> > and, if there is, whom I could coordinate with.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India
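The issue describes a multinomial Naive Bayes classifier with default Laplace smoothing. As a rough illustration of that technique only, and not of the contributed code or of any OpenNLP API, a minimal Java sketch might look like the following; the class name MultinomialNB, the pre-tokenized String inputs and the toy labels are all hypothetical.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing.
// Hypothetical sketch; not the contributed implementation or an OpenNLP API.
public class MultinomialNB {
  private final Map<String, Integer> docCounts = new HashMap<>();               // documents per class
  private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // token counts per class
  private final Map<String, Integer> totalWords = new HashMap<>();              // total tokens per class
  private final Set<String> vocabulary = new HashSet<>();                       // all training tokens
  private int totalDocs = 0;

  // Add one labelled, already tokenized document.
  public void train(String label, String... tokens) {
    totalDocs++;
    docCounts.merge(label, 1, Integer::sum);
    Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
    for (String token : tokens) {
      counts.merge(token, 1, Integer::sum);
      totalWords.merge(label, 1, Integer::sum);
      vocabulary.add(token);
    }
  }

  // log P(c) + sum of log P(w | c), where
  // P(w | c) = (count(w, c) + 1) / (tokens in c + |V|). The label must have been trained.
  public double logScore(String label, String... tokens) {
    double score = Math.log(docCounts.get(label) / (double) totalDocs);
    Map<String, Integer> counts = wordCounts.get(label);
    double denominator = totalWords.getOrDefault(label, 0) + vocabulary.size();
    for (String token : tokens) {
      score += Math.log((counts.getOrDefault(token, 0) + 1) / denominator);
    }
    return score;
  }

  // Return the class with the highest log score.
  public String classify(String... tokens) {
    String best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (String label : docCounts.keySet()) {
      double score = logScore(label, tokens);
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    MultinomialNB nb = new MultinomialNB();
    nb.train("sports", "ball", "goal", "team");
    nb.train("sports", "team", "match", "goal");
    nb.train("politics", "vote", "election", "party");
    nb.train("politics", "party", "policy", "vote");
    System.out.println(nb.classify("goal", "team", "vote")); // prints "sports"
  }
}
{code}

Scores are kept in log space to avoid floating-point underflow on long documents, and add-one smoothing keeps a word that never occurred with a class from driving that class's probability to zero.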
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)