[jira] [Commented] (OPENNLP-777) Naive Bayesian Classifier

Tommaso Teofili (JIRA) Thu, 10 Sep 2015 08:02:06 -0700

    [ https://issues.apache.org/jira/browse/OPENNLP-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738878#comment-14738878 ]


Tommaso Teofili commented on OPENNLP-777:
-----------------------------------------

[~cohan.sujay] I am writing some tests around model IO (persisting a model 
to a file and reading it back), but I am not sure whether I am doing 
something wrong or there is a bug there.
Both of the tests below fail at the point where they read back the model 
that was written to file:
{code}

  @Test
  public void testBinaryModelPersistence() throws Exception {
    // Train a small model from the correctness-test training stream.
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    // Persist the model to a temporary .bin file under the test classpath root.
    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "bnb-", ".bin");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();

    // Read the model back from the same file; this is the step that fails.
    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }

  @Test
  public void testTextModelPersistence() throws Exception {
    // Train a small model from the correctness-test training stream.
    NaiveBayesModel model = (NaiveBayesModel) new NaiveBayesTrainer().trainModel(
        new TwoPassDataIndexer(NaiveBayesCorrectnessTest.createTrainingStream(), 1, false));

    // Persist the model to a temporary .txt file under the test classpath root.
    Path path = Paths.get(getClass().getResource("/").getFile());
    Path tempFile = Files.createTempFile(path, "ptnb-", ".txt");
    File file = tempFile.toFile();
    GenericModelWriter modelWriter = new GenericModelWriter(model, file);
    modelWriter.persist();

    // Read the model back from the same file; this is the step that fails.
    NaiveBayesModelReader reader = new NaiveBayesModelReader(file);
    reader.checkModelType();
    AbstractModel abstractModel = reader.getModel();
    assertNotNull(abstractModel);
  }

{code}
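
For comparison, here is a minimal, library-independent sketch of the same write-then-read round trip. It is not the OpenNLP code path and does not explain the failures above. The class `RoundTripSketch`, the method `roundTrip`, and the use of a plain `double[]` as a stand-in for model parameters are all hypothetical. It creates the temp file in the default temp directory (rather than resolving a classpath URL to a path) and deletes it afterwards, which may help separate file-handling problems from reader/writer problems.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class RoundTripSketch {

  // Hypothetical stand-in for "persist a model, then read it back".
  static double[] roundTrip(double[] params) throws IOException {
    // Default temp directory: no classpath URL has to be turned into a path.
    Path tempFile = Files.createTempFile("roundtrip-", ".bin");
    try {
      // Write the parameter count followed by each parameter.
      try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(tempFile))) {
        out.writeInt(params.length);
        for (double p : params) {
          out.writeDouble(p);
        }
      }
      // Read the values back in the same order they were written.
      try (DataInputStream in = new DataInputStream(Files.newInputStream(tempFile))) {
        double[] read = new double[in.readInt()];
        for (int i = 0; i < read.length; i++) {
          read[i] = in.readDouble();
        }
        return read;
      }
    } finally {
      // Clean up so repeated test runs do not accumulate temp files.
      Files.deleteIfExists(tempFile);
    }
  }

  public static void main(String[] args) throws IOException {
    double[] original = {0.25, 0.75};
    double[] restored = roundTrip(original);
    System.out.println(Arrays.equals(original, restored));
  }
}
```

If a round trip along these lines passes while the two tests above still fail, the problem is more likely in the model writer or reader than in how the temp file is created.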

> Naive Bayesian Classifier
> -------------------------
>
>                 Key: OPENNLP-777
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-777
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Machine Learning
>         Environment: J2SE 1.5 and above
>            Reporter: Cohan Sujay Carlos
>            Assignee: Tommaso Teofili
>            Priority: Minor
>              Labels: NBClassifier, bayes, bayesian, classifier, multinomial, 
> naive, patch
>         Attachments: D1TopicClassifierTrainingDemoNB.java, 
> D1TopicClassifierUsageDemoNB.java, NaiveBayesCorrectnessTest.java, 
> naive-bayesian-classifier-for-opennlp-1.6.0-rc6-with-test-cases.patch, 
> prep-attach-test-case-for-naive-bayesian-classifier-for-opennlp-1.6.0-rc6.patch,
>  topics.train
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I thought it would be nice to have a Naive Bayesian classifier in OpenNLP (it 
> lacks one at present).
> Implementation details:  We have a production-hardened piece of Java code for 
> a multinomial Naive Bayesian classifier (with default Laplace smoothing) that 
> we'd like to contribute.  The code is Java 1.5 compatible.  I'd have to write 
> an adapter to make the interface compatible with the ME classifier in 
> OpenNLP.  I expect the patch to be available 1 to 3 weeks from now.
> Below is the email trail of a discussion in the dev mailing list around this 
> dated May 19th, 2015.
> <snip>
> Tommaso Teofili via opennlp.apache.org 
> to dev 
> Hi Cohan,
> I think that'd be a very valuable contribution, as NB is one of the
> foundation algorithms, often used as basis for comparisons.
> It would be good if you could create a Jira issue and provide more details
> about the implementation and, eventually, a patch.
> Thanks and regards,
> Tommaso
> </snip>
> 2015-05-19 9:57 GMT+02:00 Cohan Sujay Carlos 
> > I have a question for the OpenNLP project team.
> >
> > I was wondering if there is a Naive Bayesian classifier implementation in
> > OpenNLP that I've not come across, or if there are plans to implement one.
> >
> > If it is the latter, I should love to contribute an implementation.
> >
> > There is an ME classifier already available in OpenNLP, of course, but I
> > felt that there was an unmet need for a Naive Bayesian (NB) classifier
> > implementation to be offered as well.
> >
> > An NB classifier could be bootstrapped up with partially labelled training
> > data as explained in the Nigam, McCallum, et al paper of 2000 "Text
> > Classification from Labeled and Unlabeled Documents using EM".
> >
> > So, if there isn't an NB code base out there already, I'd be happy to
> > contribute a very solid implementation that we've used in production for a
> > good 5 years.
> >
> > I'd have to adapt it to load the same training data format as the ME
> > classifier, but I guess that shouldn't be very difficult to do.
> >
> > I was wondering if there was some interest in adding an NB implementation
> > and, if there is, I'd love to know who I could coordinate with.
> >
> > Cohan Sujay Carlos
> > CEO, Aiaioo Labs, India



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
