Manoj,

You can add custom features using a generator that implements this interface:
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java

For existing implementations, take a look at
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/BagOfWordsFeatureGenerator.java
and
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
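
To give an idea of what such a generator could look like, here is a minimal sketch. It emits one extra feature for every token found in a keyword list you supply. Note the assumptions: the nested FeatureGenerator interface is only a local stand-in so that the example compiles without OpenNLP on the classpath (it mirrors the extractFeatures(String[], Map<String, Object>) method as I understand it from master; older releases may differ), and KeywordFeatureGenerator and the "kw=" prefix are names I made up for illustration. This does not set weights by hand. It only gives the trainer additional features, and the trainer still learns their weights from your data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class KeywordFeatureExample {

    // Local stand-in for opennlp.tools.doccat.FeatureGenerator (assumed shape).
    // In a real project, implement the OpenNLP interface instead of this one.
    interface FeatureGenerator {
        Collection<String> extractFeatures(String[] text, Map<String, Object> extraInformation);
    }

    // Hypothetical generator: emits "kw=<token>" for each token in the keyword list.
    static class KeywordFeatureGenerator implements FeatureGenerator {
        private final Set<String> keywords = new HashSet<>();

        KeywordFeatureGenerator(Collection<String> keywords) {
            // Store keywords lower-cased so matching is case-insensitive.
            for (String keyword : keywords) {
                this.keywords.add(keyword.toLowerCase(Locale.ROOT));
            }
        }

        @Override
        public Collection<String> extractFeatures(String[] text, Map<String, Object> extraInformation) {
            List<String> features = new ArrayList<>();
            for (String token : text) {
                String normalized = token.toLowerCase(Locale.ROOT);
                if (keywords.contains(normalized)) {
                    features.add("kw=" + normalized);
                }
            }
            return features;
        }
    }

    public static void main(String[] args) {
        FeatureGenerator generator =
                new KeywordFeatureGenerator(Arrays.asList("played", "basket", "ball"));
        String[] tokens = {"We", "played", "basket", "ball", "under", "the", "sun"};
        // Prints [kw=played, kw=basket, kw=ball]
        System.out.println(generator.extractFeatures(tokens, Collections.emptyMap()));
    }
}
```

You would typically use a generator like this together with BagOfWordsFeatureGenerator rather than instead of it. How the generators are passed to training depends on your OpenNLP version, so please check the Javadoc of the release you are using.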

Damiano

2017-01-18 12:41 GMT+01:00 Cohan Sujay Carlos <co...@aiaioo.com>:

> In machine learning, one learns the weights you're speaking of, Manoj.
>
> So, the words that are more important for any category are given higher
> weightage during classification.
>
> However, rather than requiring a user to manually assign these weights, a
> machine learning system learns the weights from training data.
>
> That's what happens when you call, say,
> DocumentCategorizerME.train("en", sampleStream);
>
> The model that the train method returns is just a record of the "weights"
> that have been learnt.
>
> Cohan
>
> On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <
> manojb.narayanan2...@gmail.com> wrote:
>
> > Hi,
> >
> > I was wondering if there is a way to assign weights to certain words of a
> > class in the Document Classifier.
> >
> > Some words are important for a particular class. Even though these words
> > may occur in other classes, the level of importance may vary. So, if
> > certain words in certain classes are given specific weights, it would
> > produce more accurate results.
> >
> > Let me explain this with an example.
> >
> > Say we have 2 classes: Nature and Sports.
> > Consider these 2 sentences:
> >     1. We played basket ball, under the sun.
> >     2. The sun is a big ball of fire.
> >
> > In the first sentence, which belongs to the class 'Sports', the words
> > 'played','basket','ball' are more important than the word 'sun'. Whereas,
> > in the second sentence, the words 'sun' and 'fire' are more important
> > than the word 'ball'.
> >
> > The level of importance can be set by assigning weights to a few
> > specific words that are distinct for a class.
> >
> > Is there already a way to do this in the OpenNLP Document Classifier? If
> > not, please consider this.
> >
>
