Hi Damiano, Thank you. I will definitely look into it.
Manoj

On Wed, Jan 18, 2017 at 5:30 PM, Damiano Porta <damianopo...@gmail.com> wrote:

> Manoj,
>
> you can add custom features using a generator that implements this:
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/FeatureGenerator.java
>
> Take a look at
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/BagOfWordsFeatureGenerator.java
> and
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/doccat/NGramFeatureGenerator.java
>
> Damiano
>
> 2017-01-18 12:41 GMT+01:00 Cohan Sujay Carlos <co...@aiaioo.com>:
>
> > In machine learning, one learns the weights you're speaking of, Manoj.
> >
> > So, the words that are more important for any category are given higher
> > weightage during classification.
> >
> > However, rather than requiring a user to manually assign these weights, a
> > machine learning system learns the weights from training data.
> >
> > That's what happens when you call, say,
> > DocumentCategorizerME.train("en", sampleStream);
> >
> > The model that the train method returns is just a record of the "weights"
> > that have been learnt.
> >
> > Cohan
> >
> > On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <manojb.narayanan2...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I was wondering if there is a way to assign weights to certain words of a
> > > class in the Document Classifier.
> > >
> > > Some words are important for a particular class. Even though these words
> > > may occur in other classes, the level of importance may vary. So, if
> > > certain words in certain classes are given specific weights, it would
> > > produce more accurate results.
> > >
> > > Let me explain this with an example.
> > >
> > > Say we have 2 classes: Nature and Sports.
> > > Consider these 2 sentences:
> > > 1. We played basket ball, under the sun.
> > > 2. The sun is a big ball of fire.
> > >
> > > In the first sentence, which belongs to the class 'Sports', the words
> > > 'played', 'basket', 'ball' are more important than the word 'sun'. Whereas,
> > > in the second sentence, the words 'sun' and 'fire' are more important than
> > > the word 'ball'.
> > >
> > > The level of importance can be assigned by assigning weight to a few
> > > specific words that are distinct for a class.
> > >
> > > Is there already a way to do this in OpenNLP Document Classifier? If not,
> > > please consider this.
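
For anyone reading this thread later, here is a rough sketch of the kind of custom generator Damiano suggests. It is not code from OpenNLP itself. The `FeatureGenerator` interface below is a local stand-in so the example compiles on its own; its method signature is an assumption about the shape of `opennlp.tools.doccat.FeatureGenerator` and has differed between OpenNLP releases, so check the linked source for your version. The class name `KeyTermFeatureGenerator`, the key-term list, and the `bow=` / `key=` prefixes are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Local stand-in so the sketch compiles without OpenNLP on the classpath.
// ASSUMPTION: the real opennlp.tools.doccat.FeatureGenerator may declare a
// different signature depending on the release; verify before implementing it.
interface FeatureGenerator {
  Collection<String> extractFeatures(String[] text, Map<String, Object> extraInformation);
}

// Hypothetical generator: emits one bag-of-words feature per token, plus an
// extra "key=" feature for tokens that appear in a user-supplied term list.
class KeyTermFeatureGenerator implements FeatureGenerator {

  private final Set<String> keyTerms = new HashSet<>();

  KeyTermFeatureGenerator(Collection<String> terms) {
    for (String term : terms) {
      keyTerms.add(term.toLowerCase(Locale.ROOT));
    }
  }

  @Override
  public Collection<String> extractFeatures(String[] text, Map<String, Object> extraInformation) {
    List<String> features = new ArrayList<>();
    for (String token : text) {
      String word = token.toLowerCase(Locale.ROOT);
      features.add("bow=" + word);   // ordinary word feature
      if (keyTerms.contains(word)) {
        features.add("key=" + word); // additional feature for a listed term
      }
    }
    return features;
  }

  public static void main(String[] args) {
    FeatureGenerator generator = new KeyTermFeatureGenerator(Arrays.asList("sun", "fire"));
    String[] tokens = {"The", "sun", "is", "a", "big", "ball", "of", "fire"};
    System.out.println(generator.extractFeatures(tokens, Collections.<String, Object>emptyMap()));
  }
}
```

Note that this does not hard-code a weight for any word. In line with Cohan's point, the extra `key=` features only give the trainer more to work with; how much each one counts for a given category is still learned from the training data.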