Not exactly. You would create a new NER model to replace yours. In this approach you would need a corpus like this:

<START:personMale> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:personFemale> Jessie Robson <END> is retiring , she was a board member for 5 years .

I am not a native English speaker, so I am not sure the example is clear
enough. I tried to use Jessie as a gender-neutral name and "she" as the
disambiguating context.

With a big enough corpus, maybe you could create a model that outputs both
classes, personMale and personFemale. To train a model you can follow
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training

Let's say your results are not good enough. You could improve your model
using custom feature generators (
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
and
https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/package-summary.html
).

One of the implemented feature generators can take a dictionary (
https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/DictionaryFeatureGenerator.html
). You can also implement other convenient feature generators, for instance
a regex-based one.

Again, this is just a wild guess at how to implement it. I don't know if it
would perform well. I was only thinking about how to implement a gender ML
model that uses the surrounding context.

I hope that clarifies things.

William

2016-06-28 19:15 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:

> Hi William,
> Ok, so you are talking about a kind of pipe where we execute:
>
> 1. NER (personM for example)
> 2. Regex (filter to reduce false positives)
> 3. Plain dictionary (filter as above) ?
>
> Yes we can split our model in two for M and F, it is not a big problem, we
> have a database grouped by gender.
>
> I only have a doubt regarding the use of a dictionary. Because if we use a
> dictionary to create the model, we could only use it to detect names
> without using NER. No?
>
>
>
> 2016-06-29 0:10 GMT+02:00 William Colen <william.co...@gmail.com>:
>
> > Do you plan to use the surrounding context? If yes, maybe you could try to
> > split NER in two categories: PersonM and PersonF. Just an idea, never read
> > or tried anything like it. You would need a training corpus with these
> > classes.
> >
> > You could add both the plain dictionary and the regex as NER features as
> > well and check how it improves.
> >
> > 2016-06-28 18:56 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
> >
> > > Hello everybody,
> > >
> > > we built a NER model to find persons (name) inside our documents.
> > > We are looking for the best approach to understand if the name is
> > > male/female.
> > >
> > > Possible solutions:
> > > - Plain dictionary?
> > > - Regex to check the initial and/letters of the name?
> > > - Classifier? (naive bayes? Maxent?)
> > >
> > > Thanks
> > >
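
A rough sketch of the corpus preparation described at the top of this message: if you already have tokenized sentences with person spans labelled by gender, something like the following Python could write them out in the <START:type> ... <END> format, one sentence per line. The function name `annotate` and the (start, end, type) span tuples are my own assumptions for illustration and not part of OpenNLP; spans are assumed not to overlap.

```python
from typing import List, Tuple

# Hypothetical span representation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def annotate(tokens: List[str], spans: List[Span]) -> str:
    """Render one tokenized sentence with <START:type> ... <END> markers.

    Assumes the spans do not overlap and use token indices into `tokens`.
    """
    starts = {start: label for start, _, label in spans}
    ends = {end for _, end, _ in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:            # close a span that ended just before this token
            out.append("<END>")
        if i in starts:          # open a span that begins at this token
            out.append(f"<START:{starts[i]}>")
        out.append(token)
    if len(tokens) in ends:      # a span that runs to the end of the sentence
        out.append("<END>")
    return " ".join(out)


sentence = "Jessie Robson is retiring , she was a board member for 5 years .".split()
line = annotate(sentence, [(0, 2, "personFemale")])
print(line)
```

Each returned line would go on its own line in the training file, which can then be passed to the name finder trainer described in the manual linked above.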