Thank you, William! Really appreciated! There is only one point I do not get. When you said "You could increment your model using Custom Feature Generators", does it mean that I can put these features inside ONE *.bin* file (model) that implements different things, or is the name finder one thing and the feature generators another?
Thank you in advance for the clarification.

2016-06-29 1:23 GMT+02:00 William Colen <[email protected]>:

> Not exactly. You would create a new NER model to replace yours.
>
> In this approach you would need a corpus like this:
>
> <START:personMale> Pierre Vinken <END> , 61 years old , will join the board
> as a nonexecutive director Nov. 29 .
> Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the
> Dutch publishing group . <START:personFemale> Jessie Robson <END> is
> retiring , she was a board member for 5 years .
>
> I am not an English native speaker, so I am not sure if the example is
> clear enough. I tried to use Jessie as a neutral name and "she" as
> disambiguation.
>
> With a corpus big enough maybe you could create a model that outputs both
> classes, personMale and personFemale. To train a model you can follow
>
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training
>
> Let's say your results are not good enough. You could increment your model
> using Custom Feature Generators (
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
> and
> https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/package-summary.html
> ).
>
> One of the implemented featuregen can take a dictionary (
> https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/DictionaryFeatureGenerator.html
> ).
> You can also implement other convenient FeatureGenerator, for instance
> regex.
>
> Again, it is just a wild guess of how to implement it. I don't know if it
> would perform well. I was only thinking how to implement a gender ML model
> that uses the surrounding context.
>
> Hope I could clarify.
>
> William
>
> 2016-06-28 19:15 GMT-03:00 Damiano Porta <[email protected]>:
>
> > Hi William,
> > Ok, so you are talking about a kind of pipe where we execute:
> >
> > 1. NER (personM for example)
> > 2. Regex (filter to reduce false positives)
> > 3. Plain dictionary (filter as above) ?
> >
> > Yes we can split out model in two for M and F, it is not a big problem, we
> > have a database grouped by gender.
> >
> > I only have a doubt regarding the use of a dictionary. Because if we use a
> > dictionary to create the model, we could only use it to detect names
> > without using NER. No?
> >
> > 2016-06-29 0:10 GMT+02:00 William Colen <[email protected]>:
> >
> > > Do you plan to use the surrounding context? If yes, maybe you could try to
> > > split NER in two categories: PersonM and PersonF. Just an idea, never read
> > > or tried anything like it. You would need a training corpus with these
> > > classes.
> > >
> > > You could add both the plain dictionary and the regex as NER features as
> > > well and check how it improves.
> > >
> > > 2016-06-28 18:56 GMT-03:00 Damiano Porta <[email protected]>:
> > >
> > > > Hello everybody,
> > > >
> > > > we built a NER model to find persons (name) inside our documents.
> > > > We are looking for the best approach to understand if the name is
> > > > male/female.
> > > >
> > > > Possible solutions:
> > > > - Plain dictionary?
> > > > - Regex to check the initial and/letters of the name?
> > > > - Classifier? (naive bayes? Maxent?)
> > > >
> > > > Thanks
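
A corpus in the annotated format quoted above (`<START:type> ... <END>`) can be sanity-checked before training, for instance to see how many personMale and personFemale spans it contains. The following is only a minimal Python sketch and is not part of OpenNLP. The helper names `extract_names` and `strip_markup` are hypothetical, and it assumes one sentence per line, so the quoted example is split into three lines here.

```python
import re
from collections import Counter

# Matches one annotated span: "<START:type> token token <END>".
# Assumption: tokens and tags are whitespace-separated, as in the quoted example.
SPAN = re.compile(r"<START:(\w+)>\s+(.+?)\s+<END>")


def extract_names(line):
    """Return (type, name) pairs for every annotated span in one corpus line."""
    return [(m.group(1), m.group(2)) for m in SPAN.finditer(line)]


def strip_markup(line):
    """Return the line with the annotations removed, keeping only the tokens."""
    return SPAN.sub(lambda m: m.group(2), line)


# The example from the thread, split into one sentence per line (assumption).
corpus = [
    "<START:personMale> Pierre Vinken <END> , 61 years old , will join the board "
    "as a nonexecutive director Nov. 29 .",
    "Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the "
    "Dutch publishing group .",
    "<START:personFemale> Jessie Robson <END> is retiring , she was a board "
    "member for 5 years .",
]

# Count annotated spans per class to check how balanced the corpus is.
counts = Counter(label for line in corpus for label, _ in extract_names(line))

for label, count in sorted(counts.items()):
    print(label, count)
```

A count like this only shows whether both classes are represented. Whether the trained model actually separates them using the surrounding context still has to be evaluated on held-out data.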
