Re: Model to detect the gender

William Colen Tue, 28 Jun 2016 16:24:17 -0700

Not exactly. You would create a new NER model to replace yours.

In this approach you would need a corpus like this:


<START:personMale> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:personFemale> Jessie Robson <END> is retiring , she was a board member for 5 years .


I am not a native English speaker, so I am not sure if the example is
clear enough. I tried to use Jessie as a gender-neutral name and "she" to
disambiguate it.
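
Before training on a corpus like this, it may help to check that both classes
are actually annotated and roughly how often. The following is a minimal
sketch in plain Python, not part of OpenNLP; the helper name extract_spans and
the regular expression are assumptions made for illustration, and the sample
text is the corpus above written one sentence per line.

```python
import re
from collections import Counter

# Matches one annotated span of the form: <START:type> tokens <END>
SPAN = re.compile(r"<START:(\w+)>\s+(.*?)\s+<END>", re.DOTALL)


def extract_spans(text):
    """Return (type, name) pairs for every annotated span in the text."""
    return [
        (m.group(1), " ".join(m.group(2).split()))
        for m in SPAN.finditer(text)
    ]


corpus = """\
<START:personMale> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:personFemale> Jessie Robson <END> is retiring , she was a board member for 5 years .
"""

spans = extract_spans(corpus)
# Number of annotated spans per class, e.g. personMale vs personFemale
counts = Counter(label for label, _ in spans)
```

A strongly unbalanced count between personMale and personFemale would be an
early hint that the corpus needs more examples of one class.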

With a big enough corpus, you may be able to train a single model that
outputs both classes, personMale and personFemale. To train a model, follow
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training

Let's say your results are not good enough. You could improve your model
using Custom Feature Generators (
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
and
https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/package-summary.html
).

One of the built-in feature generators can take a dictionary (
https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/DictionaryFeatureGenerator.html
).
You can also implement other feature generators as needed, for instance a
regex-based one.
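
To make the idea concrete: the dictionary and the regex do not decide the
gender on their own; they only add hints that the model weighs together with
the surrounding context. Below is a rough, language-neutral sketch in plain
Python. It is not the OpenNLP feature generator API, and the name sets, the
suffix rule, the pronoun window, and the function name token_features are all
hypothetical, chosen only for illustration.

```python
import re

# Hypothetical toy resources; real ones would come from your own name database.
MALE_NAMES = {"pierre"}
FEMALE_NAMES = {"maria"}
ENDS_IN_A = re.compile(r"a$", re.IGNORECASE)  # illustrative suffix rule only
PRONOUNS = {"he": "ctx=he", "she": "ctx=she"}


def token_features(tokens, i, window=5):
    """Return string features for tokens[i]:
    dictionary hits, a regex cue, and pronouns found within the window."""
    tok = tokens[i].lower()
    feats = []
    if tok in MALE_NAMES:
        feats.append("dict=male")
    if tok in FEMALE_NAMES:
        feats.append("dict=female")
    if ENDS_IN_A.search(tok):
        feats.append("suffix=a")
    # Surrounding context: look for a gendered pronoun near the token.
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        feat = PRONOUNS.get(tokens[j].lower())
        if feat and feat not in feats:
            feats.append(feat)
    return feats


tokens = "Jessie Robson is retiring , she was a board member".split()
features = token_features(tokens, 0)
```

For "Jessie" neither the dictionary nor the suffix rule fires, so the only
cue is the nearby pronoun, which is the kind of case where a context-aware
model should do better than a plain dictionary lookup.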

Again, this is just a wild guess at how to implement it. I don't know if it
would perform well. I was only thinking about how to implement a gender ML
model that uses the surrounding context.

I hope this clarifies things.

William

2016-06-28 19:15 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:

> Hi William,
> Ok, so you are talking about a kind of pipe where we execute:
>
> 1. NER (personM for example)
> 2. Regex (filter to reduce false positives)
> 3. Plain dictionary (filter as above) ?
>
> Yes, we can split our model in two for M and F; it is not a big problem, we
> have a database grouped by gender.
>
> I only have a doubt regarding the use of a dictionary. Because if we use a
> dictionary to create the model, we could only use it to detect names
> without using NER. No?
>
>
>
> 2016-06-29 0:10 GMT+02:00 William Colen <william.co...@gmail.com>:
>
> > Do you plan to use the surrounding context? If yes, maybe you could try to
> > split NER in two categories: PersonM and PersonF. Just an idea, never read
> > or tried anything like it. You would need a training corpus with these
> > classes.
> >
> > You could add both the plain dictionary and the regex as NER features as
> > well and check how it improves.
> >
> > 2016-06-28 18:56 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
> >
> > > Hello everybody,
> > >
> > > we built a NER model to find persons (name) inside our documents.
> > > We are looking for the best approach to understand if the name is
> > > male/female.
> > >
> > > Possible solutions:
> > > - Plain dictionary?
> > > - Regex to check the initial and/letters of the name?
> > > - Classifier? (naive bayes? Maxent?)
> > >
> > > Thanks
> > >
> >
>
