Re: Model to detect the gender

Damiano Porta Wed, 29 Jun 2016 06:27:58 -0700

Awesome! Thank you so much William!

2016-06-29 13:36 GMT+02:00 William Colen <william.co...@gmail.com>:


> To create a NER model OpenNLP extracts features from the context, things
> such as: word prefix and suffix, next word, previous word, previous word
> prefix and suffix, next word prefix and suffix etc.
> When you don't configure the feature generator it will apply the default:
>
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen.api
>
> Default feature generator:
>
> AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
>          new AdaptiveFeatureGenerator[]{
>            new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
>            new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
>            new OutcomePriorFeatureGenerator(),
>            new PreviousMapFeatureGenerator(),
>            new BigramNameFeatureGenerator(),
>            new SentenceFeatureGenerator(true, false)
>            });
>
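To make the idea of context features more concrete, here is a minimal stand-alone sketch in plain Java. It is not OpenNLP code: the class name, the method and the feature-name strings are all made up for illustration. It only shows the kind of strings (prefix, suffix, words in a window of 2 before and 2 after) that a feature generator hands to the learner.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: not OpenNLP code. Feature-name strings are made up. */
final class ContextFeatureSketch {

    /** Collects simple context features for tokens[index]. */
    static List<String> featuresFor(String[] tokens, int index, int window) {
        List<String> features = new ArrayList<>();
        String word = tokens[index];

        // Prefix and suffix of the current word (up to 3 characters each).
        int n = Math.min(3, word.length());
        features.add("prefix=" + word.substring(0, n));
        features.add("suffix=" + word.substring(word.length() - n));

        // Lower-cased tokens in a window around the current position:
        // offset 0 is the word itself, negative offsets are previous words.
        for (int offset = -window; offset <= window; offset++) {
            int i = index + offset;
            if (i >= 0 && i < tokens.length) {
                features.add("w[" + offset + "]=" + tokens[i].toLowerCase());
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = "Jessie Robson is retiring , she was a board member".split(" ");
        System.out.println(featuresFor(tokens, 0, 2));
    }
}
```

With a window of 2, the token "she" is not visible from "Jessie" in this sentence, which is one reason a feature that looks obviously useful may not help as much as expected.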
>
> These default features should work for most cases (especially English), but
> they can of course be extended. If you do so, your model will take the new
> features into account. So yes, you are putting the features in your model.
>
> Configuring custom features is not easy. I would start with the default,
> run 10-fold cross-validation, and take notes on its effectiveness. Then
> change or add a feature, evaluate again, and take notes. Sometimes a feature
> that we are sure would help can destroy the model's effectiveness.
>
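The cross-validation loop described above can be sketched as follows. This is plain Java with no OpenNLP calls; the class and method names are hypothetical, and the training and evaluation steps are left as a comment because they depend on your setup. It only shows how a corpus is split into 10 folds so that each sentence is used for testing exactly once.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: splits a corpus into k folds for cross-validation. */
final class KFoldSketch {

    /** Sentence i goes to fold i % k, so fold sizes differ by at most one. */
    static List<List<String>> split(List<String> sentences, int k) {
        List<List<String>> result = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < sentences.size(); i++) {
            result.get(i % k).add(sentences.get(i));
        }
        return result;
    }

    /** Training data for one round: every fold except the held-out one. */
    static List<String> trainingSet(List<List<String>> folds, int heldOut) {
        List<String> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<String> corpus = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            corpus.add("sentence " + i);
        }
        List<List<String>> folds = split(corpus, 10);
        for (int round = 0; round < folds.size(); round++) {
            List<String> train = trainingSet(folds, round);
            List<String> test = folds.get(round);
            // Train on `train`, evaluate on `test`, and note the scores for this round.
            System.out.println("round " + round + ": train=" + train.size()
                    + " test=" + test.size());
        }
    }
}
```

Keeping the same split for every feature configuration makes the notes comparable from one experiment to the next.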
> Regards
> William
>
>
> 2016-06-29 7:00 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
>
> > Thank you William! Really appreciated!
> >
> > There is only one point I do not get. When you said "You could increment
> > your model using Custom Feature Generators", does it mean that I can "put"
> > these features inside ONE *.bin* file (model) that implements different
> > things, or is the name finder one thing and the feature generators another?
> >
> > Thank you in advance for the clarification.
> >
> > 2016-06-29 1:23 GMT+02:00 William Colen <william.co...@gmail.com>:
> >
> > > Not exactly. You would create a new NER model to replace yours.
> > >
> > > In this approach you would need a corpus like this:
> > >
> > > <START:personMale> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> > > Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
> > > <START:personFemale> Jessie Robson <END> is retiring , she was a board member for 5 years .
> > >
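As a quick sanity check on a corpus annotated like the example above, the following stand-alone Java sketch lists the entity type and tokens of every `<START:type> ... <END>` span in a line. It is not part of OpenNLP; the class name and the regular expression are assumptions made for illustration, and it expects the whitespace-separated tags shown in the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustration only: reads <START:type> ... <END> annotations from one line. */
final class AnnotationSketch {

    // Group 1 is the entity type, group 2 the annotated tokens.
    private static final Pattern SPAN =
            Pattern.compile("<START:([^>\\s]+)>\\s+(.+?)\\s+<END>");

    /** Returns entries of the form "type=tokens" in order of appearance. */
    static List<String> spans(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = SPAN.matcher(line);
        while (m.find()) {
            found.add(m.group(1) + "=" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "<START:personFemale> Jessie Robson <END> is retiring , "
                + "she was a board member for 5 years .";
        System.out.println(spans(line)); // prints [personFemale=Jessie Robson]
    }
}
```

Counting how many spans of each type the corpus contains is a cheap way to see whether personMale and personFemale are reasonably balanced before training.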
> > >
> > > I am not a native English speaker, so I am not sure if the example is
> > > clear enough. I tried to use Jessie as a neutral name and "she" as
> > > disambiguation.
> > >
> > > With a corpus big enough maybe you could create a model that outputs
> > > both classes, personMale and personFemale. To train a model you can
> > > follow
> > >
> > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training
> > >
> > > Let's say your results are not good enough. You could increment your
> > > model using Custom Feature Generators (
> > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
> > > and
> > > https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/package-summary.html
> > > ).
> > >
> > > One of the implemented feature generators can take a dictionary (
> > > https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/DictionaryFeatureGenerator.html
> > > ).
> > > You can also implement other convenient feature generators, for
> > > instance one based on regex.
> > >
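A rough idea of what a dictionary hint and a regex hint could look like when expressed as features is sketched below. This is plain Java and is not wired into OpenNLP: the class, the tiny name set, the "ends in a" rule and the feature strings are all hypothetical. In OpenNLP the same logic would live inside a custom feature generator (see the AdaptiveFeatureGenerator documentation linked above for the actual interface).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Illustration only: a dictionary hint and a regex hint expressed as features. */
final class GenderHintSketch {

    // Hypothetical, tiny dictionary; a real one would come from your own database.
    private static final Set<String> FEMALE_NAMES =
            new HashSet<>(Arrays.asList("maria", "anna", "jessie"));

    // Hypothetical rule: the token ends in the letter "a".
    private static final Pattern ENDS_IN_A =
            Pattern.compile(".*a", Pattern.CASE_INSENSITIVE);

    /** Appends hint features for tokens[index] to the given list. */
    static void createFeatures(List<String> features, String[] tokens, int index) {
        String token = tokens[index];
        if (FEMALE_NAMES.contains(token.toLowerCase())) {
            features.add("dict=female");
        }
        if (ENDS_IN_A.matcher(token).matches()) {
            features.add("regex=endsInA");
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"Anna", "Robson", "is", "retiring"};
        List<String> features = new ArrayList<>();
        createFeatures(features, tokens, 0);
        System.out.println(features);
    }
}
```

The point of feeding these in as features rather than as hard filters is that the model learns how much weight to give each hint alongside the surrounding context, instead of the dictionary or the regex deciding on its own.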
> > > Again, it is just a wild guess of how to implement it. I don't know if
> > > it would perform well. I was only thinking how to implement a gender ML
> > > model that uses the surrounding context.
> > >
> > > Hope I could clarify.
> > >
> > > William
> > >
> > > 2016-06-28 19:15 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
> > >
> > > > Hi William,
> > > > Ok, so you are talking about a kind of pipe where we execute:
> > > >
> > > > 1. NER (personM for example)
> > > > 2. Regex (filter to reduce false positives)
> > > > 3. Plain dictionary (filter as above) ?
> > > >
> > > > Yes, we can split our model in two for M and F; it is not a big
> > > > problem, since we have a database grouped by gender.
> > > >
> > > > I only have a doubt regarding the use of a dictionary. If we use a
> > > > dictionary to create the model, couldn't we just use it to detect
> > > > names without using NER?
> > > >
> > > >
> > > >
> > > > 2016-06-29 0:10 GMT+02:00 William Colen <william.co...@gmail.com>:
> > > >
> > > > > Do you plan to use the surrounding context? If yes, maybe you could
> > > > > try to split NER in two categories: PersonM and PersonF. Just an
> > > > > idea, never read or tried anything like it. You would need a
> > > > > training corpus with these classes.
> > > > >
> > > > > You could add both the plain dictionary and the regex as NER
> > > > > features as well and check how it improves.
> > > > >
> > > > > 2016-06-28 18:56 GMT-03:00 Damiano Porta <damianopo...@gmail.com>:
> > > > >
> > > > > > Hello everybody,
> > > > > >
> > > > > > we built a NER model to find persons (name) inside our documents.
> > > > > > We are looking for the best approach to understand if the name is
> > > > > > male/female.
> > > > > >
> > > > > > Possible solutions:
> > > > > > - Plain dictionary?
> > > > > > - Regex to check the initial and/letters of the name?
> > > > > > - Classifier? (naive bayes? Maxent?)
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > >
> > > >
> > >
> >
>
