Hi Mondher,
could you give me a rough example to understand how I should train the
classifier model?

Thank you in advance!
Damiano
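
A rough sketch of the hybrid approach described in the quoted message
below might look like the following. It is only an illustration under
stated assumptions: the class name, the tiny dictionary and the pronoun
cue lists are hypothetical, and the simple cue count merely stands in
for a real trained classifier (maxent, naive Bayes, etc.).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: dictionary lookup first,
// context-based fallback only when the dictionary is not decisive.
class HybridGenderResolver {

    enum Gender { MALE, FEMALE, BOTH, UNKNOWN }

    // Assumption: a toy dictionary; a real one would be loaded from a database.
    static final Map<String, Gender> NAME_DICTIONARY = Map.of(
            "pierre", Gender.MALE,
            "maria", Gender.FEMALE,
            "jessie", Gender.BOTH);

    // Assumption: a few context cues standing in for learned features.
    static final Set<String> FEMALE_CUES = Set.of("she", "her", "mrs", "ms");
    static final Set<String> MALE_CUES = Set.of("he", "his", "him", "mr");

    // Step 1: dictionary. Step 2: context, only for ambiguous or unknown names.
    static Gender resolve(String name, List<String> contextTokens) {
        Gender fromDictionary = NAME_DICTIONARY.getOrDefault(
                name.toLowerCase(Locale.ROOT), Gender.UNKNOWN);
        if (fromDictionary == Gender.MALE || fromDictionary == Gender.FEMALE) {
            return fromDictionary;
        }
        return classifyFromContext(contextTokens);
    }

    // Placeholder for a trained classifier: counts gendered cue words.
    static Gender classifyFromContext(List<String> contextTokens) {
        int female = 0;
        int male = 0;
        for (String token : contextTokens) {
            String t = token.toLowerCase(Locale.ROOT);
            if (FEMALE_CUES.contains(t)) female++;
            if (MALE_CUES.contains(t)) male++;
        }
        if (female > male) return Gender.FEMALE;
        if (male > female) return Gender.MALE;
        return Gender.UNKNOWN;
    }

    public static void main(String[] args) {
        List<String> context = Arrays.asList(
                "is", "retiring", ",", "she", "was", "a", "board", "member");
        System.out.println(resolve("Pierre", context)); // decided by the dictionary
        System.out.println(resolve("Jessie", context)); // decided by the context
    }
}
```

In a real setup the classifyFromContext step would be replaced by a
model trained on labelled examples, with the dictionary result possibly
passed in as one more feature.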


2016-06-30 6:57 GMT+02:00 Mondher Bouazizi <[email protected]>:

> Hi,
>
> I would recommend a hybrid approach where, in a first step, you use a plain
> dictionary and then perform the classification if needed.
>
> It's straightforward, but I think it would perform better than
> just performing a classification task.
>
> In the first step you use a dictionary of names along with an attribute
> specifying whether the name fits males, females or both. If the
> name fits males or females exclusively, there is no need to go any further.
>
> If the name fits for both genders, or is a family name etc., a second step
> is needed where you extract features from the context (surrounding words,
> etc.) and perform a classification task using any machine learning
> algorithm.
>
> Another way would be using the information itself (whether the name fits
> for males, females or both) as a feature when you perform the
> classification.
>
> Best regards,
>
> Mondher
>
> I am not sure
>
> On Wed, Jun 29, 2016 at 10:27 PM, Damiano Porta <[email protected]>
> wrote:
>
> > Awesome! Thank you so much William!
> >
> > 2016-06-29 13:36 GMT+02:00 William Colen <[email protected]>:
> >
> > > To create a NER model, OpenNLP extracts features from the context, things
> > > such as: word prefix and suffix, next word, previous word, previous word
> > > prefix and suffix, next word prefix and suffix, etc.
> > > When you don't configure the feature generator, it will apply the default:
> > >
> > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen.api
> > >
> > > Default feature generator:
> > >
> > > AdaptiveFeatureGenerator featureGenerator = new CachedFeatureGenerator(
> > >          new AdaptiveFeatureGenerator[]{
> > >            new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
> > >            new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
> > >            new OutcomePriorFeatureGenerator(),
> > >            new PreviousMapFeatureGenerator(),
> > >            new BigramNameFeatureGenerator(),
> > >            new SentenceFeatureGenerator(true, false)
> > >            });
> > >
> > >
> > > These default features should work for most cases (especially English), but
> > > they can of course be extended. If you do so, your model will take the new
> > > features into account. So yes, you are putting the features in your model.
> > >
> > > Configuring custom features is not easy. I would start with the default,
> > > use 10-fold cross-validation and take notes of its effectiveness. Then
> > > change/add a feature, evaluate and take notes. Sometimes a feature that we
> > > are sure would help can destroy the model's effectiveness.
> > >
> > > Regards
> > > William
> > >
> > >
> > > 2016-06-29 7:00 GMT-03:00 Damiano Porta <[email protected]>:
> > >
> > > > Thank you William! Really appreciated!
> > > >
> > > > There is only one point I do not get: when you said "You could increment
> > > > your model using Custom Feature Generators", does it mean that I can "put"
> > > > these features inside ONE *.bin* file (model) that implements different
> > > > things, or is the name finder one thing and those feature generators
> > > > another?
> > > >
> > > > Thank you in advance for the clarification.
> > > >
> > > > 2016-06-29 1:23 GMT+02:00 William Colen <[email protected]>:
> > > >
> > > > > Not exactly. You would create a new NER model to replace yours.
> > > > >
> > > > > In this approach you would need a corpus like this:
> > > > >
> > > > > <START:personMale> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> > > > > Mr . <START:personMale> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
> > > > > <START:personFemale> Jessie Robson <END> is retiring , she was a board member for 5 years .
> > > > >
> > > > >
> > > > > I am not a native English speaker, so I am not sure if the example is
> > > > > clear enough. I tried to use Jessie as a neutral name and "she" as
> > > > > disambiguation.
> > > > >
> > > > > With a corpus big enough maybe you could create a model that outputs both
> > > > > classes, personMale and personFemale. To train a model you can follow
> > > > >
> > > > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training
> > > > >
> > > > > Let's say your results are not good enough. You could increment your
> > > > > model using Custom Feature Generators (
> > > > > https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
> > > > > and
> > > > > https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/package-summary.html
> > > > > ).
> > > > >
> > > > > One of the implemented feature generators can take a dictionary (
> > > > > https://opennlp.apache.org/documentation/1.6.0/apidocs/opennlp-tools/opennlp/tools/util/featuregen/DictionaryFeatureGenerator.html
> > > > > ).
> > > > > You can also implement other convenient feature generators, for instance
> > > > > a regex-based one.
> > > > >
> > > > > Again, it is just a wild guess of how to implement it. I don't know if it
> > > > > would perform well. I was only thinking how to implement a gender ML model
> > > > > that uses the surrounding context.
> > > > >
> > > > > Hope I could clarify.
> > > > >
> > > > > William
> > > > >
> > > > > 2016-06-28 19:15 GMT-03:00 Damiano Porta <[email protected]>:
> > > > >
> > > > > > Hi William,
> > > > > > Ok, so you are talking about a kind of pipe where we execute:
> > > > > >
> > > > > > 1. NER (personM for example)
> > > > > > 2. Regex (filter to reduce false positives)
> > > > > > 3. Plain dictionary (filter as above) ?
> > > > > >
> > > > > > Yes, we can split our model in two for M and F; it is not a big problem,
> > > > > > we have a database grouped by gender.
> > > > > >
> > > > > > I only have a doubt regarding the use of a dictionary. Because if we use a
> > > > > > dictionary to create the model, we could only use it to detect names
> > > > > > without using NER. No?
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2016-06-29 0:10 GMT+02:00 William Colen <[email protected]>:
> > > > > >
> > > > > > > Do you plan to use the surrounding context? If yes, maybe you could try to
> > > > > > > split NER in two categories: PersonM and PersonF. Just an idea, never read
> > > > > > > or tried anything like it. You would need a training corpus with these
> > > > > > > classes.
> > > > > > >
> > > > > > > You could add both the plain dictionary and the regex as NER features as
> > > > > > > well and check how it improves.
> > > > > > >
> > > > > > > 2016-06-28 18:56 GMT-03:00 Damiano Porta <[email protected]>:
> > > > > > >
> > > > > > > > Hello everybody,
> > > > > > > >
> > > > > > > > we built a NER model to find persons (name) inside our documents.
> > > > > > > > We are looking for the best approach to understand if the name is
> > > > > > > > male/female.
> > > > > > > >
> > > > > > > > Possible solutions:
> > > > > > > > - Plain dictionary?
> > > > > > > > - Regex to check the initial and/letters of the name?
> > > > > > > > - Classifier? (naive bayes? Maxent?)
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
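
To make the annotated corpus format quoted above more concrete
(<START:personMale> ... <END>), here is a small sketch that reads the
typed spans out of one training line. It is a sanity-check helper under
stated assumptions, not OpenNLP's own parser: the class name, method
name and regular expression are hypothetical, and it assumes
whitespace-separated tokens as in the example sentences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting a corpus before training;
// it does not use or replace any OpenNLP class.
class AnnotatedSpanReader {

    // Matches "<START:type> some tokens <END>" and captures type and tokens.
    private static final Pattern SPAN =
            Pattern.compile("<START:(\\w+)>\\s+(.+?)\\s+<END>");

    // Returns one {type, text} pair per annotated span found in the line.
    static List<String[]> readSpans(String line) {
        List<String[]> spans = new ArrayList<>();
        Matcher matcher = SPAN.matcher(line);
        while (matcher.find()) {
            spans.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return spans;
    }

    public static void main(String[] args) {
        String line = "<START:personFemale> Jessie Robson <END> is retiring , "
                + "she was a board member for 5 years .";
        for (String[] span : readSpans(line)) {
            System.out.println(span[0] + " -> " + span[1]);
        }
    }
}
```

Counting the spans per type this way gives a quick view of how balanced
the personMale and personFemale classes are before training.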
