Hi William,

OK, so you are talking about a kind of pipe where we execute:

1. NER (personM for example)
2. Regex (filter to reduce false positives)
3. Plain dictionary (filter as above)?
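
Just to check that I understood the pipe correctly, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the NER step is faked with a hard-coded list of (text, label) pairs, the name sets are toy placeholders for our gender database, and the regex is only one possible shape check. It is not OpenNLP code.

```python
import re

# Hypothetical toy dictionaries; the real ones would come from the
# name database grouped by gender.
MALE_NAMES = {"william", "damiano", "marco"}
FEMALE_NAMES = {"maria", "anna", "giulia"}

# Hypothetical regex filter: keep only candidates made of capitalized tokens.
NAME_SHAPE = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")


def filter_spans(candidates):
    """Apply steps 2 and 3 to (text, label) pairs produced by a NER step.

    label is assumed to be either "personM" or "personF".
    """
    kept = []
    for text, label in candidates:
        # Step 2: regex filter to drop candidates that do not look like names.
        if not NAME_SHAPE.match(text):
            continue
        # Step 3: dictionary filter, the first name must agree with the label.
        first_name = text.split()[0].lower()
        expected = MALE_NAMES if label == "personM" else FEMALE_NAMES
        if first_name in expected:
            kept.append((text, label))
    return kept


# Step 1 is simulated here: pretend the NER model returned these spans.
ner_output = [
    ("Damiano Porta", "personM"),
    ("maria", "personF"),            # fails the regex (not capitalized)
    ("Anna Rossi", "personF"),
    ("Giulia Bianchi", "personM"),   # fails the dictionary check
]
print(filter_spans(ner_output))
```

In this sketch the regex and the dictionary only remove NER candidates; they never add new ones, so they can reduce false positives but not recover names the model missed.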
Yes, we can split our model in two for M and F. It is not a big problem, since we have a database grouped by gender. I only have a doubt regarding the use of a dictionary: if we use a dictionary to create the model, couldn't we simply use the dictionary itself to detect names, without NER?

2016-06-29 0:10 GMT+02:00 William Colen <[email protected]>:

> Do you plan to use the surrounding context? If yes, maybe you could try to
> split NER in two categories: PersonM and PersonF. Just an idea, never read
> or tried anything like it. You would need a training corpus with these
> classes.
>
> You could add both the plain dictionary and the regex as NER features as
> well and check how it improves.
>
> 2016-06-28 18:56 GMT-03:00 Damiano Porta <[email protected]>:
>
> > Hello everybody,
> >
> > we built a NER model to find persons (name) inside our documents.
> > We are looking for the best approach to understand if the name is
> > male/female.
> >
> > Possible solutions:
> > - Plain dictionary?
> > - Regex to check the initial and/letters of the name?
> > - Classifier? (naive bayes? Maxent?)
> >
> > Thanks
