Hi Daniel!

Thank you so much for your opinion.
It makes perfect sense, but I am still a bit confused about the length of
the sentences.
A resume contains many names, dates, etc., so my doubt concerns
the structure of the sentences, because they sometimes follow specific
patterns.

For example, I need to extract the personal name (of whoever wrote the
resume), the birthday, etc.

As you know, there are many names and dates inside a resume, so I thought
about writing the entire resume as a single sentence, so that the model
also learns the approximate "position" of the entities. If I "decompose"
the whole resume into sentences, I will lose this information, no?

Damiano

On 25 Aug 2016 at 16:26, "Russ, Daniel (NIH/CIT) [E]" <[email protected]>
wrote:

> Hi Damiano,
>
>      Everyone can feel free to correct my ignorance, but I view the
> name finder as follows.
>
>      I look at it as walking down the sentence and classifying words as
> “NOT IN NAME” until I hit the start of a name, which is “START NAME”,
> followed by “STILL IN NAME” until it is “NOT IN NAME” again.  Take the
> sentence “Did John eat the stew”.  Starting with the first word in the
> sentence, decide what the odds are that it starts a person’s name (given
> that it is the first word in the sentence, happens to be “Did”, and has an
> initial capital but is not all caps).  Then go to the next word in the sentence.
> If the first word was not in a name, what are the odds that the second word
> starts a name (given that the previous word did not start a name, the word
> starts with a capital (but is not all capitals), the word is “John”, and the
> previous word is “Did”)?  If it decides that we are starting a name at
> “John”, we are now looking for the end.  What are the odds that “eat” is
> part of the name, given that [“Did”: was not part of the name, was
> capitalized] and that [“John”: was the first word in the name, was
> capitalized]?  You are essentially classifying [Did <- OTHER] [John
> <- START] [eat <- OTHER] [the <- OTHER] [stew <- OTHER].  If it were “Did
> John Smith eat the stew”, you would have [Did <- OTHER] [John
> <- START] [Smith <- IN] [eat <- OTHER] [the <- OTHER] [stew <- OTHER].  There
> are features other than just the word, the previous word, and the shape
> (first letter capitalized, all letters capitalized).  I think the name finder
> also uses part of speech.
>
>
>     So you see that it is not a name lookup table; each decision depends on
> how the earlier words in the sentence were classified.  Therefore, you
> must have sentences. Does that help?
> Daniel
>
>
> Daniel Russ, Ph.D.
> Staff Scientist, Office of Intramural Research
> Center for Information Technology
> National Institutes of Health
> U.S. Department of Health and Human Services
> 12 South Drive
> Bethesda,  MD 20892-5624
>
> On Aug 25, 2016, at 9:55 AM, Damiano Porta <[email protected]> wrote:
>
> Hello everybody!
>
> Could someone explain why I should separate each sentence of my documents
> to train my models?
> My documents are resumes/CVs, and the sentences can be very different.
> For example, a sentence could also be:
>
> 1. Name: John
> 2. Surname: travolta
>
> Etc.
> So my question is: what is the problem if I train my models
> (namefinder, tokenizer) with the complete resume/CV, one per line?
>
> Could it be a problem?
> In this case, when I want to tokenize the resume and do the NER, I
> will simply pass the complete resume text, skipping the "sentence detection"
> step.
>
> Thanks for your opinion in advance!
>
> Best
> Damiano
>
>
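
The word-by-word tagging described in Daniel's reply can be sketched in a few lines of Python. This is only an illustration of the START / IN / OTHER idea, not the name finder's actual code or feature set: the function names (tag_tokens, shape, features), the label strings, and the "<BOS>" marker for "no previous word" are all made up for the example.

```python
def tag_tokens(tokens, name_spans):
    """Label each token START, IN, or OTHER.

    name_spans is a list of (begin, end) token indices, end exclusive,
    marking where the names are (hypothetical input format).
    """
    labels = ["OTHER"] * len(tokens)
    for begin, end in name_spans:
        labels[begin] = "START"
        for i in range(begin + 1, end):
            labels[i] = "IN"
    return labels


def shape(word):
    """Coarse word shape: all caps, initial capital, or anything else."""
    if word.isupper():
        return "ALL_CAPS"
    if word[:1].isupper():
        return "INIT_CAP"
    return "LOWER"


def features(tokens, labels, i):
    """Context used to decide token i: the word, its shape, the previous
    word, and the label already given to the previous word."""
    return {
        "word": tokens[i],
        "shape": shape(tokens[i]),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "prev_label": labels[i - 1] if i > 0 else "<BOS>",
    }


tokens = "Did John Smith eat the stew".split()
labels = tag_tokens(tokens, [(1, 3)])  # "John Smith" is the name
print(list(zip(tokens, labels)))
print(features(tokens, labels, 2))  # context for "Smith"
```

The point of the sketch is the prev_label entry: the decision for "Smith" uses the fact that "John" was already tagged START, which is why the classifier needs a sequence of tokens with a well-defined beginning rather than a lookup table.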
