[
https://issues.apache.org/jira/browse/OPENNLP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247053#comment-17247053
]
Alan Wang edited comment on OPENNLP-1312 at 12/10/20, 7:59 AM:
---------------------------------------------------------------
Hello,
I noticed that there is no space between the entity and the parentheses in
_test.txt_, so the text is tokenized into “(1” and “HE)”. The problem is
therefore the tokenizer. You can fix it by adding spaces around the entity, or
by training a model that can separate the parentheses.
Here is a simple test.
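{code:java}
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;

// Load the attached model from the classpath.
String filePath = "opennlp/tools/namefind/test_model.bin";
InputStream inputStream = getClass().getClassLoader().getResourceAsStream(filePath);
TokenNameFinderModel nameFinderModel = new TokenNameFinderModel(inputStream);
NameFinderME nameFinderME = new NameFinderME(nameFinderModel);

// Test text with spaces added around the parentheses: "( 1 HE )".
String text = "One glass slide labeled \" 국민건강보험공단 일산병원 SS05-6251 \" was submitted for consultation.\n" +
    " << Date of biopsy: 2005. 7. 5 >>\n" +
    "MICRO ( 1 HE )\n" +
    "DIAGNOSIS:\n" +
    " Breast, right, needle biopsy:\n" +
    " Usual ductal hyperplasia, focal";

// Split on whitespace only, so "(" and ")" become tokens of their own.
String[] sentence = WhitespaceTokenizer.INSTANCE.tokenize(text);
Span[] names = nameFinderME.find(sentence);

// Print each detected span with its type, covered tokens, and probability.
for (Span name : names) {
    String entity = "";
    for (int i = name.getStart(); i < name.getEnd(); i++) {
        entity += sentence[i] + " ";
    }
    System.out.println(" > Result: Type: [" + name.getType() + "] : Name: ["
        + entity + "]\t [probability=" + name.getProb() + "]");
}

// Print the sentence with <START:type> ... <END> markers around the spans.
NameSample nameSample = new NameSample(sentence, names, true);
System.out.println(nameSample);
{code}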
OUTPUT:
{quote}> Result: Type: [micro] : Name: [1 HE ] [probability=0.684028262677087]
One glass slide labeled " 국민건강보험공단 일산병원 SS05-6251 " was submitted for
consultation. << Date of biopsy: 2005. 7. 5 >> MICRO ( <START:micro> 1 HE <END>
) DIAGNOSIS: Breast, right, needle biopsy: Usual ductal hyperplasia, focal
{quote}
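If editing _test.txt_ by hand is not practical, the same fix can be applied in code before tokenizing. The following is only a minimal sketch: the class {{ParenPadding}} and its methods are hypothetical helpers, not part of OpenNLP, the input line "MICRO (1 HE)" is assumed from the tokens described above, and {{split("\\s+")}} merely stands in for whitespace tokenization so that the example runs without the library.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class ParenPadding {

    // Surround every "(" and ")" with spaces so that splitting on
    // whitespace yields the parentheses as separate tokens.
    static String padParentheses(String text) {
        return text.replaceAll("([()])", " $1 ");
    }

    // Stand-in for a whitespace tokenizer: split on runs of whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Assumed form of the problematic line in test.txt.
        String line = "MICRO (1 HE)";

        // Without padding, the parentheses stay glued to the entity tokens.
        System.out.println(Arrays.toString(tokenize(line)));

        // With padding, the entity tokens "1" and "HE" stand alone.
        System.out.println(Arrays.toString(tokenize(padParentheses(line))));
    }
}
```

The padded text could then be passed to the tokenizer used in the test above. Whether the name finder tags the entity still depends on the model having been trained on tokens split the same way.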
was (Author: alan wang):
Hello,
I noticed that there is no space between the entity and the parentheses in
_test.txt_, so the text is tokenized into “(1” and “HE)”. Adding spaces around
the entity fixes this.
> Custom TokenNameFinder result is not valid
> ------------------------------------------
>
> Key: OPENNLP-1312
> URL: https://issues.apache.org/jira/browse/OPENNLP-1312
> Project: OpenNLP
> Issue Type: Question
> Components: Name Finder
> Affects Versions: 1.9.3
> Reporter: Yoonhee Park
> Priority: Major
> Attachments: eval.data, test.txt, test_model.bin
>
>
> If I run TokenNameFinderEvaluator after creating a custom model, the evaluation
> F1 is 100%.
> But if I run TokenNameFinder on a document used for training and evaluation,
> the entity is not tagged.
>
>
> % opennlp.bat TokenNameFinderEvaluator -model test_model.bin -data eval.data
> Evaluated 5 samples with 1 entities; found: 1 entities; correct: 1.
> TOTAL: precision: 100.00%; recall: 100.00%; F1: 100.00%.
> micro: precision: 100.00%; recall: 100.00%; F1: 100.00%. [target: 1; tp: 1;
> fp: 0]%
>
>
> % opennlp.bat TokenNameFinder test_model.bin < test.txt
> >>> Not Tagged Result
> How do I fix this issue?