[jira] [Comment Edited] (OPENNLP-1312) Custom TokenNameFinder result is not valid

Alan Wang (Jira) Thu, 10 Dec 2020 00:00:34 -0800


    [ https://issues.apache.org/jira/browse/OPENNLP-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247053#comment-17247053 ]


Alan Wang edited comment on OPENNLP-1312 at 12/10/20, 7:59 AM:
---------------------------------------------------------------

Hello, 

I noticed that in _test.txt_ there is no space between the entity and the 
parentheses, so the text is tokenized into “(1” and “HE)”. The problem is 
therefore the tokenizer. You can fix it by adding spaces around the entity, or 
by training a model that can separate the parentheses.
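
To illustrate the first suggestion, here is a minimal sketch in plain Java with no OpenNLP dependency. The names `ParenPadding`, `padParentheses` and `whitespaceTokenize` are hypothetical, `String.split` is only a stand-in for whitespace tokenization, and the input line is an assumed example of what _test.txt_ might contain.

```java
import java.util.Arrays;

public class ParenPadding {

    // Hypothetical helper: surround every parenthesis with spaces so that
    // whitespace tokenization yields "(" and ")" as separate tokens.
    static String padParentheses(String text) {
        return text.replaceAll("([()])", " $1 ");
    }

    // Stand-in for a whitespace tokenizer: split on runs of whitespace.
    static String[] whitespaceTokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "MICRO (1 HE)"; // assumed shape of the problematic line

        // Without padding, the parentheses stay glued to the entity tokens.
        System.out.println(Arrays.toString(whitespaceTokenize(line)));
        // prints [MICRO, (1, HE)]

        // With padding, the entity tokens "1" and "HE" stand alone.
        System.out.println(Arrays.toString(whitespaceTokenize(padParentheses(line))));
        // prints [MICRO, (, 1, HE, )]
    }
}
```

The padded text could then be tokenized and passed to the name finder, assuming the model was trained on text tokenized the same way.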

Here is a simple test.
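{code:java}
// Imports needed by the snippet below (the snippet itself runs inside a test method).
import java.io.InputStream;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.WhitespaceTokenizer;
import opennlp.tools.util.Span;

        // Load the custom name finder model from the classpath.
        String filePath = "opennlp/tools/namefind/test_model.bin";
        InputStream inputStream = getClass().getClassLoader().getResourceAsStream(filePath);
        TokenNameFinderModel nameFinderModel = new TokenNameFinderModel(inputStream);
        NameFinderME nameFinderME = new NameFinderME(nameFinderModel);

        // Note the spaces around the entity: "( 1 HE )" instead of "(1 HE)".
        String text = "One glass slide labeled \" 국민건강보험공단 일산병원 SS05-6251 \" was submitted for consultation.\n" +
                "  << Date of biopsy: 2005. 7. 5 >>\n" +
                "MICRO ( 1 HE )\n" +
                "DIAGNOSIS:\n" +
                " Breast, right, needle biopsy:\n" +
                "  Usual ductal hyperplasia, focal";

        // Split on whitespace only, then run the name finder on the tokens.
        String[] sentence = WhitespaceTokenizer.INSTANCE.tokenize(text);
        Span[] names = nameFinderME.find(sentence);

        for (Span name : names) {
            // Rebuild the entity text from the tokens covered by the span.
            StringBuilder entity = new StringBuilder();
            for (int i = name.getStart(); i < name.getEnd(); i++) {
                entity.append(sentence[i]).append(" ");
            }
            System.out.println(" > Result: Type: [" + name.getType() + "] : Name: ["
                    + entity + "]\t [probability=" + name.getProb() + "]");
        }

        // Print the whole text with <START:type> ... <END> markers.
        NameSample nameSample = new NameSample(sentence, names, true);
        System.out.println(nameSample);
{code}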
 

OUTPUT:
{quote}> Result: Type: [micro] : Name: [1 HE ] [probability=0.684028262677087]

One glass slide labeled " 국민건강보험공단 일산병원 SS05-6251 " was submitted for 
consultation. << Date of biopsy: 2005. 7. 5 >> MICRO ( <START:micro> 1 HE <END> 
) DIAGNOSIS: Breast, right, needle biopsy: Usual ductal hyperplasia, focal
{quote}


was (Author: alan wang):
Hello, 

I noticed that in _test.txt_ there is no space between the entity and the 
parentheses, so the text is tokenized into “(1” and “HE)”. Adding spaces around 
the entity fixes this.



> Custom TokenNameFinder result is not valid
> ------------------------------------------
>
>                 Key: OPENNLP-1312
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1312
>             Project: OpenNLP
>          Issue Type: Question
>          Components: Name Finder
>    Affects Versions: 1.9.3
>            Reporter: Yoonhee Park
>            Priority: Major
>         Attachments: eval.data, test.txt, test_model.bin
>
>
> If I run TokenNameFinderEvaluator after creating a custom model, the 
> evaluation F1 is 100%.
> But if I run TokenNameFinder on a document that was used for training and 
> evaluation, the entity is not tagged.
>  
>  
>  % opennlp.bat TokenNameFinderEvaluator -model test_model.bin -data eval.data
>  Evaluated 5 samples with 1 entities; found: 1 entities; correct: 1.
>  TOTAL: precision: 100.00%; recall: 100.00%; F1: 100.00%.
>  micro: precision: 100.00%; recall: 100.00%; F1: 100.00%. [target: 1; tp: 1; fp: 0]%
>   
>   
>  % opennlp.bat TokenNameFinder test_model.bin < test.txt
>  >>> Not Tagged Result
>  How do I fix this issue?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
