[jira] [Commented] (CTAKES-227) Broca's -> PunctuationToken instead of ContractionToken - caused by apostrophe seen as sentence ending

ASF subversion and git services (JIRA) Mon, 26 Aug 2013 11:52:43 -0700

    [ https://issues.apache.org/jira/browse/CTAKES-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750399#comment-13750399 ]


ASF subversion and git services commented on CTAKES-227:
--------------------------------------------------------

Commit 1517639 from [email protected] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1517639 ]

CTAKES-227 - don't have apostrophe in list of potential sentence endings due to 
too many false positives
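This is why dropping the apostrophe from the candidate list stops the bad split. A maximum-entropy sentence detector of this kind (OpenNLP-style) typically asks its model for a decision only at offsets that hold a candidate end-of-sentence character. Any other character can never become a break. The Python sketch below is hypothetical. `OLD_EOS_CANDIDATES` and `NEW_EOS_CANDIDATES` are assumed sets, not the actual list in ctakes-core. The `is_break` callback stands in for the trained model and, in the demo, is a worst case that accepts every candidate.

```python
from typing import Callable, List

# Assumed candidate sets for illustration only; the real ctakes-core list may differ.
OLD_EOS_CANDIDATES = frozenset(".!?'")  # apostrophe still a candidate
NEW_EOS_CANDIDATES = frozenset(".!?")   # apostrophe removed, as in the r1517639 fix

# Hypothetical stand-in for the trained model: does offset i in text end a sentence?
BreakModel = Callable[[str, int], bool]


def candidate_positions(text: str, eos_chars: frozenset) -> List[int]:
    """Offsets the model is asked about; every other offset can never be a break."""
    return [i for i, ch in enumerate(text) if ch in eos_chars]


def split_sentences(text: str, eos_chars: frozenset, is_break: BreakModel) -> List[str]:
    """Split text right after every candidate offset that the model accepts."""
    sentences: List[str] = []
    start = 0
    for i in candidate_positions(text, eos_chars):
        if is_break(text, i):
            sentences.append(text[start:i + 1].strip())
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    note = "Broca's aphasia was noted. No other deficits."
    accept_all: BreakModel = lambda text, i: True  # worst-case model: accepts every candidate
    print(split_sentences(note, OLD_EOS_CANDIDATES, accept_all))
    # ["Broca'", 's aphasia was noted.', 'No other deficits.']
    print(split_sentences(note, NEW_EOS_CANDIDATES, accept_all))
    # ["Broca's aphasia was noted.", 'No other deficits.']
```

This is the trade-off named in the commit message. Once the apostrophe is out of the candidate set, a genuine sentence ending at an apostrophe can no longer be detected. In exchange, false splits such as the one inside "Broca's" go away.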
                
> Broca's -> PunctuationToken instead of ContractionToken - caused by 
> apostrophe seen as sentence ending
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CTAKES-227
>                 URL: https://issues.apache.org/jira/browse/CTAKES-227
>             Project: cTAKES
>          Issue Type: Bug
>          Components: ctakes-core
>    Affects Versions: 3.1
>            Reporter: James Joseph Masanz
>            Assignee: James Joseph Masanz
>
> The recently rebuilt sentence detector (currently in trunk and the 3.1.0 
> branch) sometimes treats the apostrophe as a sentence break where the 
> ctakes-3.0.0-incubating model didn’t.
> The training data used for the recently rebuilt model contains only 7 
> lines that end with an apostrophe (single quote) followed immediately by a 
> newline, although it has >100K occurrences of 's and >175K occurrences of 
> the ' character in all.
> I noticed this in testfakenote.txt.xml in ctakes-regression-test.
> The word "Broca's" used to have a ContractionToken but since a sentence is 
> now ending on the apostrophe, the apostrophe is getting annotated as a 
> PunctuationToken.
> See more in the thread started at
> http://markmail.org/message/wavipejszlspzo5u
> including examples that split correctly and incorrectly.
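The training-data figures above can be put in proportion. Seven line-final apostrophes out of more than 175K apostrophes is under 0.004% (7 / 175,000 = 0.00004). That suggests the model saw very little evidence of what a real apostrophe sentence ending looks like. Below is a minimal Python sketch for reproducing such counts over a training file. The function and field names are hypothetical. It assumes one sentence per line with newline-terminated lines, and it counts `'s` as a plain substring, as the issue's wording suggests.

```python
from typing import Iterable, NamedTuple


class ApostropheStats(NamedTuple):
    """Hypothetical container for the three figures quoted in CTAKES-227."""
    lines_ending_in_apostrophe: int  # ' immediately followed by the newline
    apostrophe_s: int                # occurrences of the substring 's
    apostrophes: int                 # occurrences of ' in all


def apostrophe_stats(lines: Iterable[str]) -> ApostropheStats:
    """Count all three figures in a single pass (works on a file object too)."""
    ending = apos_s = total = 0
    for line in lines:
        body = line.rstrip("\n")  # strip only the newline, not other whitespace
        if body.endswith("'"):
            ending += 1
        apos_s += body.count("'s")
        total += body.count("'")
    return ApostropheStats(ending, apos_s, total)


def sentence_final_share(stats: ApostropheStats) -> float:
    """Fraction of all apostrophes that sit at the end of a line."""
    if not stats.apostrophes:
        return 0.0
    return stats.lines_ending_in_apostrophe / stats.apostrophes


if __name__ == "__main__":
    sample = [
        "Broca's aphasia was noted.\n",
        "The patients' families were present.\n",
        "He said 'no'\n",
        "It's stable.\n",
    ]
    stats = apostrophe_stats(sample)
    print(stats)  # ApostropheStats(lines_ending_in_apostrophe=1, apostrophe_s=2, apostrophes=5)
    print(sentence_final_share(stats))  # 0.2
```

Because the counts are gathered in one loop, the function can be handed an open file directly, for example `with open("sentences.txt", encoding="utf-8") as f: apostrophe_stats(f)`, where the file name is a placeholder.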

