[
https://issues.apache.org/jira/browse/CTAKES-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Finan updated CTAKES-158:
------------------------------
Priority: Minor (was: Major)
> DateAnnotation bug when two dates directly adjacent
> ---------------------------------------------------
>
> Key: CTAKES-158
> URL: https://issues.apache.org/jira/browse/CTAKES-158
> Project: cTAKES
> Issue Type: Bug
> Components: ctakes-context-tokenizer
> Affects Versions: 3.0-incubating, 3.1.0
> Reporter: James Joseph Masanz
> Priority: Minor
>
> From an email by Shady AbdelAziz to ctakes-dev@ on February 11, 2013:
> While working with DateAnnotation and adding some new state machines to
> DateFSM.java, I found a minor bug in the start and end indices of
> DateAnnotation.
> Consider the small example:
> "October 2003 November 2010 cTAKES is the best framework".
> The result is supposed to be "October 2003" and "November 2010", but cTAKES
> detects "October 2003" and "October 2003 November 2010".
> This is because the FSM detects the first date and, as it has no record in
> "tokenStartMap", assumes the starting index is "0". It then starts
> detecting the second date, but there is still no record for it in the map
> (a value is put in the map only when the FSM is in a starting state, in
> other words when a token does not satisfy any state's condition), so it
> again assumes the starting index is "0".
> That's why, for example, if there is an intermediate token between the two
> dates, it works fine.
> The solution is simply to put a record in the map before resetting the FSM,
> so this line should be added: "tokenStartMap.put(fsm, new Integer(i));".
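
The behaviour described in the report can be reproduced with a small standalone sketch. This is not the cTAKES source: the class AdjacentDateDemo, the MonthYearFsm machine, and the findDates method are hypothetical stand-ins that only mimic the reported logic (a start-index map that is written when the machine is in its start state, and a fallback to index 0 when the map has no entry). The applyFix flag switches on the proposed extra put before the next token is read; autoboxing is used here in place of new Integer(i).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AdjacentDateDemo {

    // Hypothetical machine: START -(month name)-> MONTH -(4-digit year)-> END.
    static class MonthYearFsm {
        enum State { START, MONTH, END }

        private static final Set<String> MONTHS = Set.of("October", "November");
        State state = State.START;

        void input(String token) {
            if (state == State.START && MONTHS.contains(token)) {
                state = State.MONTH;
            } else if (state == State.MONTH && token.matches("\\d{4}")) {
                state = State.END;
            } else {
                state = State.START; // token satisfied no condition
            }
        }

        void reset() {
            state = State.START;
        }
    }

    // Returns the text of every detected date span, in order of detection.
    static List<String> findDates(List<String> tokens, boolean applyFix) {
        MonthYearFsm fsm = new MonthYearFsm();
        Map<MonthYearFsm, Integer> tokenStartMap = new HashMap<>();
        List<String> found = new ArrayList<>();

        for (int i = 0; i < tokens.size(); i++) {
            fsm.input(tokens.get(i));

            // Remember the last token on which the machine sat in its start state.
            if (fsm.state == MonthYearFsm.State.START) {
                tokenStartMap.put(fsm, i);
            }

            if (fsm.state == MonthYearFsm.State.END) {
                Integer last = tokenStartMap.get(fsm);
                // No record in the map: assume the span starts at token 0.
                int start = (last == null) ? 0 : last + 1;
                found.add(String.join(" ", tokens.subList(start, i + 1)));

                fsm.reset();
                if (applyFix) {
                    // Proposed fix: record the reset position so the next
                    // span starts after the token that ended this one.
                    tokenStartMap.put(fsm, i);
                }
            }
        }
        return found;
    }
}
```

Calling findDates on the tokens of "October 2003 November 2010 cTAKES is the best framework" with applyFix set to false yields "October 2003" and "October 2003 November 2010"; with applyFix set to true it yields "October 2003" and "November 2010".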