[ https://issues.apache.org/jira/browse/CTAKES-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639278#comment-14639278 ]
ASF subversion and git services commented on CTAKES-371:
--------------------------------------------------------
Commit 1692423 from [~seanfinan] in branch 'ctakes/trunk'
[ https://svn.apache.org/r1692423 ]
CTAKES-371 Automatic pre-tokenization of custom dictionary is closer to ctakes ptb
> update PTB tokenization logic in fast dictionary module
> -------------------------------------------------------
>
> Key: CTAKES-371
> URL: https://issues.apache.org/jira/browse/CTAKES-371
> Project: cTAKES
> Issue Type: Bug
> Components: ctakes-dictionary-lookup
> Affects Versions: 3.2.2
> Reporter: britt fitch
> Assignee: Sean Finan
> Fix For: 3.2.3
>
>
> PTB tokenization logic is used in several places, such as the tokenizer and
> the dictionary-building code, but the two do not produce the same tokens.
> For example, given “22q11.2 deletion syndrome”:
> PTB tokenizer: [22q11, .2, deletion, syndrome]
> Dictionary module (RareWordTermMapCreator.getTokens): [22q11, ., 2, deletion, syndrome]
> The dictionary module should be updated to match the PTB tokenization logic
> used elsewhere in cTAKES.
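
The mismatch described above can be reproduced with a small, self-contained
sketch. This is not the cTAKES code: the class TokenizationSketch and both
method names are hypothetical, and the "keep a period attached to the digits
that follow it" rule is only an assumption chosen to reproduce the PTB-style
output quoted in the issue, not the full PTB rule set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from cTAKES.
class TokenizationSketch {

    // Moves any pending characters into the token list and clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    // Splits on whitespace and emits every other non-alphanumeric
    // character as its own token.
    public static List<String> splitEveryPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            flush(current, tokens);
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Same as above, except that a '.' directly followed by a digit starts
    // a new token and stays attached to the digits after it (assumed rule).
    public static List<String> splitKeepingDecimalSuffix(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean dotBeforeDigit = c == '.'
                    && i + 1 < text.length()
                    && Character.isDigit(text.charAt(i + 1));
            if (dotBeforeDigit) {
                flush(current, tokens);
                current.append(c);
                continue;
            }
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            flush(current, tokens);
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        flush(current, tokens);
        return tokens;
    }

    public static void main(String[] args) {
        String term = "22q11.2 deletion syndrome";
        System.out.println("Every punctuation split: " + splitEveryPunctuation(term));
        System.out.println("Decimal suffix kept:     " + splitKeepingDecimalSuffix(term));
    }
}
```

If dictionary terms are pre-tokenized one way and document text the other,
a term such as "22q11.2 deletion syndrome" cannot match token-for-token,
which is why the issue asks for both to use the same logic.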