[ https://issues.apache.org/jira/browse/UIMA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866666#comment-13866666 ]
Pepi Stavropoulou commented on UIMA-3530:
-----------------------------------------
Many thanks for the workaround suggestion.
If I understand it correctly, it is not exactly what we are looking for, because
we need to map the "$$" placeholder to different endings depending on the
lemma type/word form. So we would need different placeholders, each mapped to
a different set of endings.
E.g.
"kalt$TemplateA$" where "$TemplateA$" => ("", "e", "er", "es", "en")
"spiel$TemplateB$" where "$TemplateB$" => ("e", "st", "st", "en")
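A minimal sketch of the placeholder idea follows, written in Java because Ruta runs on the JVM. It assumes that each entry ends in a "$Name$" placeholder and that the ending sets are held in an in-memory map. The TemplateExpander class, the exact placeholder syntax and the map are hypothetical and are not part of Ruta. Each entry is expanded into stem+ending forms before the list reaches the trie builder. A repeated ending, such as the second "st" in TemplateB, produces a duplicate form, and the sketch drops it.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateExpander {

    // Hypothetical entry syntax: a stem followed by a "$Name$" placeholder at the very end.
    private static final Pattern PLACEHOLDER = Pattern.compile("(.*)\\$(\\w+)\\$");

    /**
     * Expands an entry such as "kalt$TemplateA$" into one form per ending of the
     * named template. Entries without a placeholder are returned unchanged.
     */
    public static List<String> expand(String entry, Map<String, List<String>> templates) {
        Matcher m = PLACEHOLDER.matcher(entry);
        if (!m.matches()) {
            return List.of(entry);
        }
        String stem = m.group(1);
        List<String> endings = templates.get(m.group(2));
        if (endings == null) {
            throw new IllegalArgumentException("Unknown template: " + m.group(2));
        }
        // LinkedHashSet keeps the ending order and drops duplicate forms.
        Set<String> forms = new LinkedHashSet<>();
        for (String ending : endings) {
            forms.add(stem + ending);
        }
        return new ArrayList<>(forms);
    }

    public static void main(String[] args) {
        Map<String, List<String>> templates = Map.of(
                "TemplateA", List.of("", "e", "er", "es", "en"),
                "TemplateB", List.of("e", "st", "st", "en"));
        System.out.println(expand("kalt$TemplateA$", templates));
        System.out.println(expand("spiel$TemplateB$", templates));
    }
}
```

Running main prints [kalt, kalte, kalter, kaltes, kalten] and [spiele, spielst, spielen]. In a preprocessing step, each expanded form would inherit the feature columns of its source entry.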
Would it be possible, as a temporary solution, to use regular expressions in the
dictionary, expand them into separate entries in a preprocessing step, and then
continue building the trie as usual?
For example, the regex kalt(e|er|es|en)? would be expanded into the separate
entries kalt, kalte, kalter, etc., all sharing the same features.
I understand this can be time- and memory-consuming, but they would be simple
regexes, possibly with the * and + operators disallowed.
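The preprocessing step proposed above can be sketched as a small recursive-descent expander. This sketch supports only a restricted grammar:

- literal characters, including backslash-escaped ones;
- groups with "|" alternation;
- the "?" operator.

It rejects "*", "+" and other regex metacharacters, because unbounded operators would make the expansion infinite. The RegexEntryExpander class is hypothetical and does not use java.util.regex internally. It enumerates every string the pattern can match, so that each string could be written out as an ordinary dictionary entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RegexEntryExpander {
    private final String src;
    private int pos;

    private RegexEntryExpander(String src) { this.src = src; }

    /** Returns every string matched by a restricted regex (literals, groups, |, ?). */
    public static List<String> expand(String regex) {
        RegexEntryExpander p = new RegexEntryExpander(regex);
        Set<String> result = p.parseAlternation();
        if (p.pos != regex.length()) {
            throw new IllegalArgumentException("Unexpected ')' at position " + p.pos);
        }
        return new ArrayList<>(result);
    }

    // alternation := sequence ('|' sequence)*
    private Set<String> parseAlternation() {
        Set<String> out = new LinkedHashSet<>(parseSequence());
        while (pos < src.length() && src.charAt(pos) == '|') {
            pos++;
            out.addAll(parseSequence());
        }
        return out;
    }

    // sequence := (atom '?'?)*  ; concatenation is the cartesian product of the parts
    private Set<String> parseSequence() {
        Set<String> acc = new LinkedHashSet<>();
        acc.add("");
        while (pos < src.length() && src.charAt(pos) != '|' && src.charAt(pos) != ')') {
            Set<String> atom = parseAtom();
            if (pos < src.length() && src.charAt(pos) == '?') {
                pos++;
                Set<String> optional = new LinkedHashSet<>();
                optional.add("");          // the atom may be absent
                optional.addAll(atom);
                atom = optional;
            }
            Set<String> next = new LinkedHashSet<>();
            for (String prefix : acc) {
                for (String suffix : atom) {
                    next.add(prefix + suffix);
                }
            }
            acc = next;
        }
        return acc;
    }

    // atom := '(' alternation ')' | '\' char | literal char
    private Set<String> parseAtom() {
        char c = src.charAt(pos++);
        if (c == '(') {
            Set<String> inner = parseAlternation();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Missing ')' in " + src);
            }
            pos++;
            return inner;
        }
        if (c == '\\') {
            if (pos >= src.length()) {
                throw new IllegalArgumentException("Dangling escape in " + src);
            }
            c = src.charAt(pos++);
        } else if ("*+{}[].^$?".indexOf(c) >= 0) {
            // Unbounded or unsupported operators: reject instead of expanding forever.
            throw new IllegalArgumentException("Unsupported operator '" + c + "' in " + src);
        }
        Set<String> single = new LinkedHashSet<>();
        single.add(String.valueOf(c));
        return single;
    }

    public static void main(String[] args) {
        System.out.println(expand("kalt(e|er|es|en)?"));
    }
}
```

Running main prints [kalt, kalte, kalter, kaltes, kalten]. The expansion also shows where the cost comes from: alternatives multiply. A pattern with k optional groups of n alternatives each therefore yields up to (n+1)^k entries before the trie is built.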
> UIMA Ruta - allow WORDLIST and WORDTABLE files to include not just plain text
> to be matched but also regular expressions
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: UIMA-3530
> URL: https://issues.apache.org/jira/browse/UIMA-3530
> Project: UIMA
> Issue Type: Wish
> Components: ruta
> Reporter: Dimitris Vassos
> Priority: Minor
>
> It would greatly speed up and simplify the implementation of dictionary
> lookups using WORDLIST and WORDTABLE if, instead of just plain-text entries
> in the file, we could enter regular expressions.
> Especially for inflectional languages such as Greek or Russian, this feature
> would be invaluable.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)