[jira] [Commented] (UIMA-3530) UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched but also regular expressions

Pepi Stavropoulou (JIRA) Thu, 09 Jan 2014 06:19:52 -0800

    [ 
https://issues.apache.org/jira/browse/UIMA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866666#comment-13866666
 ]


Pepi Stavropoulou commented on UIMA-3530:
-----------------------------------------

Many thanks for the workaround suggestion.
If I understand this correctly, it is not exactly what we are looking for, as 
we need to map the "$$" placeholder to different endings depending on the 
lemma type/word form. So we would need different placeholders, each mapped to 
a different set of endings.
E.g.
"kalt$TemplateA$" where "$TemplateA$" => ("","e","er","es","en")
"spiel$TemplateB$" where "$TemplateB$" => ("e","st","st","en")

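To make the placeholder idea concrete, here is a minimal Python sketch of the expansion, run as a preprocessing step outside Ruta. The names `TEMPLATES`, `PLACEHOLDER` and `expand_entry` are hypothetical and not part of any Ruta API. It assumes at most one placeholder per entry and drops duplicate forms.

```python
import re
from typing import Dict, List

# Hypothetical template table, taken from the two examples above.
TEMPLATES: Dict[str, List[str]] = {
    "$TemplateA$": ["", "e", "er", "es", "en"],
    "$TemplateB$": ["e", "st", "st", "en"],
}

# Assumed placeholder syntax: an alphanumeric name between two dollar signs.
PLACEHOLDER = re.compile(r"\$[A-Za-z0-9]+\$")


def expand_entry(entry: str, templates: Dict[str, List[str]]) -> List[str]:
    """Expand one dictionary entry into its surface forms.

    Entries without a placeholder are returned unchanged. Only the first
    placeholder in an entry is expanded (assumption: one per entry).
    """
    match = PLACEHOLDER.search(entry)
    if match is None:
        return [entry]
    endings = templates[match.group(0)]  # KeyError if the template is unknown
    stem, rest = entry[:match.start()], entry[match.end():]
    forms = [stem + ending + rest for ending in endings]
    # Remove duplicate forms while keeping their original order.
    return list(dict.fromkeys(forms))
```

With this sketch, `expand_entry("kalt$TemplateA$", TEMPLATES)` would yield kalt, kalte, kalter, kaltes, kalten, which could then be written to a plain word list file.
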
Would it be possible, as a temporary solution, to use regular expressions in the 
dictionary, expand them into separate entries as a preprocessing step, and then 
continue with building the trie as usual?
For example, the regex kalt(e|er|es|en)? would be expanded into the separate 
entries kalt, kalte, kalter, etc., all sharing the same features.
I understand this can be time and memory consuming, but these would be simple 
regexes, possibly with the * and + operators disallowed.
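
As a rough illustration of the preprocessing step, the following Python sketch expands a restricted regex into all the strings it matches. It is not Ruta code, and the function name `expand` is hypothetical. It assumes the only operators are grouping with parentheses, alternation with |, and the optional marker ?, and it rejects * and + so that the expansion is always finite.

```python
from typing import List


def expand(pattern: str) -> List[str]:
    """Expand a restricted regex (literals, groups, |, ?) into all its matches."""
    pos = 0  # current read position in the pattern

    def parse_alternation() -> List[str]:
        # alternation := sequence ('|' sequence)*
        nonlocal pos
        results = parse_sequence()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1
            results += parse_sequence()
        return results

    def parse_sequence() -> List[str]:
        # sequence := (atom '?'?)*  where atom is a literal or a group
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)":
            ch = pattern[pos]
            if ch in "*+?":
                raise ValueError(f"unsupported or misplaced operator {ch!r}")
            if ch == "(":
                pos += 1
                atom = parse_alternation()
                if pos >= len(pattern) or pattern[pos] != ")":
                    raise ValueError("unbalanced parenthesis")
                pos += 1
            else:
                atom = [ch]
                pos += 1
            if pos < len(pattern) and pattern[pos] == "?":
                atom = [""] + atom  # optional: the atom may also be absent
                pos += 1
            # Concatenate every prefix built so far with every atom variant.
            results = [r + a for r in results for a in atom]
        return results

    expanded = parse_alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced parenthesis")
    # Remove duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(expanded))
```

Each expanded form would then be written as its own dictionary entry, copying the features of the original line. The number of entries grows with the product of the alternatives in each group, which is where the time and memory cost comes from.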

> UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text 
> to be matched but also regular expressions 
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3530
>                 URL: https://issues.apache.org/jira/browse/UIMA-3530
>             Project: UIMA
>          Issue Type: Wish
>          Components: ruta
>            Reporter: Dimitris Vassos
>            Priority: Minor
>
> It would greatly speed up and simplify the implementation of dictionary 
> lookups using WORDLIST and WORDTABLE, if instead of just plain text entries 
> in the file we could enter regular expressions.
> Especially for inflectional languages such as Greek or Russian, this feature 
> is invaluable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
