[jira] [Commented] (UIMA-3530) UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched but also regular expressions

Martin Toepfer (JIRA) Wed, 08 Jan 2014 07:24:42 -0800

    [ https://issues.apache.org/jira/browse/UIMA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865543#comment-13865543 ]


Martin Toepfer commented on UIMA-3530:
--------------------------------------

I've also been thinking about such a feature (for dealing with German 
inflection) -- maybe we can find a quick fix.

I've had a look at the source code and agree with Peter that a solution for 
full-featured regular expressions is not that simple. Nevertheless, what would 
you think of something like a template mechanism? For example:

A dictionary with the entry "kalt$$" could be used from within Ruta like this:

  Document {->MARKFAST(ADJ, adjList, ..., "$$"=>("","e","er","es","en"))};

which should add "kalt", "kalte", "kalter", "kaltes", "kalten" to the trie.
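
To make the intended expansion concrete, here is a minimal sketch in Python of
what such a template step could do before the entries are added to the trie.
It is not Ruta code and not part of any existing API: the function names
expand_entry and expand_wordlist are hypothetical, and the sketch assumes that
a placeholder is simply replaced by each of its listed endings.

```python
def expand_entry(entry, templates):
    """Expand every placeholder found in one dictionary entry.

    `templates` maps a placeholder string (e.g. "$$") to the list of
    strings it may be replaced with. Hypothetical helper, not Ruta API.
    """
    variants = [entry]
    for placeholder, replacements in templates.items():
        expanded = []
        for variant in variants:
            if placeholder in variant:
                # One new variant per replacement string.
                expanded.extend(variant.replace(placeholder, r) for r in replacements)
            else:
                # Entries without the placeholder are kept unchanged.
                expanded.append(variant)
        variants = expanded
    return variants


def expand_wordlist(entries, templates):
    """Expand all entries, dropping duplicates but keeping first-seen order."""
    seen = set()
    result = []
    for entry in entries:
        for word in expand_entry(entry, templates):
            if word not in seen:
                seen.add(word)
                result.append(word)
    return result


if __name__ == "__main__":
    endings = {"$$": ["", "e", "er", "es", "en"]}
    for word in expand_wordlist(["kalt$$", "und"], endings):
        print(word)
```

Entries without a placeholder pass through unchanged, so a plain word list
would behave as before under this assumption.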

Would that be applicable to your Greek or Russian dictionaries?

(A colleague of mine once used this for modeling adjectives in German 
terminologies).

In the end, maybe one should instead think about using stemming or 
lemmatization (if possible). Or you could wrap the wordlist creation with your 
own code.

-- Martin

> UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text 
> to be matched but also regular expressions 
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3530
>                 URL: https://issues.apache.org/jira/browse/UIMA-3530
>             Project: UIMA
>          Issue Type: Wish
>          Components: ruta
>            Reporter: Dimitris Vassos
>            Priority: Minor
>
> It would greatly speed up and simplify the implementation of dictionary 
> lookups using WORDLIST and WORDTABLE, if instead of just plain text entries 
> in the file we could enter regular expressions.
> Especially for inflectional languages such as Greek or Russian, this feature 
> is invaluable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
