[ 
https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369827#comment-16369827
 ] 

Peter Klügl commented on UIMA-5723:
-----------------------------------

I answered a question about the same problem for someone else off-list last 
week. Take a look at the filtering settings or the filtered chars in the 
MARKTABLE action. The match/lookup is more flexible than the lookup for the 
feature: the matched text needs to represent the same string as the one given 
in the row for the feature, that is, the row that matched during the 
dictionary lookup.
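
To illustrate the kind of mismatch described above, here is a minimal Python 
sketch. It is a toy model, not Ruta's implementation: the function names 
(normalize, load_table, lookup_exact_feature, lookup_row_feature) and the 
ignore_case / ignore_chars options are hypothetical and only loosely mirror 
MARKTABLE's optional arguments. It assumes the match is done on a normalized 
form of the first column while the feature is fetched by the exact string.

```python
def normalize(text, ignore_case=True, ignore_chars=""):
    """Reduce a string to the form used for the flexible match (assumption)."""
    for ch in ignore_chars:
        text = text.replace(ch, "")
    return text.lower() if ignore_case else text


def load_table(csv_text, sep=";"):
    """Split semicolon-separated lines into rows of columns."""
    return [line.split(sep) for line in csv_text.strip().splitlines()]


def lookup_exact_feature(candidates, rows, **opts):
    """Flexible match, but the feature is looked up by the exact text."""
    matchable = {normalize(row[0], **opts) for row in rows}
    feature_by_raw_key = {row[0]: row[1] for row in rows}
    return [
        (c, feature_by_raw_key.get(c))  # None when the text differs from the row
        for c in candidates
        if normalize(c, **opts) in matchable
    ]


def lookup_row_feature(candidates, rows, **opts):
    """Flexible match, and the feature is taken from the row that matched."""
    row_by_key = {normalize(row[0], **opts): row for row in rows}
    return [
        (c, row_by_key[normalize(c, **opts)][1])
        for c in candidates
        if normalize(c, **opts) in row_by_key
    ]


TABLE = """
WAZ;WAZELF
Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
"""

rows = load_table(TABLE)
# "W.A.Z." matches the row "WAZ" once dots and case are ignored.
print(lookup_exact_feature(["W.A.Z."], rows, ignore_chars="."))  # [('W.A.Z.', None)]
print(lookup_row_feature(["W.A.Z."], rows, ignore_chars="."))    # [('W.A.Z.', 'WAZELF')]
```

In this model the first variant finds the entry but leaves the feature empty 
whenever the matched text is not identical to the first column of the row; 
the second variant avoids that by reading the feature from the matched row.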

 

There are several problems and flaws with the Ruta wordlists and wordtables 
which cause problems all the time, also because they are more powerful than 
similar dictionary lookups. In order to avoid that, I wrote some simple 
dictionary lookup code which fixes exactly those flaws, but it is not 
compatible with the Ruta code and is much simpler and more maintainable. I no 
longer use the Ruta functionality at all (I actually consider it deprecated), 
only my simple dictionary. I will contribute the code when I find the time, 
but I also need to find a good design for including it in Ruta.

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---------------------------------------------------------------------------
>
>                 Key: UIMA-5723
>                 URL: https://issues.apache.org/jira/browse/UIMA-5723
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Andreas Thiel
>            Assignee: Peter Klügl
>            Priority: Major
>
> When using Ruta's MARKTABLE action with a CSV file {{nl_law_names.csv}} like 
> this
> {code:xml}
> WAZ;WAZELF
> Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
> {code}
> and corresponding Ruta script containing these lines
> {code:java}
> WORDTABLE LawNameTable = 'nl_law_names.csv';
> Document{->MARKTABLE(WetNaam, 1, LawNameTable, "WetIdentifier" = 2)};
> {code}
> it seems that the text {{WAZ}} is detected, but the {{WetIdentifier}} feature 
> of the resulting annotation is not filled by the string following the 
> semicolon. Instead, it remains empty.
> (Note: _WetNaam_ annotation is defined elsewhere via type system description)
> In contrast, the fully written name {{Wet arbeidsongeschiktheidsverzekering 
> zelfstandigen}} is detected and processed as expected with feature 
> WetIdentifier = WAZELF after annotating.
> Could it be that problems arise when only a single word (i.e. no spaces or 
> uppercase letters following lowercase chars) is present in the first column 
> in the CSV file? Or is it a matter of configuration?
> We experimented also with the optional arguments of MARKTABLE regarding 
> uppercase/lowercase distinction, but to no avail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
