Andreas Thiel created UIMA-5723:
-----------------------------------
Summary: MARKTABLE fails to assign feature for single word entry
in first CSV column
Key: UIMA-5723
URL: https://issues.apache.org/jira/browse/UIMA-5723
Project: UIMA
Issue Type: Bug
Components: Ruta
Affects Versions: 2.6.1ruta
Reporter: Andreas Thiel
When using Ruta's MARKTABLE action with a CSV file {{nl_law_names.csv}} like
this
{code:xml}
WAZ;WAZELF
Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
{code}
and corresponding Ruta script containing these lines
{code:java}
WORDTABLE LawNameTable = 'nl_law_names.csv';
Document{->MARKTABLE(WetNaam, 1, LawNameTable, "WetIdentifier" = 2)};
{code}
it seems that the text {{WAZ}} is detected, but the {{WetIdentifier}} feature
of the resulting annotation is not filled with the string following the
semicolon. Instead, it remains empty.
(Note: the _WetNaam_ annotation type is defined elsewhere, in a type system description.)
In contrast, the fully written name
{{Wet arbeidsongeschiktheidsverzekering zelfstandigen}} is detected and
processed as expected: after annotation, the feature {{WetIdentifier}} has the
value {{WAZELF}}.
Could it be that problems arise when only a single word (i.e. no spaces or
uppercase letters following lowercase chars) is present in the first column in
the CSV file? Or is it a matter of configuration?
We also experimented with the optional arguments of MARKTABLE that control the
uppercase/lowercase distinction, but to no avail.
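
To make the expected result concrete, here is a minimal sketch in plain Python of the lookup we would expect for the table above. It is not Ruta code and does not reflect how MARKTABLE is implemented; the helper names ({{load_table}}, {{expected_annotations}}) and the sample sentence are hypothetical, and matching is simplified to exact, case-sensitive substring search.

```python
import csv
import io

# Table content copied from the report (semicolon-separated).
TABLE_CSV = """WAZ;WAZELF
Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
"""


def load_table(text, delimiter=";"):
    """Return the table rows as lists of stripped cell strings."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [[cell.strip() for cell in row] for row in reader if row]


def expected_annotations(document, rows, match_column=1, feature_column=2):
    """Find every entry of match_column in the document.

    Column indexes are 1-based, mirroring the MARKTABLE call in the report.
    Returns (begin, end, covered_text, feature_value) tuples sorted by offset.
    """
    hits = []
    for row in rows:
        entry = row[match_column - 1]
        start = document.find(entry)
        while start != -1:
            hits.append((start, start + len(entry), entry, row[feature_column - 1]))
            start = document.find(entry, start + 1)
    return sorted(hits)


# Hypothetical sample sentence containing both table entries.
doc = "De WAZ staat voor Wet arbeidsongeschiktheidsverzekering zelfstandigen."

for begin, end, text, identifier in expected_annotations(doc, load_table(TABLE_CSV)):
    print(f"{begin}-{end} {text!r} -> WetIdentifier={identifier}")
```

In this sketch both the single-word entry and the multi-word entry carry the value from the second column, which is the behaviour we expected from MARKTABLE for {{WAZ}} as well.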