UIMA Ruta next steps

Peter Klügl Thu, 19 Dec 2013 06:29:14 -0800

Hi,

I just want to start a discussion about the next release and about
possibly interesting directions for extensions.

I am planning a bugfix release, UIMA Ruta version 2.1.1, for the end of
January.

List of the 26 already resolved issues for 2.1.1:
https://issues.apache.org/jira/browse/UIMA-3342?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%222.1.1ruta%22%20AND%20component%20%3D%20ruta%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

List of currently unresolved issues:
https://issues.apache.org/jira/browse/UIMA-2982?jql=project%20%3D%20UIMA%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20ruta%20ORDER%20BY%20priority%20DESC

In addition, I think (at least) the following issues should be resolved
for 2.1.1 (some of them are already fixed, but the documentation is not
yet up to date):
- UIMA-3137: Cleanup Ruta launch configuration tabs
- UIMA-3471: Arrays in Annotation Browser View
- UIMA-3347: Ruta: Missing False Positives in "Annotation Test" view
- UIMA-3286: Start anchor after optional literal
- UIMA-3280: Option to specify vm arguments for Ruta launch config
- UIMA-3283: Matching reference pointing outside of current window
- UIMA-3303: Add a way to alias types in RUTA (e.g. "IMPORT type AS alias")
- UIMA-3495: Report ambiguous types in Ruta Editor
- UIMA-3441: Ruta: Extend classpath for Annotation Test run
- UIMA-3469: Ruta: Annotation Browser View Extensions
- UIMA-3275: Minor discrepancies in license and notice files
- UIMA-3309: Ruta: Filter file names in Query View
- UIMA-3485: Ruta: Workbench extension point for "Script execution finished"

Maybe the issues for dropins support should also be included.

Are there any wishes or opinions about which other issues should be included?

###

Here are a few ideas for major changes in a 2.2.x or 3.x release:

1. Making UIMA Ruta faster
There are four aspects that can be considered:
a) Parallelization/scale-out, already supported by UIMA-AS and friends.
b) Improvements to the current implementation. I know of at least four
code fragments that can be improved.
c) Add new language constructs that are simply faster in some
situations. I am thinking of an FST implementation similar to JAPE Plus
and of extending the dynamic anchoring towards the operator plan
optimization of SystemT.
d) Write faster rules. Some rules are simply faster than others, which
could lead to a cookbook of best practices.
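To make idea 1c a bit more concrete, here is a minimal sketch (not JAPE Plus or Ruta code) of the core FST idea: several rule patterns, each assumed to be a fixed sequence of annotation type names, are compiled into one shared deterministic automaton, so a common prefix is checked once per position instead of once per rule. The names `State`, `compile_rules` and `match` and the type names are hypothetical; quantifiers, conditions and actions are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class State:
    # Outgoing transitions, keyed by annotation type name.
    transitions: Dict[str, "State"] = field(default_factory=dict)
    # Name of the rule whose pattern ends in this state, if any.
    accepts: Optional[str] = None


def compile_rules(rules: Dict[str, List[str]]) -> State:
    """Merge all type-sequence patterns into one deterministic automaton.

    Patterns that share a prefix share the states for that prefix.
    """
    start = State()
    for name, pattern in rules.items():
        state = start
        for type_name in pattern:
            state = state.transitions.setdefault(type_name, State())
        state.accepts = name  # a later identical pattern overwrites this
    return start


def match(start: State, tokens: List[str]) -> List[Tuple[str, int, int]]:
    """Return (rule, begin, end) for the longest match at each start position."""
    matches = []
    for begin in range(len(tokens)):
        state: Optional[State] = start
        best = None
        for pos in range(begin, len(tokens)):
            state = state.transitions.get(tokens[pos])
            if state is None:
                break  # no rule can continue from here
            if state.accepts is not None:
                best = (state.accepts, begin, pos + 1)
        if best is not None:
            matches.append(best)
    return matches


if __name__ == "__main__":
    rules = {
        "Date": ["NUM", "PERIOD", "NUM", "PERIOD", "NUM"],
        "Money": ["NUM", "CURRENCY"],
    }
    tokens = ["NUM", "PERIOD", "NUM", "PERIOD", "NUM", "W", "NUM", "CURRENCY"]
    print(match(compile_rules(rules), tokens))
    # [('Date', 0, 5), ('Money', 6, 8)]
```

Here `Date` and `Money` share the state reached after `NUM`. Two rules with identical patterns would overwrite each other's `accepts`, so a real implementation would keep a list of accepted rules per state.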

2. Improve support for coreference information
There are some nice ideas from unification-based grammars that could be
included in the rule language. It does not have to be as mature as in
SProUT, but maybe something like in CAFETIERE. This would automatically
resolve the restriction on value assignments in actions vs. conditions.
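Item 2 can be illustrated with a minimal sketch of feature-structure unification, assuming feature structures are plain nested dicts with atomic string values and no reentrancy or type hierarchy (SProUT and CAFETIERE use richer typed feature structures). The example checks number/gender agreement between a pronoun and a candidate antecedent: a single `unify` call both tests compatibility, as a condition would, and yields the merged values, as an action would. The function name and feature names are illustrative only.

```python
from typing import Any, Optional


def unify(a: Any, b: Any) -> Optional[Any]:
    """Unify two feature structures given as nested dicts.

    Atomic values unify only if they are equal; dicts unify feature by
    feature. Returns None on a clash. The inputs are not modified.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so `a` stays untouched
        for feat, b_val in b.items():
            if feat in result:
                merged = unify(result[feat], b_val)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feat] = merged
            else:
                result[feat] = b_val
        return result
    if isinstance(a, dict) or isinstance(b, dict):
        return None  # complex value vs. atomic value
    return a if a == b else None


if __name__ == "__main__":
    pronoun = {"agr": {"num": "sg", "gen": "fem"}}
    candidate = {"agr": {"num": "sg"}, "head": "Mary"}
    print(unify(pronoun, candidate))
    # {'agr': {'num': 'sg', 'gen': 'fem'}, 'head': 'Mary'}
    print(unify({"agr": {"num": "pl"}}, candidate))
    # None
```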

3. Support arbitrary CAS collections in the Ruta Workbench
The Workbench currently supports only plain XMI files; there is no
concept of a collection reader or anything similar. It might be useful
for some users if the Workbench could operate on CASes stored in a
database or provided by an arbitrary collection reader.

4. Actually useful rule induction algorithm
After about six implementations of supervised rule learners, I think I
have an idea of the layout of an actually useful algorithm for Ruta. I
think it is also time to adapt some ideas of semi-supervised machine
learning for rule-based systems.

5. Support generic type systems in the Workbench
Sometimes you cannot avoid specifying the semantics of an annotation in
its feature values instead of in its type. However, most of the tooling
is then less useful, e.g., the Annotation Browser view shows only one
type with a lot of annotations. There should be some additional,
configurable views that support those situations.
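For item 5, the core of such a configurable view could be as simple as grouping the annotations of one generic type by the value of a chosen feature instead of by type. The sketch below uses a hypothetical `Annotation` stand-in rather than the UIMA `AnnotationFS` API, and a made-up generic type `Entity` with a feature `kind`.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    # Hypothetical stand-in for a CAS annotation, not the UIMA API.
    type_name: str
    begin: int
    end: int
    features: Dict[str, str] = field(default_factory=dict)


def group_by_feature(annotations: List[Annotation], type_name: str,
                     feature: str, unset: str = "<unset>") -> Dict[str, List[Annotation]]:
    """Group annotations of one type by a feature value instead of by type."""
    groups: Dict[str, List[Annotation]] = defaultdict(list)
    for ann in annotations:
        if ann.type_name == type_name:
            # Annotations without the feature end up in their own bucket.
            groups[ann.features.get(feature, unset)].append(ann)
    return dict(groups)


if __name__ == "__main__":
    anns = [
        Annotation("Entity", 0, 4, {"kind": "Person"}),
        Annotation("Entity", 10, 16, {"kind": "Location"}),
        Annotation("Entity", 20, 25, {"kind": "Person"}),
        Annotation("Entity", 30, 35),
        Annotation("Token", 0, 4),
    ]
    for value, members in sorted(group_by_feature(anns, "Entity", "kind").items()):
        print(value, len(members))
    # <unset> 1
    # Location 1
    # Person 2
```

In a Workbench view, the type and feature would presumably come from user configuration, and each group would become one node in the tree, in place of the single node per type.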

All opinions or wishes are welcome :-)

Best,

Peter
