Hi,

I just want to start a discussion about the next release and maybe interesting directions for extensions.

I am planning a bugfix release for the end of January: UIMA Ruta version 2.1.1.

List of the 26 already resolved issues for 2.1.1:
https://issues.apache.org/jira/browse/UIMA-3342?jql=project%20%3D%20UIMA%20AND%20fixVersion%20%3D%20%222.1.1ruta%22%20AND%20component%20%3D%20ruta%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

List of currently unresolved issues:
https://issues.apache.org/jira/browse/UIMA-2982?jql=project%20%3D%20UIMA%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20ruta%20ORDER%20BY%20priority%20DESC

I think the following issues should (at least) also be resolved for 2.1.1 (some of them are already fixed, but the documentation is not yet up to date):

- UIMA-3137: Cleanup Ruta launch configuration tabs
- UIMA-3471: Arrays in Annotation Browser View
- UIMA-3347: Ruta: Missing False Positives in "Annotation Test" view
- UIMA-3286: Start anchor after optional literal
- UIMA-3280: Option to specify vm arguments for Ruta launch config
- UIMA-3283: Matching reference pointing outside of current window
- UIMA-3303: Add a way to alias types in RUTA (e.g. "IMPORT type AS alias")
- UIMA-3495: Report ambiguous types in Ruta Editor
- UIMA-3441: Ruta: Extend classpath for Annotation Test run
- UIMA-3469: Ruta: Annotation Browser View Extensions
- UIMA-3275: Minor discrepencies in license and notice files
- UIMA-3309: Ruta: Filter file names in Query View
- UIMA-3485: Ruta: Workbench extension point for "Script execution finished"

Maybe the issues for dropins support should also be included. Are there any wishes or opinions about which other issues should be included?

###

Here are a few ideas for major changes in a 2.2.x or 3.x release:

1. Making UIMA Ruta faster
There are four aspects that can be considered:
a) Parallelization/scale-out, already supported by UIMA-AS and friends.
b) Improvements in the current implementation. I know of at least four code fragments that can be improved.
c) New language constructs that are simply faster in some situations. I am thinking of an FST implementation similar to JAPE Plus, and of an extension of the dynamic anchoring towards the operator plan optimization of SystemT.
d) Writing faster rules. Some rules are just faster than others. This leads to a cookbook of best practices.

2. Improve support for coreference information
There are some nice ideas from unification-based grammars that can be included in the rule language. It does not have to be as mature as in SProUT, but maybe something like in CAFETIERE. This would automatically solve the restriction of value assignments in actions vs. conditions.

3. Support arbitrary CAS collections in the Ruta Workbench
The Workbench currently only supports normal XMI files. There is no concept of a collection reader or anything similar. It would maybe be nice for some users if the Workbench could operate on CASs stored in a database or on any collection reader.

4. Actually useful rule induction algorithm
After about six implementations of supervised rule learners, I think I have an idea of the layout of an actually useful algorithm for Ruta. I think it is also time to adapt some ideas from semi-supervised machine learning for rule-based systems.

5. Support generic type systems in the Workbench
Sometimes you cannot avoid specifying the semantics of an annotation in the feature values instead of in the type. However, most of the tooling will not be as useful then, e.g., the Annotation Browser view shows only one type with a lot of annotations. There should be some additional, configurable views that support those situations.

All opinions or wishes are welcome :-)

Best,
Peter
