TextMarker

Peter Klügl Tue, 14 Dec 2010 07:11:05 -0800

 Hello,

We would like to contribute our TextMarker system to Apache UIMA and want to ask whether the development team is interested in this contribution. The system is currently hosted on SourceForge (http://sourceforge.net/projects/textmarker/), and there is some documentation in the project wiki (http://tmwiki.informatik.uni-wuerzburg.de/).

I think a good start for that discussion is a summary of the current status of the system. TextMarker is an Eclipse-based tool implemented in pure Java that can, among other things, be used to prototype analysis engines or develop complex handcrafted text processing applications. It consists of four major parts:


Language:

The rule language, or rather script language, can be compared to regular expressions over annotations with additional conditions and actions. There are currently 28 different conditions and 34 actions. Conditions range from a test on a feature value to a test whether the matched annotation is contained in another annotation of a given type; actions range from creating an annotation to applying an external dictionary or analysis engine. A TextMarker script can import type systems or define new types or variables. There are also some more complex control structures for procedure calls, conditional statements and recursion. The TextMarker language (and inference) is in active use in some production applications here, but it lacks test cases. However, we are currently writing uimaFIT-based component tests to improve quality management.
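To make the "match, condition, action" idea concrete, here is a minimal sketch in plain Java. It is not TextMarker syntax and does not use the TextMarker or UIMA APIs; the names RuleSketch, Annotation, partOf and apply are hypothetical and exist only for illustration. The sketch matches every annotation of one type, tests the condition "is contained in another annotation of a given type", and, if the test succeeds, performs the action "create a new annotation" over the same span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative only: hypothetical names, not the TextMarker or UIMA API.
public class RuleSketch {

    /** A typed text span, standing in for a UIMA annotation. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        /** True if this span fully contains the other span. */
        boolean covers(Annotation other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    /** Condition: the matched annotation lies inside some annotation of the given type. */
    static BiPredicate<Annotation, List<Annotation>> partOf(String type) {
        return (matched, all) ->
                all.stream().anyMatch(o -> o.type.equals(type) && o.covers(matched));
    }

    /**
     * Applies one rule: for each annotation of matchType that satisfies the
     * condition, the action creates a new annotation of newType on the same span.
     * Returns only the newly created annotations.
     */
    static List<Annotation> apply(String matchType,
                                  BiPredicate<Annotation, List<Annotation>> condition,
                                  String newType,
                                  List<Annotation> annotations) {
        List<Annotation> created = new ArrayList<>();
        for (Annotation a : annotations) {
            if (a.type.equals(matchType) && condition.test(a, annotations)) {
                created.add(new Annotation(newType, a.begin, a.end));
            }
        }
        return created;
    }

    public static void main(String[] args) {
        // One Headline span, one Number inside it, one Number outside it.
        List<Annotation> annotations = List.of(
                new Annotation("Headline", 0, 10),
                new Annotation("Number", 6, 10),
                new Annotation("Number", 20, 24));

        List<Annotation> years =
                apply("Number", partOf("Headline"), "Year", annotations);

        for (Annotation y : years) {
            System.out.println(y.type + " [" + y.begin + "," + y.end + ")");
        }
    }
}
```

In this sketch only the Number inside the Headline yields a Year annotation. A real rule language would add sequencing of several matched elements, quantifiers and many more conditions and actions, which this example deliberately leaves out.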


Workbench:

The Eclipse-based tool for developing TextMarker scripts is currently based on DLTK 1.0 (http://www.eclipse.org/dltk/), and its editor supports syntax highlighting, syntax checks, context-sensitive auto-completion, formatting, mark occurrences, open declaration and other features commonly known from IDEs. For each script file, a type system and an executable analysis engine are created. It is therefore quite simple and efficient to create an analysis engine with a few lines of TextMarker rules. The workbench supports testing on annotated xmiCAS files while writing new rules and provides some minimal debugging functionality that explains why and on what text a rule was executed.


CEV:

This plugin can be used to edit or visualize xmiCAS and is also able to render HTML. It is heavily used by the testing and explanation components.


TextRuler:

This framework for rule learning is more of a playground and was mainly implemented by students. There are currently more or less working implementations of LP2, WHISK, WIEN, RAPIER and an algorithm of our own, and three other algorithms are being implemented.

Overall, the system has been running stably for a year now, but it is lacking in code quality, documentation and test cases. We are also willing to change the name of the system if someone can think of a better one.


I'm looking forward to your comments.

Best regards,

Peter


--
---------------------------------------------------------------------
Dipl.-Inf. Peter Klügl
Universität Würzburg        Tel.: +49-(0)931-31-86741
Am Hubland                  Fax.: +49-(0)931-31-86732
97074 Würzburg              mail: [email protected]
     http://www.is.informatik.uni-wuerzburg.de/en/staff/kluegl_peter/
---------------------------------------------------------------------
