Re: TextMarker

Thilo Goetz Sat, 01 Jan 2011 04:42:20 -0800

Hi Peter,

I downloaded the source trunk and got things mostly to compile
and run: I'm running Eclipse 3.5.2, RCP edition, and installed
the latest UIMA plugins and DLTK 1.0.2.  I also had to find the
Mozilla xpcom plugin.  The only things not compiling for me are
references to com.sun.org.apache.apache.xpath.XPathAPI.  The
internet tells me that those could be fixed by using Xalan
directly, but I haven't tried.
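
A minimal sketch of another possible route, assuming the affected code
only needs plain XPath node selection: the standard javax.xml.xpath API
that ships with the JDK. This is not the Xalan fix mentioned above, it
is untested against the TextMarker sources, and the class and method
names (XPathSketch, selectNodeList, parse) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of TextMarker.
public class XPathSketch {

    // Selects nodes with an XPath expression using only the public JDK API.
    static NodeList selectNodeList(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document, only for demonstration.
        Document doc = parse("<types><type name='A'/><type name='B'/></types>");
        NodeList nodes = selectNodeList(doc, "/types/type");
        System.out.println("matched nodes: " + nodes.getLength());
    }
}
```

Whether this is a drop-in replacement depends on which XPathAPI methods
the code actually calls.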


My main issue right now is that the TextMarker wiki is down,
and that seems to be the only source of documentation (unless
I missed something).

I noticed that TextMarker uses a lot of 3rd party libraries.
So we'll need to compile an exhaustive list of the libs
that are being used, their licenses and provenance, and in
case the license is bad, possible alternatives.

--Thilo

On 12/14/2010 15:55, Peter Klügl wrote:

Hello,

We would like to contribute our TextMarker system to Apache UIMA and
want to ask whether the development team is interested in this contribution.
The system is currently hosted on SourceForge
(http://sourceforge.net/projects/textmarker/) and there is some
documentation in the project wiki
(http://tmwiki.informatik.uni-wuerzburg.de/).

I think a good start for that discussion is a summary of the
current status of the system. TextMarker is an Eclipse-based tool
implemented in pure Java that can, among other things, be used to
prototype analysis engines or develop complex handcrafted text
processing applications. It consists of four major parts:

Language:
The rule (or rather script) language can be compared to regular
expressions over annotations, with additional conditions and actions.
There are currently 28 different conditions and 34 actions. The
conditions range from a test on a feature value to a test of whether
the matched annotation is contained in another annotation of a given
type; the actions range from creating an annotation to applying an
external dictionary or analysis engine. A TextMarker script can import
type systems or define new types or variables. There are also some more
complex control structures for procedure calls, conditional statements
or recursion. The TextMarker language (and inference) is in active use
in some production applications here, but it lacks test cases. However,
we are currently writing uimaFIT-based component tests to improve
quality management.
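
To illustrate the idea of a rule as "match, condition, action", here is
a rough conceptual sketch in plain Java. It is not TextMarker syntax and
not how the system is implemented; the classes Ann and Rule, the type
names "NUM" and "Year", and the year pattern are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Conceptual toy model only; all names here are made up.
public class RuleSketch {

    // Hypothetical stand-in for an annotation: a type name plus covered text.
    static final class Ann {
        final String type;
        final String text;
        Ann(String type, String text) { this.type = type; this.text = text; }
    }

    // Hypothetical rule: the type to match, a condition to test, an action to run.
    static final class Rule {
        final String matchType;
        final Predicate<Ann> condition;
        final Consumer<Ann> action;

        Rule(String matchType, Predicate<Ann> condition, Consumer<Ann> action) {
            this.matchType = matchType;
            this.condition = condition;
            this.action = action;
        }

        void apply(List<Ann> annotations) {
            // Iterate over a copy so that actions may add annotations safely.
            for (Ann a : new ArrayList<>(annotations)) {
                if (a.type.equals(matchType) && condition.test(a)) {
                    action.accept(a);
                }
            }
        }
    }

    // Adds a "Year" annotation for every "NUM" annotation that looks like a year.
    static List<Ann> markYears(List<Ann> input) {
        List<Ann> annotations = new ArrayList<>(input);
        Rule rule = new Rule(
                "NUM",
                a -> a.text.matches("(19|20)\\d\\d"),
                a -> annotations.add(new Ann("Year", a.text)));
        rule.apply(annotations);
        return annotations;
    }

    public static void main(String[] args) {
        List<Ann> result = markYears(Arrays.asList(
                new Ann("NUM", "2010"), new Ann("W", "in"), new Ann("NUM", "42")));
        for (Ann a : result) {
            System.out.println(a.type + ": " + a.text);
        }
    }
}
```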

Workbench:
The Eclipse-based tool for developing the TextMarker scripts is
currently based on DLTK 1.0 (http://www.eclipse.org/dltk/) and its
editor supports syntax highlighting, syntax checks, context-sensitive
auto-completion, formatting, mark occurrences, open declaration and some
other useful features commonly known from IDEs. For each script file, a
type system and an executable analysis engine are created. Therefore,
it's quite simple and efficient to create an analysis engine with a few
lines of TextMarker rules. The workbench supports testing on annotated
xmiCAS while writing new rules and provides some minimal debugging
functionality that explains why and on what text a rule was executed.

CEV:
This plugin can be used to edit or visualize xmiCAS and is also able to
render HTML. It is heavily used by the testing and explanation components.

TextRuler:
This framework for rule learning is rather a playground and mainly
implemented by students. There are currently more or less working
implementations of LP2, WHISK, WIEN, RAPIER and an algorithm of our
own, and three other algorithms are being implemented.


Overall, the system has been working stably for a year now, but it is
lacking in code quality, documentation and test cases. We are also
willing to change the name of the system if someone can think of a
better one.

I'm looking forward to your comments.

Best regards,

Peter
