Hi Peter,

I was very impressed when you showed me a demo of TextMarker
last year, so I think it's great you're coming up with this
proposal.  I will download and play with it over the coming
few weeks, but I'll probably be really busy before Xmas, so
it might take a while...

If we decide to accept TextMarker into UIMA, we will need a
code grant: http://www.apache.org/licenses/software-grant.txt
I assume your university owns the rights to all the code, so
you may want to bring this up with your legal department.  I
know it's a bit early, but I'm bringing this up now because
there may be some lead time.

More questions and comments inline.

On 12/14/2010 15:55, Peter Klügl wrote:
>  Hello,
> 
> We would like to contribute our TextMarker system to Apache UIMA and want to
> ask, if the development team is interested in this contribution. The system is
> currently hosted on SourceForge (http://sourceforge.net/projects/textmarker/)
> and there is some documentation in the project wiki
> (http://tmwiki.informatik.uni-wuerzburg.de/).
> 
> I think it's a good start for that discussion, if I summarize the current
> status of the system. TextMarker is an Eclipse-based tool implemented in pure
> Java that can among other things be used to prototype analysis engines or
> develop complex handcrafted text processing applications. It consists of four
> major parts:
> 
> Language:
> The rule or rather script language can be compared to regular expressions over
> annotation with additional conditions and actions. There are currently 28
> different conditions and 34 actions. They range from a test on a feature value
> to a test, if the matched annotation is contained in another annotation of a
> given type, respectively from creating an annotation to applying an external
> dictionary or analysis engine. A TextMarker script can import type systems or
> define new types or variables. Then, there are also some more complex control
> structures for procedure calls, conditioned statements or recursion. The
> TextMarker language (and inference) is in active usage in some productive
> applications here, but it lacks of test cases. However, we are currently
> writing uimaFIT based component test to improve the quality management.

So just to make sure I understand this correctly: the
language is completely independent of the Eclipse-based
development environment.  I could in principle write
rules with just a text editor, if I wanted to.  Correct?

I think such a language is a very important feature
that UIMA is currently missing.  We have nothing that
compares with GATE's JAPE language, for example.

> 
> Workbench:
> The Eclipse-based tool for developing the TextMarker scripts is currently
> based on DLTK 1.0 (http://www.eclipse.org/dltk/) and it's editor supports
> syntax highlighting, syntax checks, context-sensitive auto-completion,
> formatting, mark occurrences, open declaration and some other useful stuff
> commonly known in IDEs. For each script file, a type system and an executable
> analysis engine is created. Therefore, it's quite simple and efficient to
> create an analysis engine with a few lines of TextMarker rules. The workbench
> supports testing on annotated xmiCas while writing new rules and provides some
> minimal debugging functionality that explains why and on what text a rule was
> executed.

Cool, I've been looking into DLTK myself recently.  Great stuff.

> 
> CEV:
> This plugin can be used to edit or visualize xmiCAS and is also able to render
> HTML. It is heavily used by the testing and explanation components.

So here we'd have to figure out if it would make
sense to unify it with our CAS Editor.

> 
> TextRuler:
> This framework for rule learning is rather a playground and mainly implemented
> by students. There are currently more or less working implementations of LP2,
> WHISK, WIEN, RAPIER and an own algorithm, and three other algorithms are being
> implemented.

Sounds interesting.

> 
> 
> Overall, the system is working stable for a year now, but lacks in code
> quality, documentation and test cases. Basically, we are also willing to
> change the name of the system, if someone can think of a better one.
> 
> I'm looking forward to your comments.
> 
> Best regards,
> 
> Peter
> 
> 

--Thilo
