Hi Peter,

I was very impressed when you showed me a demo of TextMarker last year, so
I think it's great you're coming up with this proposal. I will download and
play with it over the coming few weeks, but I'll probably be really busy
before Xmas, so it might take a while...
If we decide to accept TextMarker into UIMA, we will need a code grant:
http://www.apache.org/licenses/software-grant.txt

I assume your university owns the rights to all the code, so you may want
to raise this with your legal department. I know it's a bit early, but I'm
mentioning it now because there may be some lead time.

More questions and comments inline.

On 12/14/2010 15:55, Peter Klügl wrote:
> Hello,
>
> We would like to contribute our TextMarker system to Apache UIMA and want
> to ask, if the development team is interested in this contribution. The
> system is currently hosted on SourceForge
> (http://sourceforge.net/projects/textmarker/) and there is some
> documentation in the project wiki
> (http://tmwiki.informatik.uni-wuerzburg.de/).
>
> I think it's a good start for that discussion, if I summarize the current
> status of the system. TextMarker is an Eclipse-based tool implemented in
> pure Java that can among other things be used to prototype analysis
> engines or develop complex handcrafted text processing applications. It
> consists of four major parts:
>
> Language:
> The rule or rather script language can be compared to regular expressions
> over annotations with additional conditions and actions. There are
> currently 28 different conditions and 34 actions. They range from a test
> on a feature value to a test, if the matched annotation is contained in
> another annotation of a given type, respectively from creating an
> annotation to applying an external dictionary or analysis engine. A
> TextMarker script can import type systems or define new types or
> variables. Then, there are also some more complex control structures for
> procedure calls, conditioned statements or recursion. The TextMarker
> language (and inference) is in active usage in some productive
> applications here, but it lacks test cases. However, we are currently
> writing uimaFIT-based component tests to improve the quality management.
So just to make sure I understand this correctly: the language is
completely independent of the Eclipse-based development environment. I
could in principle write rules with just a text editor, if I wanted to.
Correct?

I think such a language is a very important feature that UIMA is currently
missing. We have nothing that compares with GATE's JAPE language, for
example.

>
> Workbench:
> The Eclipse-based tool for developing the TextMarker scripts is currently
> based on DLTK 1.0 (http://www.eclipse.org/dltk/) and its editor supports
> syntax highlighting, syntax checks, context-sensitive auto-completion,
> formatting, mark occurrences, open declaration and some other useful stuff
> commonly known in IDEs. For each script file, a type system and an
> executable analysis engine are created. Therefore, it's quite simple and
> efficient to create an analysis engine with a few lines of TextMarker
> rules. The workbench supports testing on annotated xmiCas while writing
> new rules and provides some minimal debugging functionality that explains
> why and on what text a rule was executed.

Cool, I've been looking into DLTK myself recently. Great stuff.

>
> CEV:
> This plugin can be used to edit or visualize xmiCAS and is also able to
> render HTML. It is heavily used by the testing and explanation components.

So here we'd have to figure out if it would make sense to unify it with
our CAS Editor.

>
> TextRuler:
> This framework for rule learning is rather a playground and mainly
> implemented by students. There are currently more or less working
> implementations of LP2, WHISK, WIEN, RAPIER and an own algorithm, and
> three other algorithms are being implemented.

Sounds interesting.

>
>
> Overall, the system has been working stably for a year now, but is lacking
> in code quality, documentation and test cases. Basically, we are also
> willing to change the name of the system, if someone can think of a better
> one.
>
> I'm looking forward to your comments.
>
> Best regards,
>
> Peter

--Thilo
