On 18.11.2012 at 09:10, Tommaso Teofili <[email protected]> wrote:
> Another thing I'd like to add to our website / documentation is something
> like "how to do NLP task X within Apache UIMA"; we currently have a list of
> external sources plus the addons page but I think it'd be nice for a user
> to know which known open source UIMA enabled options exist for doing e.g.
> language detection, tokenization, etc either in the addons package or from
> some other sources.
>
> What do you think?

I'm not sure how much detail would be useful here, because the whole thing has grown so huge in the meantime. I think the external resources page is quite good, but going down to the component level may just be a bit too much. To give you a few numbers just from DKPro Core:

- ~17 tools are wrapped for UIMA in DKPro Core trunk. A good part of these provide more than one component (e.g. Clear NLP, Stanford NLP, OpenNLP, …). In addition to those ~17, there are also several original components. Unfortunately, I haven't come up with an easy way to count the actual components, but I would guess something like 30+.
- ~17 modules with readers and writers for various formats are provided in DKPro Core trunk.
- 62 artifacts are returned by a Maven Central search [1] for DKPro Core 1.4.0. I was admittedly a bit shocked when I noticed this recently. In Eclipse, I usually don't count the stuff. The upcoming DKPro Core version will have even more than that.
- 81 different models have been packaged for the various tools in various languages and are distributed via Maven [2]. There are a couple more available for the TreeTagger module, but for licensing reasons we can only provide a script for people to package them themselves.

... and this is DKPro Core alone, not to mention the UIMA Sandbox, Clear TK, cTAKES and whatnot. Listing them all at the component level would, I think, make for a huge list!

-- Richard

[1] http://search.maven.org/#search%7Cga%7C1%7Cdkpro
[2] https://docs.google.com/spreadsheet/pub?key=0ApGcdapz0xSYdGh2azY2ODMtZDRNczUySEZJUFpXM2c

--
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC)
www.werc.tu-darmstadt.de
-------------------------------------------------------------------
