On 18.11.2012 at 09:10, Tommaso Teofili <[email protected]> wrote:

> Another thing I'd like to add to our website / documentation is something
> like "how to do NLP task X within Apache UIMA"; we currently have a list of
> external sources plus the addons page but I think it'd be nice for a user
> to know which known open source UIMA enabled options exist for doing e.g.
> language detection, tokenization, etc either in the addons package or from
> some other sources.
> 
> What do you think?

I'm not sure how much detail would be useful here, because the whole thing has
grown so huge in the meantime. I think the external resources page is quite
good, but going down to the component level may just be a bit too much. To give
you a few numbers just from DKPro Core:

- ~17 tools are wrapped for UIMA in DKPro Core trunk. A good part of these
  provide more than one component! (e.g. Clear NLP, Stanford NLP, OpenNLP, …).
  In addition to those ~17, there are also several original components. 
  Unfortunately, I didn't come up with an easy way to count the actual
  components, but I would guess something like 30+.

- ~17 modules with readers and writers for various formats are provided in
  DKPro Core trunk.

- 62 artifacts are returned on a Maven Central search [1] for DKPro Core
  1.4.0. I was admittedly a bit shocked when I noticed this recently. In
  Eclipse, I usually don't count the stuff. The upcoming DKPro Core version
  will have even more than that.

- 81 different models have been packaged for the various tools in various
  languages and are distributed via Maven [2]. There are a couple more available
  for the TreeTagger module, but due to license reasons we can only provide a
  script for people to package them themselves.

... and this is DKPro Core alone, not to mention the UIMA Sandbox, ClearTK,
cTAKES and whatnot. Listing them all at the component level, I think, would
make a huge list!

-- Richard

[1] http://search.maven.org/#search%7Cga%7C1%7Cdkpro
[2] https://docs.google.com/spreadsheet/pub?key=0ApGcdapz0xSYdGh2azY2ODMtZDRNczUySEZJUFpXM2c

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 





