Hi all,

recently I've been playing (and coding) with BSP-based [1] algorithms using Apache Hama [2] (which officially graduated to TLP yesterday), and in many cases I found significant performance improvements over equivalent "plain" MapReduce-based algorithms, so I thought it would make sense to write a UIMA collection processing algorithm on top of Hama.
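For anyone not familiar with the BSP model, here is a toy sketch of the superstep idea (local computation, communication of partial results, barrier synchronization) using plain Java threads and a CyclicBarrier. This is NOT Hama's API (Hama exposes this via BSPPeer's send/sync methods), just a minimal, self-contained illustration of the execution model:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Toy BSP-style parallel sum: each "peer" runs a local-computation
// superstep, publishes its partial result (the "message"), then all
// peers meet at a barrier before the aggregation superstep.
public class BspSketch {
    static final int PEERS = 4;

    public static int parallelSum(int[] data) throws Exception {
        int chunk = data.length / PEERS; // assume length divisible by PEERS
        AtomicIntegerArray partials = new AtomicIntegerArray(PEERS);
        CyclicBarrier barrier = new CyclicBarrier(PEERS);
        Thread[] threads = new Thread[PEERS];
        int[] total = new int[1];
        for (int p = 0; p < PEERS; p++) {
            final int peer = p;
            threads[p] = new Thread(() -> {
                // Superstep 1: local computation on this peer's partition
                int sum = 0;
                for (int i = peer * chunk; i < (peer + 1) * chunk; i++) {
                    sum += data[i];
                }
                partials.set(peer, sum); // "send" the partial result
                try {
                    barrier.await(); // barrier synchronization
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                // Superstep 2: one peer aggregates all partial results
                if (peer == 0) {
                    int t = 0;
                    for (int q = 0; q < PEERS; q++) t += partials.get(q);
                    total[0] = t;
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) t.join();
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[16];
        for (int i = 0; i < 16; i++) data[i] = i + 1; // values 1..16
        System.out.println(parallelSum(data)); // prints 136
    }
}
```

In Hama the barrier is global across tasks on different machines rather than threads in one JVM, which is what makes the model attractive for iterative algorithms compared to chaining MapReduce jobs.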
I started sketching it in a sample project on GitHub [3], but I think it would make sense to put it in our sandbox so that anyone can have a look at it, use it, improve it, or evaluate it. The current implementation just reads files from a directory on the local filesystem, processes them in parallel, and collects the ProcessTraces in an output file, but my idea is that it could become a new CPM implementation reading from and writing to HDFS.

I know that's a lot of things in a few lines, so feel free to ask for more clarifications.

Have a nice day,
Tommaso

[1] : http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
[2] : http://incubator.apache.org/hama
[3] : https://github.com/tteofili/samplett/blob/master/uima-bsp/src/main/java/com/github/samplett/uima/bsp/AEProcessingBSPJob.java
