Re: UIMA and BSP

Jens Grivolla Thu, 17 May 2012 02:38:23 -0700

Hi Tommaso,

as I understand it, each CAS is processed independently, with no parallelization inside its own processing flow, right? If so, what you are doing does not look that much like MapReduce (since there is no reduce step); it is closer to simply running many parallel instances on subsets of the collection.

We are currently using Sun Grid Engine to launch CPE instances on several nodes. Each instance gets its input data (in plain text or XMI format) from a MySQL database and writes its XMI output back to the DB. That way we avoid synchronization issues and can distribute data between instances with the simple modulo trick in the SELECT query.
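Here is a minimal sketch of what a modulo split can look like, assuming each document row has a numeric `id` and each instance knows its own index `k` out of `n` instances. The table name `documents`, the column `xmi` and the class/method names are hypothetical, and no JDBC connection is made. The code builds the per-instance query and applies the same predicate in memory to show that every id lands in exactly one partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a modulo split: instance k of n only selects
// rows with id % n == k. Table and column names are assumptions.
public class ModuloPartition {

    /** Builds the SELECT for instance k of n (MySQL's MOD(N, M) function). */
    static String selectFor(int k, int n) {
        if (n <= 0 || k < 0 || k >= n) {
            throw new IllegalArgumentException("need 0 <= k < n");
        }
        return "SELECT id, xmi FROM documents WHERE MOD(id, " + n + ") = " + k;
    }

    /** Applies the same predicate in memory: each id goes to exactly one instance. */
    static List<Long> idsFor(List<Long> ids, int k, int n) {
        List<Long> mine = new ArrayList<>();
        for (long id : ids) {
            if (Math.floorMod(id, (long) n) == k) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(1L, 2L, 3L, 4L, 5L, 6L, 7L);
        int n = 3;
        for (int k = 0; k < n; k++) {
            System.out.println(selectFor(k, n) + " -> " + idsFor(ids, k, n));
        }
    }
}
```

Because the partitions are disjoint and need no coordination, each CPE instance can run its query independently. If ids are unevenly distributed, the partitions may differ in size.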

We also tried using UIMA AS, but the overhead seemed very large. With fully colocated aggregates, each working on one CAS from beginning to end, it might not be too bad. We would then have just one central CollectionReader that dispatches to the different aggregates. You don't seem to parallelize within the processing flow, so that's quite close to what your example does, isn't it?
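The "one central reader, many self-contained workers" layout can be sketched in plain Java with a bounded queue. This is not UIMA or UIMA AS code. `processDocument` is a hypothetical stand-in for running a whole colocated aggregate on one CAS, and strings stand in for CASes. A single producer plays the CollectionReader, and each worker takes one document and processes it from start to end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CentralReaderDispatch {

    // Sentinel telling a worker there is no more input (assumed never to be a real document).
    private static final String POISON = "\u0000EOF";

    // Hypothetical stand-in for a full aggregate processing one CAS end to end.
    static String processDocument(String text) {
        return text.toUpperCase();
    }

    static List<String> run(List<String> docs, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(16); // bounded: reader blocks if workers lag
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String doc = queue.take();
                        if (doc.equals(POISON)) {
                            return;
                        }
                        results.add(processDocument(doc));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            pool.add(t);
        }
        // The single "collection reader": feed every document, then one sentinel per worker.
        for (String doc : docs) {
            queue.put(doc);
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON);
        }
        for (Thread t : pool) {
            t.join();
        }
        return results; // completion order, not input order
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = run(List.of("a", "b", "c", "d"), 2);
        System.out.println(out.size() + " documents processed");
    }
}
```

Since no worker depends on another's output, the only shared state is the queue and the result list. That is why this layout avoids most of the synchronization cost.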


Bye,
Jens

On 05/17/2012 09:25 AM, Tommaso Teofili wrote:

Hi all,

recently I've been playing (and coding) with BSP [1] based algorithms using
Apache Hama [2] (which officially graduated to a top-level project yesterday),
and I found that in many cases they gave significant performance boosts
compared with "plain" MapReduce based algorithms. So I thought it would make
sense to write a UIMA collection processing algorithm using Hama.

I started sketching it in a sample project on GitHub [3], but I think it
would make sense to put it in our sandbox so that anyone can
look at, use, improve, or evaluate it.
The current implementation just reads files from a directory on the local
filesystem, processes them in parallel and collects the ProcessTraces
in an output file. My idea, though, is that it could eventually become a new
CPM implementation that reads from and writes to HDFS.
I know that's a lot packed into a few lines, so feel free to ask for
clarification.

Have a nice day,
Tommaso

[1] http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
[2] http://incubator.apache.org/hama
[3] https://github.com/tteofili/samplett/blob/master/uima-bsp/src/main/java/com/github/samplett/uima/bsp/AEProcessingBSPJob.java
