Application skeleton for multi-modal NLP analysis

There has been renewed interest in the typesystem and annotators developed
as part of the DARPA GALE project to demonstrate how to combine analytics
from multiple sources and modalities.  The GALE Interoperability
Demonstration (IOD) system uses UIMA-AS to interconnect 11 different types
of NLP analytics, distributed over 7 research facilities in 3 countries, to
transcribe, translate, and extract information from foreign-language news
broadcasts.

To aid the development of other multi-modal applications, I plan to publish
a skeleton of this application in the sandbox.  It will eventually include
the following:
 - the UIMA typesystem that was developed to allow each analytic to operate
on an appropriate view of the data with no dependencies on its origin,
 - simulated analytics for the NLP engines,
 - data reorganization annotators that convert the outputs of one analytic
into a form suitable for input to another,
 - descriptors and a flow controller that use the features of UIMA-AS to run
similar analytics in parallel and to scale out the slowest components.
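To give a rough idea of how the pieces above would fit together, here is a
conceptual sketch in Python -- this is NOT UIMA code or part of the planned
skeleton, and every name in it (transcribe, reorganize, translate,
segment_topics) is a hypothetical placeholder for a simulated analytic:

```python
# Conceptual sketch of the planned skeleton's data flow -- not UIMA code.
# All function names are hypothetical stand-ins for simulated analytics.

def transcribe(audio):
    """Simulated ASR: pretend to turn foreign-language audio into tokens."""
    return {"lang": "ar", "tokens": audio.split()}

def reorganize(asr_output):
    """Data-reorganization step: convert the ASR output into the form
    the translation analytic expects (here, a plain string)."""
    return " ".join(asr_output["tokens"])

def translate(text):
    """Simulated MT: pretend to translate into English."""
    return "EN:" + text

def segment_topics(text):
    """Simulated topic segmentation: split on a sentinel marker."""
    return text.split("|")

def pipeline(audio):
    """Chain the simulated analytics, with a reorganization step between
    the ASR output and the translation input."""
    asr = transcribe(audio)
    src = reorganize(asr)
    eng = translate(src)
    return segment_topics(eng)
```

In the real skeleton the reorganization step would be a UIMA annotator and
the chaining would be handled by the flow controller rather than a simple
function call, but the shape of the data flow is the same.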

The goal will be to provide a complete example of a system that converts
audio in one language to text in another, segmented into topics.  Although
no real NLP analytics will be included, the simulated ones will serve as
examples of how to use the typesystem to wrap an NLP analytic as a UIMA
annotator.
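The wrapping idea -- an annotator that reads from and writes to named views
of the data, so it never depends on where its input came from -- can be
sketched like this.  Again, this is a toy Python illustration, not the UIMA
API; the CAS and annotator classes and the view names are hypothetical:

```python
# Toy illustration of an annotator operating on named views of the data.
# This is NOT the UIMA API; all names here are hypothetical.

class CAS:
    """Toy stand-in for a UIMA CAS: a collection of named views."""
    def __init__(self):
        self.views = {}

    def get_view(self, name):
        # Create the view on first access, like creating a new CAS view.
        return self.views.setdefault(name, {})

class TranslationAnnotator:
    """Wraps a translation analytic.  It reads the source-language view
    and writes into the target-language view, so it has no dependency on
    whether the source text came from ASR, OCR, or keyboard input."""
    def process(self, cas):
        src = cas.get_view("SourceLanguage")
        tgt = cas.get_view("TargetLanguage")
        # The "analytic" here is a trivial placeholder.
        tgt["text"] = "EN:" + src.get("text", "")

cas = CAS()
cas.get_view("SourceLanguage")["text"] = "bonjour"
TranslationAnnotator().process(cas)
```

A real wrapper would subclass a UIMA base annotator and use typesystem
types instead of dictionaries, but the view-based decoupling is the point.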

Any comments and suggestions would be welcome ... I hope to get started next
week.

Burn
