Application skeleton for multi-modal NLP analysis

There has been renewed interest in the type system and annotators developed as part of the DARPA GALE project, which demonstrate how to combine analytics from multiple sources and modalities. The GALE Interoperability Demonstration (IOD) system uses UIMA-AS to interconnect 11 different types of NLP analytics, distributed over 7 research facilities in 3 countries, to transcribe, translate, and extract information from foreign-language news broadcasts.
To aid the development of other multi-modal applications I plan to publish a skeleton of this application in the sandbox. It will eventually include the following:
- the UIMA type system that was developed to allow each analytic to operate on an appropriate view of the data, with no dependencies on its origin,
- simulated analytics standing in for the NLP engines,
- data-reorganization annotators that convert the output of one analytic into a form suitable for input to another, and
- descriptors and a flow controller that use the features of UIMA-AS to run similar analytics in parallel and to scale out the slowest components.

The goal is to provide a complete example of a system that converts audio in one language to text in another, segmented into topics. Although no real NLP analytics will be included, users can take the simulated ones as examples of how to use the type system to wrap an NLP analytic as a UIMA annotator.

Any comments and suggestions would be welcome ... I hope to get started next week.

Burn
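To give a flavor of the "appropriate view" idea, here is a minimal sketch of a UIMA analysis engine descriptor for one of the simulated analytics. The class and sofa names (`SimulatedTranslator`, `SourceText`, `TranslatedText`) are illustrative assumptions, not names from the IOD; the point is only that each analytic declares the views (sofas) it reads and writes, so it stays independent of where the data came from:

```xml
<!-- Hypothetical descriptor sketch; names are illustrative, not from the IOD -->
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>org.example.gale.SimulatedTranslator</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>SimulatedTranslator</name>
    <description>Simulated MT analytic: reads the source-language view and
      writes a translation view, standing in for a real engine.</description>
    <capabilities>
      <capability>
        <!-- The annotator operates on named views rather than the raw input -->
        <inputSofas><sofaName>SourceText</sofaName></inputSofas>
        <outputSofas><sofaName>TranslatedText</sofaName></outputSofas>
      </capability>
    </capabilities>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

Because the input and output sofas are declared in the descriptor, a flow controller can route CAS views between analytics (or to parallel replicas via UIMA-AS) without the annotator code knowing anything about the surrounding pipeline.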
