On 11/24/2010 10:32, Jörn Kottmann wrote:
> On 11/23/10 11:44 PM, Burn Lewis wrote:
>> Application skeleton for multi-modal NLP analysis
>>
>> There has been renewed interest in the typesystem and annotators developed
>> as part of the DARPA GALE project to demonstrate how to combine analytics
>> from multiple sources and modalities. The GALE Interoperability
>> Demonstration system (IOD) uses UIMA-AS to interconnect 11 different types
>> of NLP analytics distributed over 7 research facilities in 3 countries to
>> transcribe, translate, and extract information from foreign-language news
>> broadcasts.
>>
>> To aid the development of other multi-modal applications, I plan to publish
>> a skeleton of this application in the sandbox. It will eventually include
>> the following:
>> - the UIMA typesystem that was developed to allow each analytic to operate
>>   on an appropriate view of the data with no dependencies on its origin,
>> - simulated analytics for the NLP engines,
>> - data reorganization annotators that convert the outputs of one analytic
>>   into a form suitable for input to another,
>> - descriptors and a flow controller that use the features of UIMA-AS to run
>>   similar analytics in parallel and to scale out the slowest components.
>>
>> The goal will be to provide a complete example of a system that converts
>> audio in one language to text in another, segmented into topics. Although
>> no real NLP analytics will be included, users can use the simulated ones as
>> examples of how to use the typesystem to wrap an NLP analytic as a UIMA
>> annotator.
>>
>> Any comments and suggestions would be welcome ... I hope to get started
>> next week.
>
> Sounds very interesting. Having such a sample will help people understand
> how UIMA can be used with non-text sofas and how to write AEs for them.
> I think the lack of such a sample is one of the reasons why there is no
> (open source) integration for speech recognition or OCR. BTW, both are
> available under a compatible license: CMU Sphinx and Tesseract/Ocropus.

Sounds very interesting, and it will be even better when it integrates those open-source libraries to create a working solution.

--Thilo

>
> Jörn
>
