Hi UIMA Developers, some colleagues and I have been working on an extension to UIMA that I'd like to bring into the Sandbox, if others agree it would be a good addition. We've been calling it DUCC, for Distributed UIMA Cluster Computing. Before going into details, here's a high-level description:
DUCC is a cluster management system that provides tooling, management, and scheduling facilities to automate the scale-out of applications written to the UIMA framework. Core UIMA provides a generalized framework for applications that process unstructured information such as human language, but it does not provide a scale-out mechanism. UIMA-AS provides a scale-out mechanism that distributes UIMA pipelines over a cluster of computing resources, but it does not provide job management or management of the cluster resources themselves. DUCC completes the set by providing job support, cluster management, and automation for scaling out UIMA applications over UIMA-AS on large computing clusters.

We have an initial implementation that has been used by one project. We'd like to move it into the UIMA project for further development and to make it available to others. Do you think this would be a worthwhile addition, and does it make sense to bring it (initially) into the Sandbox?
