On 10/4/11 9:41 PM, Eddie Epstein wrote:
Historically in UIMA this document ID info is saved in the SourceDocumentInformation annotation, in the uri feature. Many UIMA SDK samples rely on the ID here. When applications want additional metadata they then add features to the SourceDocumentInformation type definition for that purpose.
I usually define my own document ID feature structure, which holds my unique ID. I always found that a bit cumbersome and wondered whether an ID field per CAS might help, but it sounds like there are good reasons why it was never implemented.

In the OpenNLP UIMA integration I have AEs that train the components. One issue there is that it is hard to map a log message to the CAS that caused it. To solve this I will now add a type mapping so that users can configure their custom ID feature structure type and feature.

Jörn
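
For illustration, a rough sketch of the type mapping idea: the user configures a type name and a feature name, and the component uses them to look up the document ID and prefix its log messages. This is plain Java and deliberately does not use the UIMA API. `SimpleFs` is a hypothetical stand-in for a feature structure, and `DocIdResolver`, `my.DocumentId`, and `id` are made-up names, not part of OpenNLP or UIMA.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for a feature structure: a type name plus string features. */
final class SimpleFs {
    final String typeName;
    final Map<String, String> features;

    SimpleFs(String typeName, Map<String, String> features) {
        this.typeName = typeName;
        this.features = features;
    }
}

/** Resolves a document ID from a user-configured type name and feature name. */
final class DocIdResolver {
    private final String idTypeName;
    private final String idFeatureName;

    DocIdResolver(String idTypeName, String idFeatureName) {
        this.idTypeName = idTypeName;
        this.idFeatureName = idFeatureName;
    }

    /** Returns the ID from the first structure of the configured type that has the feature set. */
    Optional<String> resolve(List<SimpleFs> casContents) {
        for (SimpleFs fs : casContents) {
            if (fs.typeName.equals(idTypeName)) {
                String id = fs.features.get(idFeatureName);
                if (id != null) {
                    return Optional.of(id);
                }
            }
        }
        return Optional.empty();
    }

    /** Prefixes a log message with the document ID, or "unknown" when none is found. */
    String tag(List<SimpleFs> casContents, String message) {
        return "[doc=" + resolve(casContents).orElse("unknown") + "] " + message;
    }
}
```

In a real AE the two names would presumably come from configuration parameters and the lookup would go through the CAS type system, but the mapping itself stays this small.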
