On Tue, Oct 4, 2011 at 5:34 AM, Jörn Kottmann <[email protected]> wrote:
> In the end I believe a simple CAS ID field could be quite useful, for
> debugging/logging, as a document ID in simple UIMA pipelines and for
> applications which deal with whole CASes (e.g. the Cas Editor based
> annotation tooling, or an AE which extracts "problematic" CASes from
> an analysis pipeline for inspection).
>
> To implement this I suggest that we extend to CAS interface with
> CAS.setId(String) and CAS.getId() methods.

Historically in UIMA, this document ID information has been stored in the
uri feature of the SourceDocumentInformation annotation. Many UIMA SDK
samples rely on finding the ID there. When applications want additional
metadata, they add features to the SourceDocumentInformation type
definition for that purpose.

If one were to implement CAS.setId(String), the data should be stored in
the CAS as a type/feature so that all of the different CAS serialization
and transport mechanisms remain unchanged. An additional feature in SofaFS
would probably be best. Presumably this string should be immutable, as
other SofaFS features are?

It is still not clear to me that this feature adds value beyond what
application-specific type system data already provides.

Eddie
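
To make the storage point concrete, here is a minimal Java sketch of an ID kept as an ordinary, write-once feature, so that generic serialization sees it as just another feature. It is a toy model under stated assumptions and does not use the UIMA API: ToyFeatureStructure, setOnce, asMap and the feature name "casId" are all hypothetical names invented for illustration. Only the "uri" feature name comes from the discussion above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a feature structure: string-valued features by name.
// NOT the UIMA API; it only models the write-once idea.
final class ToyFeatureStructure {
    private final Map<String, String> features = new LinkedHashMap<>();
    private final Set<String> writeOnce = new HashSet<>();

    // Ordinary features can be overwritten, unless they were set as write-once.
    void setFeature(String name, String value) {
        if (writeOnce.contains(name)) {
            throw new IllegalStateException("feature is immutable: " + name);
        }
        features.put(name, value);
    }

    // Write-once features accept exactly one assignment.
    void setOnce(String name, String value) {
        if (features.containsKey(name)) {
            throw new IllegalStateException("feature already set: " + name);
        }
        features.put(name, value);
        writeOnce.add(name);
    }

    String getFeature(String name) {
        return features.get(name);
    }

    // A generic serializer would walk this view; the ID needs no special case.
    Map<String, String> asMap() {
        return Collections.unmodifiableMap(features);
    }
}

final class CasIdSketch {
    public static void main(String[] args) {
        ToyFeatureStructure fs = new ToyFeatureStructure();
        fs.setFeature("uri", "file:/data/doc-0001.txt"); // ordinary feature
        fs.setOnce("casId", "doc-0001");                 // hypothetical ID feature

        try {
            fs.setOnce("casId", "doc-0002");
        } catch (IllegalStateException e) {
            System.out.println("second assignment rejected: " + e.getMessage());
        }
        System.out.println(fs.asMap());
    }
}
```

Because the ID lives in the same feature map as everything else, nothing in the serialization path has to change to carry it. This is the property argued for above when suggesting a type/feature rather than a separate CAS field.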
