Slight generalization: for texts, the CAS == unit of work flowing in UIMA ==
(typically) a "document".

But UIMA is also used for other kinds of unstructured data, such as audio,
video, images, etc. In that case the CAS == unit of work flowing in UIMA != a
"document".

Because of this, we might want to consider more generic naming, like Jörn's
"CasId". So in the following, a name like CAS.setId() or CAS.setIdUri() might
be better (dropping "Document").

-Marshall

On 9/30/2011 10:59 AM, Richard Eckart de Castilho wrote:
> I always thought that a CAS.setDocumentUri() would have been helpful. In the
> beginning I mistook setSofaDataUri() for such a thing and was quite
> surprised that once I set it, I could not set the document text anymore.
>
> So how about adding a setDocumentUri() method to CAS?
>
> From the experience with our own type system, which supports such things, we
> find that it is also very useful to have a documentBaseUri for cases where
> recursive processing is taking place. I find a simple ID is not enough in
> many cases, e.g. when recursively reading files from one directory and
> writing them to another one while preserving the relative hierarchy.
>
> So a setDocumentBaseUri() would, in my opinion, also be desirable.
>
> Cheers,
>
> -- Richard
>
> On 30.09.2011 at 16:53, Jörn Kottmann wrote:
>
>> On 9/30/11 4:38 PM, Marshall Schor wrote:
>>> Can you say a bit more what this is?
>>>
>> Sure. The intent of the ID field is to reference a CAS instance to
>> another system.
>>
>> Let's say we have an application where a UIMA analysis pipeline is used
>> to process documents which are stored in a database. There you need to
>> write the IDs of the documents into the CAS, otherwise it is not possible
>> to write analysis results back to the database.
>>
>> So typically your collection reader or the first AE in the pipeline will
>> set the ID, and the last AE in the pipeline will use it again to save the
>> analysis results.
>>
>> Currently you always need to define a FS which holds your custom ID, but
>> I guess a generic string ID field would be just fine for almost any use
>> case.
>>
>> Jörn
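
To make the base-URI use case from Richard's mail concrete, here is a minimal sketch of how a writer at the end of a pipeline could preserve the relative hierarchy if both a document URI and a base URI were available. It uses only java.net.URI and does not touch the UIMA API; the class name BaseUriSketch, the method mapToOutput, and the example paths are hypothetical and only illustrate the idea.

```java
import java.net.URI;

// Hypothetical helper, not part of UIMA: shows why a base URI is needed
// in addition to a plain ID when mirroring a directory tree.
final class BaseUriSketch {

    // Maps a document below baseUri to the same relative location below outputBaseUri.
    static URI mapToOutput(URI baseUri, URI documentUri, URI outputBaseUri) {
        // relativize() returns documentUri unchanged if it is not below baseUri
        URI relative = baseUri.relativize(documentUri);
        if (relative.isAbsolute()) {
            throw new IllegalArgumentException(
                documentUri + " is not located below " + baseUri);
        }
        // outputBaseUri should end with "/" so the relative path is appended to it
        return outputBaseUri.resolve(relative);
    }

    public static void main(String[] args) {
        URI base = URI.create("file:/data/in/");
        URI doc = URI.create("file:/data/in/sub/dir/a.txt");
        URI out = URI.create("file:/data/out/");
        // prints file:/data/out/sub/dir/a.txt
        System.out.println(mapToOutput(base, doc, out));
    }
}
```

With only an opaque ID, the writer would have no way to recover "sub/dir/a.txt"; with the base URI the relative part falls out of a single relativize() call.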
