slight generalization...

For texts, the CAS == unit of work flowing in UIMA == (typically) a "document"

But UIMA is also used for other kinds of unstructured data, such as audio, video,
images, etc.  In those cases the CAS == unit of work flowing in UIMA != a
"document"...

Because of this, we might want to consider more generic naming, like Jörn's
"CasId".  So in the proposal quoted below, a name like CAS.setId() or
CAS.setIdUri() might be better (dropping "Document").
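
To make the naming concrete, here is a minimal sketch of a generic ID carried
on the unit of work, following the reader-sets-it / last-AE-uses-it pattern
from Jörn's description.  Everything in it is an assumption for illustration:
SketchCas is a stand-in class, not the real org.apache.uima.cas.CAS interface,
and setId()/getId() are the proposed names, not existing UIMA methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a CAS; NOT the real UIMA CAS interface.
class SketchCas {
    private String id;            // proposed generic ID (database key, URI, ...)
    private String documentText;  // the text case; could be audio/video data instead

    void setId(String id) { this.id = id; }                 // proposed name, not in UIMA
    String getId() { return id; }                            // proposed name, not in UIMA
    void setDocumentText(String text) { this.documentText = text; }
    String getDocumentText() { return documentText; }
}

class CasIdSketch {
    // Plays the role of the collection reader: fills the CAS and stamps the
    // external record ID on it.
    static SketchCas read(String recordId, String text) {
        SketchCas cas = new SketchCas();
        cas.setId(recordId);
        cas.setDocumentText(text);
        return cas;
    }

    // Plays the role of the last AE: uses the ID to write a result back to
    // the external store (here just a map keyed by record ID).
    static void writeBack(SketchCas cas, Map<String, Integer> resultsById) {
        int tokenCount = cas.getDocumentText().trim().split("\\s+").length;
        resultsById.put(cas.getId(), tokenCount);
    }

    public static void main(String[] args) {
        Map<String, Integer> results = new HashMap<>();
        SketchCas cas = read("doc-42", "UIMA processes unstructured data");
        writeBack(cas, results);
        System.out.println(results);
    }
}
```

Nothing here depends on the payload being a "document", which is the point of
the more generic name.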

-Marshall


On 9/30/2011 10:59 AM, Richard Eckart de Castilho wrote:
> I always thought that a CAS.setDocumentUri() would have been helpful. In the 
> beginning I mistook setSofaDataUri() for such a thing and was quite 
> surprised that once I set it, I could no longer set the document text. 
>
> So how about adding a setDocumentUri() method to CAS?
>
> From the experience with our own type system which supports such things, we 
> find that it is also very useful to have a documentBaseUri for cases where 
> recursive processing is taking place. I find a simple ID is not enough in 
> many cases, e.g. when recursively reading files from one directory and 
> writing them to another one while preserving the relative hierarchy.
>
> So a setDocumentBaseUri() in my opinion would also be desirable.
>
> Cheers,
>
> -- Richard
>
> On 30.09.2011 at 16:53, Jörn Kottmann wrote:
>
>> On 9/30/11 4:38 PM, Marshall Schor wrote:
>>> Can you say a bit more what this is?
>>>
>> Sure. The intent of the ID field is to reference a CAS instance to 
>> another system.
>>
>> Let's say we have an application where a UIMA analysis pipeline is used 
>> to process documents
>> which are stored in a database. There you need to write the IDs of the 
>> documents into the CAS;
>> otherwise it is not possible to write analysis results back to the database.
>>
>> So typically your collection reader or first AE in the pipeline will set 
>> the ID and the last AE in the
>> pipeline will use it again to save the analysis results.
>>
>> Currently you always need to define an FS which holds your custom ID, but 
>> I guess a generic
>> string ID field would be just fine for almost any use case.
>>
>> Jörn
