Re: CAS Id

Jörn Kottmann Tue, 04 Oct 2011 02:34:42 -0700

On 10/3/11 3:37 PM, Eddie Epstein wrote:

As Marshall pointed out above, a CAS can have many CAS Views, each
with its own artifact. An analysis pipeline knows where these
artifacts come from and can set metadata appropriately, but a unique
ID for a stored copy of the CAS might best be determined by the
persistent CAS storage system where the CAS is to be stored.


To summarize what has been said:

A unique ID per CAS seems to be useful for logging (and debugging) in
user code, because the IDs logged by the framework can be related to
the IDs logged by user code.

A CAS ID might not work in complex type systems which use multiple
views, because each sofa in a multi-view CAS might have a different
source ID.

Besides that, there are UIMA pipelines which always store a complete
CAS object in some kind of storage. There the CAS ID can just be the
unique storage ID. This could, for example, be a file name in a
filesystem or an HBase row key. As pointed out, this might not work for
complex cases, but it could be helpful for simpler UIMA pipelines.

Our Solrcas AE could also just use the CAS ID by default if the user
does not specify a Document ID Feature Structure. In my applications
this would actually work quite well.
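
A minimal sketch of that fallback, assuming the document ID feature
value and the CAS ID have already been read into plain strings. The
class and method names are hypothetical and are not taken from the
Solrcas code:

```java
// Hypothetical helper, not part of Solrcas or UIMA.
final class DocumentIds {
    private DocumentIds() {}

    // Prefer the user-specified document ID; fall back to the CAS ID
    // when no document ID feature value is available.
    static String resolve(String documentIdFeatureValue, String casId) {
        if (documentIdFeatureValue != null && !documentIdFeatureValue.isEmpty()) {
            return documentIdFeatureValue;
        }
        return casId;
    }
}
```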

More complex applications could also decide to use the MIME type or
features in a view as additional information to complement the CAS ID
in a newly created view, in order to compute a storage ID. For example,
a UIMA pipeline could translate the input document text to English and
then store the new text in a new English view. The code can then
compute an ID which is based on the unique CAS ID.
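
One way such a derived ID could be computed, as a sketch only. The
helper name, the separator characters and the ID layout are my
assumptions for illustration, nothing here exists in UIMA:

```java
// Hypothetical helper for deriving a per-view storage ID from a CAS ID.
final class ViewStorageIds {
    private ViewStorageIds() {}

    // Combines the unique CAS ID with the view name and MIME type, so
    // that every view of the same CAS gets a distinct storage ID.
    static String derive(String casId, String viewName, String mimeType) {
        if (casId == null || casId.isEmpty()) {
            throw new IllegalArgumentException("CAS ID must be set");
        }
        return casId + "#" + viewName + ";" + mimeType;
    }
}
```

With a CAS ID of "doc-17", the English view from the translation
example would be stored under "doc-17#english;text/plain".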

In the end I believe a simple CAS ID field could be quite useful for
debugging/logging, as a document ID in simple UIMA pipelines, and for
applications which deal with whole CASes (e.g. the Cas Editor based
annotation tooling, or an AE which extracts "problematic" CASes from an
analysis pipeline for inspection).

To implement this I suggest that we extend the CAS interface with
CAS.setId(String) and CAS.getId() methods.
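
Roughly what I have in mind, shown on stand-in types so that it
compiles on its own. IdentifiableCas and SimpleCas are hypothetical
and only illustrate the two proposed methods, they are not the real
org.apache.uima.cas.CAS interface or its implementation:

```java
// Stand-in for the proposed addition to the CAS interface (hypothetical).
interface IdentifiableCas {
    void setId(String id);
    String getId();
}

// Trivial implementation used only to illustrate the contract.
final class SimpleCas implements IdentifiableCas {
    private String id; // null until a reader or storage layer assigns one

    @Override
    public void setId(String id) {
        this.id = id;
    }

    @Override
    public String getId() {
        return id;
    }
}

final class CasIdDemo {
    public static void main(String[] args) {
        SimpleCas cas = new SimpleCas();
        // A collection reader could set the storage key as the CAS ID ...
        cas.setId("hbase-row-0001");
        // ... and user code could then log the same ID as the framework.
        System.out.println("processing CAS " + cas.getId());
    }
}
```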

Jörn
