On 10/3/11 3:37 PM, Eddie Epstein wrote:
As Marshall pointed out above, a CAS can have many CAS Views, each
with its own artifact. An analysis pipeline knows where these
artifacts come from and can set metadata appropriately, but a unique
ID for a stored copy of the CAS might best be determined by the
persistent CAS storage system where the CAS is to be stored.

To summarize what has been said:
A unique ID per CAS seems to be useful for logging (and debugging) in
user code, because the IDs logged by the framework can be related to IDs logged
by user code.
A CAS ID might not work in complex type systems which use multiple views, because
each sofa in a multi-view CAS might have a different source ID.

Besides that, there are UIMA pipelines which always store a complete CAS object in some kind of storage. There the CAS ID can just be the unique storage ID. This could, for example, be a file name in a file system or an HBase row key. As pointed out, this might not work for complex cases, but it could be helpful for simpler UIMA pipelines.

Our Solrcas AE could also just use the CAS ID by default, if the user does not specify a Document ID
Feature Structure. In my applications this would actually work quite well.

More complex applications could also use the MIME type or features in a view as additional information to complement the CAS ID in a newly created view when computing a storage ID. For example, a UIMA pipeline might translate the input document text to English and then store the new text in a new English view. The code can then compute an ID for that view which is based on the unique CAS ID.
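
As a rough sketch of such a derived ID, assuming the CAS ID is already available as a plain string and that the view name alone is enough to tell the derived text apart. The class name, method name and the "/" separator are made up for illustration and are not part of UIMA:

```java
// Hypothetical helper, not part of UIMA: derives a per-view storage ID
// from a CAS-level ID and the name of the view holding the derived text.
class ViewStorageId {

    static String forView(String casId, String viewName) {
        if (casId == null || casId.isEmpty()) {
            throw new IllegalArgumentException("CAS ID must be set");
        }
        if (viewName == null || viewName.isEmpty()) {
            throw new IllegalArgumentException("view name must be set");
        }
        // Assumption: "/" never occurs in a CAS ID, so the result stays unambiguous.
        return casId + "/" + viewName;
    }

    public static void main(String[] args) {
        // e.g. the translated text stored in a view named "english"
        System.out.println(forView("doc-0001", "english"));
    }
}
```

A real pipeline might fold in the MIME type or a feature value instead of the view name; the point is only that the derived ID stays traceable to the CAS ID.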

In the end I believe a simple CAS ID field could be quite useful, for debugging/logging, as a document ID in simple UIMA pipelines and for applications which deal with whole CASes (e.g. the Cas Editor based annotation tooling, or an AE which extracts "problematic" CASes
from an analysis pipeline for inspection).

To implement this, I suggest that we extend the CAS interface with
CAS.setId(String) and CAS.getId() methods.
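
A minimal sketch of how the two methods might look and be used. Since setId/getId are only a proposal here, the sketch uses a stand-in interface (IdentifiableCas) rather than the real org.apache.uima.cas.CAS; the implementing class and the documentId helper are likewise hypothetical:

```java
// Hypothetical stand-in for the two proposed methods; the real CAS
// interface has many more methods and does not define these.
interface IdentifiableCas {
    void setId(String id);
    String getId();
}

// Trivial in-memory implementation, only to make the sketch runnable.
class SimpleIdentifiableCas implements IdentifiableCas {
    private String id;

    public void setId(String id) {
        this.id = id;
    }

    public String getId() {
        return id;
    }
}

class CasIdDemo {
    // Fallback as described for a consumer such as Solrcas: prefer an
    // explicitly configured document ID, otherwise use the CAS ID.
    static String documentId(IdentifiableCas cas, String explicitDocumentId) {
        return explicitDocumentId != null ? explicitDocumentId : cas.getId();
    }

    public static void main(String[] args) {
        IdentifiableCas cas = new SimpleIdentifiableCas();
        // e.g. the storage system assigns its row key as the CAS ID
        cas.setId("row-42");
        System.out.println(documentId(cas, null));
    }
}
```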

Jörn
