Re: Requirements / Wish List for CAS Store?

Neal R Lewis Fri, 18 Jan 2013 14:18:28 -0800

> Thanks, that answers it, I think. I'll try to sum up the aspects related to 
> the specification of a CASStore to make sure I understood it correctly:


> - The AnalysisId is basically just another FS that can be queried for. 
    Almost. An FSID is a feature within FSes, and an AnalyticID is part of an 
FSID.

> - It is supplied by the application using the CASStore and it is opaque to 
> the CASStore. 
   That's correct for the Analytic ID, but the FSID is very useful in the 
CASStore.

> - The AnalysisId may contain IDs that the application obtains from the 
> CASStore, such as the CAS ID and possibly the Collection ID, although this 
> is not necessary.
   This is correct for the FSID.

> - The CASStore would still work fine, even if a CAS did not contain any 
> AnalysisIds.
    
    The CASStore would need to know an FSID for querying.  Whether the 
AnalysisID is inside the FSID is irrelevant to the store, but very relevant to 
the user of the store. 

> So it is rather a technique an application would use than a feature of the 
> CASStore to support this AnalysisId. 
> Is this a correct interpretation?  

   I think that FSIDs should be a characteristic of a CASStore, in that it 
maintains stable IDs for feature structures and CASes. One alternative would 
be the xmi:id from a serialized CAS, but that must always be an integer, while 
an FSID allows for coarse- or fine-grained identification.
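
A minimal Python sketch of the FSID layout used in the examples of this thread (e.g. ".3.15.6000.1"), showing how one string can name a whole CAS (coarse) or a single FS (fine). Reading the segments as collection ID, CAS/artifact ID, AnalyticID and a sequence number is an assumption inferred from those examples; the `FSID` class and its methods are hypothetical, not part of UIMA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FSID:
    """Hypothetical FSID of the form .<collection>.<cas>[.<analytic>[.<seq>]]"""
    collection_id: int
    cas_id: int
    analytic_id: Optional[int] = None
    seq: Optional[int] = None

    def __post_init__(self) -> None:
        # A sequence number only makes sense once an AnalyticID is present.
        if self.seq is not None and self.analytic_id is None:
            raise ValueError("seq requires analytic_id")

    @classmethod
    def parse(cls, text: str) -> "FSID":
        if not text.startswith("."):
            raise ValueError(f"FSID must start with '.': {text!r}")
        # int() raises ValueError on empty or non-numeric segments.
        parts = [int(p) for p in text[1:].split(".")]
        if not 2 <= len(parts) <= 4:
            raise ValueError(f"expected 2 to 4 segments: {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        segs = [self.collection_id, self.cas_id, self.analytic_id, self.seq]
        return "".join(f".{s}" for s in segs if s is not None)

    def prefix(self) -> str:
        """Coarse key naming the whole CAS this FS belongs to."""
        return f".{self.collection_id}.{self.cas_id}."
```

For example, `FSID.parse(".3.15.6000.1").prefix()` gives `".3.15."`, the CAS-level key a coarse query would use, while `str()` round-trips the full fine-grained ID.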



-----Richard Eckart de Castilho <[email protected]> wrote: -----
To: "<[email protected]>" <[email protected]>
From: Richard Eckart de Castilho <[email protected]>
Date: 01/16/2013 05:38AM
Subject: Re: Requirements / Wish List for CAS Store?

On 14.01.2013 at 16:14, Neal R Lewis <[email protected]> wrote:

>> The way you put it, it appears that XMI provides for fine grained queries 
>> while the binary CAS does not. However, there is no support for fine grained 
>> access in either format (deliberately ignoring that XMI is an XML format 
>> and could be stored in an XML database providing for fine-grained access).
> 
> 
>> Are there additional requirements hidden in the FSID format? I could imagine:
> 
>> - ability to get all FSes produced by a certain annotator across all CASes 
>> in all collections or in a certain collection
> 
>> - ability to get all CASes in a collection
> 
>> - ability to get all CASes
> 
>> Where do you get the annotatorId from? I see no sensible way that the UIMA 
>> framework can provide such an ID. There is also potential for conflict. 
>> Consider if analysis engine A creates an FS and analysis engine B updates a 
>> primitive feature in that FS. Assuming that primitives do not get an FSID 
>> since they are not FSes, should the annotatorID of the FS be updated to B or 
>> should it remain A?
> 
> We currently assign our own annotator IDs. Annotator IDs are meant to 
> distinguish between different UIMA applications, not individual AEs in an 
> aggregate or within the same pipeline. I meant to use the term AnalyticID, 
> which is more precise, so I'll use that from now on. In the strictest sense, 
> you can imagine different AnalyticIDs for different PEARs written by 
> separate developers for separate annotators, but run on the same CAS. 
> Now, a conflict might occur if they both had the same type system and 
> annotated the same CAS. This would result in a duplicate annotation for that 
> type, but not a duplicate FS, because each FS would have a different FSID 
> associated with it.
> 
> So, if only one UIMA application is performing the analysis, the 
> AnalyticID can remain stable throughout operations, perhaps as a default 
> value. 
> 
> The benefit of a CAS store goes beyond archiving information: it lets us 
> track a CAS's trajectory through analytics. It also allows analytics to be 
> developed independently. When a new analytic is developed (and here I mean a 
> new PEAR that will most likely run in a new JVM), we can run it on an old 
> collection of CASes. If an analytic is small and needs only one or two 
> objects from the CAS, we can reduce message size by retrieving only 
> those objects that are necessary. FSIDs and XMI serialization allow us to 
> do this.
> 
> With FSIDs, we can query for a particular element, a group of elements, or 
> even multiple CASes. Some queries are more complex than others (like getting 
> all annotations from a particular annotator across CASes), but they are still 
> manageable. I'll try to illustrate with an example from a hypothetical 
> deserialized CAS:
> 
> The following CAS was queried for an FSID with Collection ID 3, artifact 
> ID 15, and AnalyticID 6000. Analytic 6000 looked for sentences 
> like "lvef of 30%" and annotated the sentence and the LVEF value 
> (ignore the cop element for now; it is an element we use to track 
> provenance):
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" 
> xmlns:cas="http:///uima/cas.ecore" 
> xmlns:cdts="http:///org/test/health/cdts.ecore" 
> xmlns:tcas="http:///uima/tcas.ecore" xmi:version="2.0">
> <cdts:LVEF begin="449" value="20-25%" cop=".3.15.1000.1" end="490" 
> fsid=".3.15.6000.1" sofa="1" xmi:id="13"/>
> </xmi:XMI>
> 
> This CAS fragment is given what we call a "transient view" (a projection) 
> during a preprocessing step before running through a PEAR in UIMAj (to which 
> we externally assign AnalyticID 6003). The PEAR looks for the value of the 
> LVEF, maps it to a term, and filters for only the new objects in the CAS; 
> the result is then put back into the store, which writes a new FSID:
> 
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" 
> xmlns:cas="http:///uima/cas.ecore" 
> xmlns:cdts="http:///org/test/health/cdts.ecore" 
> xmlns:tcas="http:///uima/tcas.ecore" xmi:version="2.0">
> <cdts:SeverelyDepressedLVEF begin="449" time="2001-12-31T12:00:00" 
> value="20-25%" cop=".3.15.6000.1" end="490" fsid=".3.15.6003.1" sofa="1" 
> xmi:id="13"/>
> </xmi:XMI>
> 
> This is what I mean by fine-grained queries using FSIDs. We can also imagine 
> a coarse query for Collection 3, CAS 15 (fsid like '.3.15.%'), which will 
> produce a full CAS.
> 
> Does this answer some of your question?
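
The fine/coarse distinction and the write-back step described in the message above can be sketched with a small in-memory store. This is a hypothetical illustration, not the UIMA API and not the implementation discussed in this thread: the class `CasStore`, its methods, and the rule that the store numbers FSes per (collection, CAS, analytic) are assumptions, and the SQL `LIKE '.3.15.%'` query is emulated with a string prefix match. `by_analytic` sketches the "all FSes produced by a certain annotator across CASes" requirement raised earlier in the thread.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class CasStore:
    """Hypothetical in-memory FS store keyed by FSID strings like '.3.15.6000.1'."""

    def __init__(self) -> None:
        self._fs: Dict[str, dict] = {}
        # Next sequence number per (collection, CAS, analytic); starts at 1.
        self._next_seq: Dict[Tuple[int, int, int], int] = defaultdict(lambda: 1)

    def put(self, collection: int, cas: int, analytic: int, attrs: dict) -> str:
        """Store a new FS; the store, not the caller, assigns the FSID."""
        key = (collection, cas, analytic)
        fsid = f".{collection}.{cas}.{analytic}.{self._next_seq[key]}"
        self._next_seq[key] += 1
        self._fs[fsid] = dict(attrs, fsid=fsid)
        return fsid

    def get(self, fsid: str) -> dict:
        """Fine-grained query: exactly one FS (KeyError if unknown)."""
        return self._fs[fsid]

    def query_prefix(self, prefix: str) -> List[dict]:
        """Coarse query, the equivalent of SQL "fsid LIKE '<prefix>%'".

        Results are ordered by FSID compared as strings, not numerically.
        """
        return [fs for k, fs in sorted(self._fs.items()) if k.startswith(prefix)]

    def by_analytic(self, analytic: int, collection: Optional[int] = None) -> List[dict]:
        """All FSes from one analytic across CASes, optionally within one collection."""
        result = []
        for k, fs in sorted(self._fs.items()):
            _, coll, _cas, an, _seq = k.split(".")
            if int(an) == analytic and (collection is None or int(coll) == collection):
                result.append(fs)
        return result

    def write_back(self, source_fsid: str, analytic: int, attrs: dict) -> str:
        """Store an FS derived from source_fsid, recording the source in 'cop'."""
        _, coll, cas, _an, _seq = source_fsid.split(".")
        return self.put(int(coll), int(cas), analytic, dict(attrs, cop=source_fsid))
```

Mirroring the LVEF example: `put(3, 15, 6000, ...)` returns `.3.15.6000.1`; `write_back` of that FSID under analytic 6003 returns `.3.15.6003.1` with `cop` set to `.3.15.6000.1`, and `query_prefix(".3.15.")` returns both. Keeping the trailing dot in the prefix stops `.3.15.` from also matching CAS 150.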

Thanks, that answers it, I think. I'll try to sum up the aspects related to the 
specification of a CASStore to make sure I understood it correctly:

- The AnalysisId is basically just another FS that can be queried for. 
- It is supplied by the application using the CASStore and it is opaque to the 
CASStore. 
- The AnalysisId may contain IDs that the application obtains from the 
CASStore, such as the CAS ID and possibly the Collection ID, although this is 
not necessary.
- The CASStore would still work fine, even if a CAS did not contain any 
AnalysisIds.

So it is rather a technique an application would use than a feature of the 
CASStore to support this AnalysisId. 
Is this a correct interpretation?

Best,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------
