Re: Requirements / Wish List for CAS Store?

Richard Eckart de Castilho Wed, 16 Jan 2013 05:38:37 -0800

On 14.01.2013 at 16:14, Neal R Lewis <[email protected]> wrote:

>> The way you put it, it appears that XMI provides for fine grained queries 
>> while the binary CAS does not. However, there is no support for fine grained 
>> access in either format (deliberately ignoring that XMI is an XML format 
>> and could be stored in an XML database providing for fine-grained access).
> 
> 
>> Are there additional requirements hidden in the FSID format? I could imagine:
> 
>> - ability to get all FSes produced by a certain annotator across all CASes 
>> in all collections or in a certain collection
> 
>> - ability to get all CASes in a collection
> 
>> - ability to get all CASes
> 
>> Where do you get the annotatorId from? I see no sensible way that the UIMA 
>> framework can provide such an ID. There is also a conflict potential. 
>> Consider if analysis engine A creates an FS and analysis engine B updates a 
>> primitive feature in that FS. Assuming that primitives do not get an FSID 
>> since they are not FSes, should the annotatorID of the FS be updated to B or 
>> should it remain A?
> 
> We currently assign our own annotator IDs. Annotator IDs are meant to
> distinguish between different UIMA applications, not individual AEs in an
> aggregate or within the same pipeline. I meant to use the term AnalyticID,
> which is more precise, so I'll use that from now on. In the strictest sense,
> you can imagine different AnalyticIDs for different PEARs written by
> separate developers for separate annotators, but run on the same CAS.
> Now, a conflict might occur if they both had the same type system and
> annotated the same CAS. This would result in a duplicate annotation for that
> type, but not a duplicate FS, because each FS would have a different FSID
> associated with it.
> 
> So, if only one UIMA application is running, then the AnalyticID can
> remain stable throughout operations, perhaps with a default value.
> 
> The benefit of having a CAS store is not only archiving information but
> also tracking a CAS's trajectory through analytics. It allows analytics to
> be developed independently of one another. When a new analytic is developed
> (and here, I mean a new PEAR that will most likely run in a new JVM), we can
> run it on an old collection of CASes. If an analytic is small and only needs
> one or two objects from the CAS, then we can reduce message size by
> retrieving only those objects which are necessary. FSIDs and XMI
> serialization allow us to do this.
> 
> With FSIDs, it is possible to query for a particular element or for groups
> of elements, even across multiple CASes. Some queries are more complex
> than others (like getting all annotations from a particular annotator across
> CASes) but still manageable. I'll try to illustrate with an example from a
> hypothetical deserialized CAS:
> 
> The following CAS was queried for an fsid with a Collection ID of 3, an
> artifact ID of 15, and an AnalyticID of 6000. The 6000 analytic looked for
> sentences like "lvef of 30%" and annotated the sentence and the value of the
> lvef (ignore for now the cop attribute, which is an element we use to track
> provenance):
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" 
> xmlns:cas="http:///uima/cas.ecore" 
> xmlns:cdts="http:///org/test/health/cdts.ecore" 
> xmlns:tcas="http:///uima/tcas.ecore"; xmi:version="2.0">
> <cdts:LVEF begin="449" value="20-25%" cop=".3.15.1000.1" end="490" 
> fsid=".3.15.6000.1" sofa="1" xmi:id="13"/>
> </xmi:XMI>
> 
> This CAS fragment is given what we call a "transient view" or projection
> during a preprocessing step before running through a PEAR in UIMAj (to which
> we externally assign an AnalyticID of 6003). That PEAR looks for the value
> of the LVEF and maps it to a term. The result is then filtered for only the
> new objects in the CAS and put back into the store, where the store writes
> in a new fsid:
> 
> <?xml version="1.0" encoding="UTF-8" standalone="no"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI" 
> xmlns:cas="http:///uima/cas.ecore" 
> xmlns:cdts="http:///org/test/health/cdts.ecore" 
> xmlns:tcas="http:///uima/tcas.ecore"; xmi:version="2.0">
> <cdts:SeverelyDepressedLVEF begin="449" time="2001-12-31T12:00:00" 
> value="20-25%" cop=".3.15.6000.1" end="490" fsid=".3.15.6003.1" sofa="1" 
> xmi:id="13"/>
> </xmi:XMI>
> 
> This is what I mean by fine-grained queries using FSID. We can also imagine
> a coarse query for Collection 3, CAS 15 (fsid like '.3.15.%'), which will
> produce a full CAS.
> 
> Does this answer some of your questions?
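
To make the FSID layout in the quoted example concrete, here is a minimal sketch. It assumes the four dot-separated fields are Collection ID, artifact (CAS) ID, AnalyticID, and a per-analytic sequence number; the meaning of the last field is an assumption read off the examples and is not stated in the thread. The names Fsid and select are hypothetical and are not part of UIMA or of any existing CAS store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fsid:
    """Hypothetical parsed form of an fsid such as '.3.15.6000.1'."""
    collection: int
    artifact: int
    analytic: int
    sequence: int  # assumed meaning: running number within one analytic

    @classmethod
    def parse(cls, text):
        # The examples start with a dot, followed by four integer fields.
        if not text.startswith("."):
            raise ValueError(f"fsid must start with '.': {text!r}")
        parts = text[1:].split(".")
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {text!r}")
        return cls(*(int(p) for p in parts))

    def __str__(self):
        return f".{self.collection}.{self.artifact}.{self.analytic}.{self.sequence}"


def select(fsids, collection=None, artifact=None, analytic=None):
    """Keep the FSIDs that match every criterion that is not None."""
    wanted = {"collection": collection, "artifact": artifact, "analytic": analytic}
    return [
        f for f in fsids
        if all(v is None or getattr(f, k) == v for k, v in wanted.items())
    ]


# The two fsids from the XMI fragments, plus one made-up fsid from another CAS.
stored = [Fsid.parse(s) for s in (".3.15.6000.1", ".3.15.6003.1", ".3.16.6000.1")]

# Coarse query, equivalent to fsid like '.3.15.%': everything in CAS 15.
whole_cas = select(stored, collection=3, artifact=15)

# Fine-grained query: everything analytic 6000 produced, across CASes.
by_analytic = select(stored, analytic=6000)
```

Matching on parsed fields rather than on string prefixes avoids '.3.15.%' accidentally being confused with a pattern such as '.3.1%', which would also match CAS 15.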


Thanks, that answers it, I think. I'll try to sum up the aspects related to the 
specification of a CASStore to make sure I understood it correctly:

- The AnalysisId is basically just another FS that can be queried for.
- It is supplied by the application using the CASStore and it is opaque to
  the CASStore.
- The AnalysisId may contain IDs that the application obtains from the
  CASStore, such as the CAS ID and possibly the Collection ID, although this
  is not necessary.
- The CASStore would still work fine, even if a CAS did not contain any
  AnalysisIds.

So supporting this AnalysisId is a technique an application would use rather
than a feature of the CASStore.
Is this a correct interpretation?
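
The interpretation above can be sketched as follows. This is only an illustration under stated assumptions: InMemoryCasStore, add_cas, and query are hypothetical names, feature structures are modelled as plain dictionaries, and nothing here reflects an actual CASStore API. The point is that the store compares the application-supplied ID like any other feature value and never parses it.

```python
class InMemoryCasStore:
    """Hypothetical store that keeps each CAS as a list of feature dicts."""

    def __init__(self):
        self._cases = {}       # CAS ID -> list of feature-structure dicts
        self._next_cas_id = 1  # the store hands out its own CAS IDs

    def add_cas(self, feature_structures):
        cas_id = self._next_cas_id
        self._next_cas_id += 1
        # Copy the dicts so later changes by the caller do not leak in.
        self._cases[cas_id] = [dict(fs) for fs in feature_structures]
        return cas_id

    def query(self, feature, value):
        """Return (cas_id, fs) pairs whose feature equals value.

        The value is compared as-is; the store does not interpret it.
        """
        return [
            (cas_id, fs)
            for cas_id, fses in self._cases.items()
            for fs in fses
            if fs.get(feature) == value
        ]


store = InMemoryCasStore()
# One CAS carries an application-supplied ID, the other does not.
tagged = store.add_cas([{"type": "LVEF", "value": "20-25%", "fsid": ".3.15.6000.1"}])
untagged = store.add_cas([{"type": "LVEF", "value": "30%"}])

by_id = store.query("fsid", ".3.15.6000.1")  # only the tagged CAS matches
by_type = store.query("type", "LVEF")        # both CASes match
```

Both CASes are stored and queryable; the one without an ID is simply never returned by a query on that feature.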

Best,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------
