On 09.01.2013 at 21:55, Neal R Lewis <[email protected]> wrote:

> Uses:
>   Maintain independence of annotations with respect to independent pipelines
>   Archival Storage
>   Temporary storage between independent pipelines
>   Modify Type System
>   Contain Meta Data about CAS
>   Perform fine or coarse grained CRUD operations on a CAS
>
> Functionality:
>   Insert / Delete complete CAS(es)
>   Insert / Delete fragments of CAS(es) (individual SOFAs or FSes
>   [annotations])
>   Assemble CAS with SOFA and all / some / none feature structures
>   - This would help reduce the size of CAS to its necessary components
>     before passing it to independent pipelines. It would also require the
>     construction of valid CASes for use in Analytic Engines, complete with
>     valid Views
>
>   Update CASes within the store (i.e., inserting annotations):
>   - This would allow for adding deltas from AEs
>
> Some of these might seem redundant, but I hope they give a general overview.
> Does this seem to summarize it well?

One of the requirements on my list is a rather technical one, and it is missing from your summary: the CAS storage should be embeddable in an application. Having to deploy an extra server causes additional effort/pain when rolling out an application, in particular web applications that ship as WARs. Being embeddable would also make it easier to ship the storage as an Eclipse plugin at some point in the future. For me, talking to the storage via direct native method invocation is also preferable to talking to it via any kind of remote interface or via any kind of query language that ends up being encoded as Strings in Java.

In addition, it would be good to be able to run multiple storages at the same time, with their data kept in different directories on the file system. Consider running a parameter sweep experiment in which every run gets its own storage, saved in an output folder for that run along with the other data generated by that run. To archive the run, I can just back up that folder.

Both of these aspects are influenced by the way that Apache Lucene and HSQLDB can be used.

Is it a requirement for you that the storage must be able to run as a server that you talk to via network?

Cheers,

-- Richard

-- 
-------------------------------------------------------------------
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected]
www.ukp.tu-darmstadt.de
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------
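
To make the embedding requirement a bit more concrete, here is a rough sketch of what "one storage per directory, no server" could look like from Java. Everything in it is an assumption for illustration only: the class name `EmbeddedCasStore`, its `open`/`put`/`get`/`contains` methods, and the one-file-per-CAS layout are hypothetical and not part of UIMA or any existing storage. The CAS is treated as an opaque, already serialized byte array.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical embeddable CAS store (illustration only).
 * Each instance owns exactly one directory and is used via plain
 * method calls, so several stores can coexist in one JVM.
 */
final class EmbeddedCasStore {
    private final Path dir;

    private EmbeddedCasStore(Path dir) {
        this.dir = dir;
    }

    /** Opens (and creates if necessary) a store rooted at the given directory. */
    static EmbeddedCasStore open(Path dir) throws IOException {
        Files.createDirectories(dir);
        return new EmbeddedCasStore(dir);
    }

    /** Stores an already serialized CAS under the given id (assumed layout: one file per CAS). */
    void put(String casId, byte[] serializedCas) throws IOException {
        Files.write(fileFor(casId), serializedCas);
    }

    /** Returns the serialized CAS previously stored under the given id. */
    byte[] get(String casId) throws IOException {
        return Files.readAllBytes(fileFor(casId));
    }

    /** Reports whether a CAS with the given id exists in this store. */
    boolean contains(String casId) {
        return Files.exists(fileFor(casId));
    }

    /** Maps an id to a file inside this store's directory, rejecting path separators. */
    private Path fileFor(String casId) {
        if (casId.isEmpty() || casId.contains("/") || casId.contains("\\")) {
            throw new IllegalArgumentException("Invalid CAS id: " + casId);
        }
        return dir.resolve(casId + ".cas");
    }
}
```

In a parameter sweep, each run would call something like `EmbeddedCasStore.open(outputFolder.resolve("cas-store"))` with its own output folder, and archiving a run would amount to copying that folder.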
