Hello from States-side Heading into Late Summer,

Having spent the bulk of my career as a hard-core Smalltalk designer/developer, I have to say that I have SO ENJOYED the recent modeling-issue conversations related to Actor Appellation, etc. As some of you may know, "programming" Smalltalk is 90% modeling and 10% stating that understanding as a construct of software objects composed from the collection of objects in the Smalltalk image. So, all this recent conversation felt so much like being with my Smalltalk buddies 20+ years ago.
As to the present time, my #CitizenScience applied research involves a collaboration with the good folks at the PRImA Research Lab at the U. of Salford, Manchester UK (http://www.primaresearch.org/). In particular, I am working with PRImA to extend their PAGE GTS (Ground-Truth Storage) XML schema/format with an XML-based MAGAZINE format. This extended GTS format will support a shift from the bottom-up/within-page strategy for OCR page segmentation, layout- and text-recognition to a top-down/whole-issue strategy based on a #cidocCRM/#FRBRoo/#PRESSoo ontology stack. I intend to use this stack in combination with a metamodel-subgraph design pattern using the #cidocCRM as an "executable metamodel" for the soft-configuration of microservice digitization workflows. These ideas are presented in my forthcoming #DATeCH paper, which is available on ResearchGate.net in preprint form: https://goo.gl/V2eR0H.

INFO/EXPERIENCE/INTEREST I AM LOOKING FOR:

As we move the layout- and text-recognition strategy to a top-down/whole-issue approach, we're seeking to make explicit and tractable the "unspoken gap" between initial bulk digitization -- like what the PRImA folks work on -- and the "next step" of #TDM (text- and data-mining), where the starting point is often curated text corpora such as those that the good folks at NaCTeM -- the National Centre for Text-Mining at the U. of Manchester (http://www.nactem.ac.uk/) -- work with in the medical and biological domains.

To this end, we're looking to make sure that the applied work that we do between FactMiners and PRImA is "upstream useful" to the #TDM folks at NaCTeM and similar research centers. And the best common ground for this intersection of our mutual interests is #UIMA, the Unstructured Information Management Architecture standard.
#UIMA was created by IBM as part of their #CognitiveComputing (Watson) research, and is now carried on as an Apache Software Foundation open source project here: http://uima.apache.org/.

TWO AREAS OF KINDRED SPIRIT INTEREST:

* Does anyone have experience, citations, and/or interest in exploring the compatibility/expressibility of the #cidocCRM types model in the #UIMA CAS type system? (For example, I am currently reading and enjoying Andreas Vlachidis' (http://goo.gl/y1GJ4p) dissertation, "Semantic Indexing via Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological Grey Literature" (http://goo.gl/s7hxyq).)

* Does anyone have experience, citations, and/or interest in exploring the potential use of the #cidocCRM as an "executable metamodel" for the soft-specification of digitization workflows within self-descriptive data-stores -- e.g. as in a subgraph of a graph representation of a text? Such #cidocCRM-compliant microservice workflows could then be implemented by #UIMA text-mining frameworks/tools.

Thank you in advance for any helpful insights or introductions you might provide. Direct replies are welcome, as are on-list comments.

Happy-Healthy Vibes,
-: Jim :-

Jim Salmons
Twitter: @Jim_Salmons
http://www.FactMiners.org (Our #CitizenScience project)
http://www.SoftalkApple.com (Our #DigitalHistory project)
http://www.medium.com/@Jim_Salmons/ (my #CognitiveComputing/#DigitalHumanities articles)
