Info/resources/interest in intersection of #cidocCRM, #PRESSoo, & #UIMA, etc.

Jim Salmons Wed, 24 Aug 2016 22:01:53 +0300

Hello from States-side Heading into Late Summer,

Having spent the bulk of my career as a hard-core Smalltalk
designer/developer, I have to say that I have SO ENJOYED the recent
modeling-issue conversations related to Actor Appellation, etc. As some of
you may know, "programming" Smalltalk is 90% modeling, 10% stating that
understanding as a construct of software objects composed from the
collection of objects in the Smalltalk image. So, all this recent
conversation felt so much like being with my Smalltalk buddies 20+ years
ago.


As to the present time, my #CitizenScience applied research involves a
collaboration with the good folks at the PRImA Research Lab at the U. of
Salford, Manchester UK (http://www.primaresearch.org/). In particular, I am
working with PRImA to extend their PAGE GTS (Ground-Truth Storage)
XML schema/format with an XML-based MAGAZINE format. This
extended GTS format will support a shift from the bottom-up/within-page
strategy for OCR page segmentation, layout- and text-recognition to a
top-down/whole-issue strategy based on a #cidocCRM/#FRBRoo/#PRESSoo ontology
stack. I intend to use this stack in combination with a metamodel-subgraph
design pattern using the #cidocCRM as an "executable metamodel" for the
soft-configuration of microservice digitization workflows. These ideas are
presented in my forthcoming #DATeCH paper which is available on
ResearchGate.net in preprint form: https://goo.gl/V2eR0H.
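
A minimal sketch of the metamodel-subgraph idea, under stated assumptions: the
workflow description lives in the same graph as the data it applies to, and a
small runner walks that subgraph to decide which services to call. The node
ids, the FIRST_STEP/NEXT_STEP edge names, the service names, and the use of
E29 Design or Procedure as the label for step nodes are all hypothetical
choices made for illustration; they are not taken from the paper or from the
PAGE schema.

```python
# One small in-memory graph holding both the data and the workflow that
# describes how to process it (the "metamodel subgraph").
NODES = {
    "issue1": {"kind": "data", "pages": ["p1", "p2"]},
    "workflow": {"kind": "metamodel", "crm_class": "E29_Design_or_Procedure"},
    "step_segment": {"kind": "metamodel",
                     "crm_class": "E29_Design_or_Procedure",
                     "service": "segment_pages"},
    "step_ocr": {"kind": "metamodel",
                 "crm_class": "E29_Design_or_Procedure",
                 "service": "recognize_text"},
}

# Edges are (source, edge_type, target) triples.
EDGES = [
    ("workflow", "FIRST_STEP", "step_segment"),
    ("step_segment", "NEXT_STEP", "step_ocr"),
]

# Stand-ins for real microservices: each takes a document dict and returns
# an enriched copy.
SERVICES = {
    "segment_pages": lambda doc: {
        **doc, "regions": [f"{page}:region" for page in doc["pages"]]},
    "recognize_text": lambda doc: {
        **doc, "text": [f"text of {region}" for region in doc["regions"]]},
}


def follow(source, edge_type):
    """Return the target of the first matching edge, or None."""
    for src, etype, target in EDGES:
        if src == source and etype == edge_type:
            return target
    return None


def run_workflow(workflow_id, document):
    """Walk the workflow subgraph and apply each step's service in order."""
    trace = []
    step = follow(workflow_id, "FIRST_STEP")
    while step is not None:
        service = SERVICES[NODES[step]["service"]]
        document = service(document)
        trace.append(step)
        step = follow(step, "NEXT_STEP")
    return document, trace


result, trace = run_workflow("workflow", NODES["issue1"])
```

In this sketch, changing the workflow means editing nodes and edges in the
graph rather than editing the runner, which is the sense of
"soft-configuration" intended above.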

INFO/EXPERIENCE/INTEREST I AM LOOKING FOR: As we move the layout- and
text-recognition strategy to a top-down/whole-issue approach, we're seeking
to make explicit and tractable the "unspoken gap" between initial bulk
digitization -- like what the PRImA folks work on -- and the "next step" of
#TDM (text- and data-mining) where the starting point is often curated text
corpora such as those that the good folks at NaCTeM -- the National Centre
for Text-Mining at the U. of Manchester (http://www.nactem.ac.uk/) -- work
with in the medical and biological domains. To this end, we're looking to
make sure that the applied work that we do between FactMiners and PRImA is
"upstream useful" to the #TDM folks at NaCTeM and similar research centers.
And the best common ground for this intersection of our mutual interests is
#UIMA, the Unstructured Information Management Architecture standard.
#UIMA was created by IBM as part of their #CognitiveComputing (Watson)
research, and is now carried on as an Apache Software Foundation open source
project here: http://uima.apache.org/.

TWO AREAS OF KINDRED SPIRIT INTEREST:

  *  Does anyone have experience, citations, and/or interest in exploring
the compatibility/expressibility of the #cidocCRM type model in the #UIMA
CAS type system? (For example, I am currently reading and enjoying Andreas
Vlachidis' (http://goo.gl/y1GJ4p) dissertation, "Semantic Indexing via
Knowledge Organization Systems: Applying the CIDOC-CRM to Archaeological
Grey Literature" (http://goo.gl/s7hxyq).) 

  *  Does anyone have experience, citations, and/or interest in exploring the
potential use of the #cidocCRM as an "executable metamodel" for the
soft-specification of digitization workflows within self-descriptive
data-stores -- e.g. as in a subgraph of a graph representation of a text?
Such #cidocCRM-compliant microservice workflows could then be implemented by
#UIMA text-mining frameworks/tools.
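
To make the first question concrete, here is a rough sketch of one
compatibility issue. A UIMA type has exactly one supertype, while the
#cidocCRM allows multiple inheritance (E21 Person is a subclass of both E20
Biological Object and E39 Actor). The code below takes a small fragment of the
CRM class hierarchy, flags the multiply-inherited classes, and emits a UIMA
type system descriptor that keeps the first-listed superclass as the UIMA
supertype and notes the others in the type's description. The
"org.example.crm" namespace, the choice of uima.tcas.Annotation as the root,
and the "keep the first parent" rule are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Fragment of the CIDOC-CRM class hierarchy: class -> direct superclasses.
CRM_SUPERCLASSES = {
    "E1_CRM_Entity": [],
    "E77_Persistent_Item": ["E1_CRM_Entity"],
    "E39_Actor": ["E77_Persistent_Item"],
    "E70_Thing": ["E77_Persistent_Item"],
    "E72_Legal_Object": ["E70_Thing"],
    "E18_Physical_Thing": ["E72_Legal_Object"],
    "E19_Physical_Object": ["E18_Physical_Thing"],
    "E20_Biological_Object": ["E19_Physical_Object"],
    "E21_Person": ["E20_Biological_Object", "E39_Actor"],
}

NAMESPACE = "org.example.crm"      # hypothetical package name
UIMA_ROOT = "uima.tcas.Annotation"  # assumed root for text annotations


def multiply_inherited(hierarchy):
    """Classes that cannot be mapped one-to-one onto single inheritance."""
    return sorted(cls for cls, parents in hierarchy.items()
                  if len(parents) > 1)


def to_type_system(hierarchy):
    """Build a UIMA type system descriptor as an XML string."""
    root = ET.Element("typeSystemDescription",
                      {"xmlns": "http://uima.apache.org/resourceSpecifier"})
    ET.SubElement(root, "name").text = "CrmFragmentTypeSystem"
    types = ET.SubElement(root, "types")
    for cls, parents in hierarchy.items():
        type_desc = ET.SubElement(types, "typeDescription")
        ET.SubElement(type_desc, "name").text = f"{NAMESPACE}.{cls}"
        if len(parents) > 1:
            # Record the superclasses that single inheritance forces us to drop.
            note = "Also a CIDOC-CRM subclass of: " + ", ".join(parents[1:])
            ET.SubElement(type_desc, "description").text = note
        supertype = f"{NAMESPACE}.{parents[0]}" if parents else UIMA_ROOT
        ET.SubElement(type_desc, "supertypeName").text = supertype
    return ET.tostring(root, encoding="unicode")


conflicts = multiply_inherited(CRM_SUPERCLASSES)
descriptor_xml = to_type_system(CRM_SUPERCLASSES)
```

Whether dropping a superclass this way is acceptable, or whether the extra
parents should instead be modeled as features or separate annotations, is
exactly the kind of design question the first bullet asks about.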

Thank you in advance for any helpful insights or introductions you might
provide. Direct replies are welcome as are on-list comments.

    Happy-Healthy Vibes,
    -: Jim :-

    Jim Salmons
    Twitter: @Jim_Salmons
    http://www.FactMiners.org (Our #CitizenScience project)
    http://www.SoftalkApple.com (Our #DigitalHistory project)
    http://www.medium.com/@Jim_Salmons/ (my
#CognitiveComputing/#DigitalHumanities articles)

[Crm-sig] Info/resources/interest in intersection of #cidocCRM, #PRESSoo, & #UIMA, etc.
