Dear George,
You are right, this is an open and important question.
I have repeatedly pointed out this issue in CRM-SIG meetings, but with
limited response so far.
One part of the discussion took place in CRMinf, about the equivalence
of a file with a Proposition Set.
The other part came up when P190 has symbolic content was introduced.
My remark that this must be discussed is not even in the minutes. Maybe
I forgot a homework assignment to elaborate on this. Let me expand a bit
here on my current understanding:
The pattern E41 Appellation p190 has symbolic content df:literal "file
name value goes here" feeds only the name of the file, which identifies
the file, into the Appellation. It could just be an rdfs:label, because
the content of the Appellation is hardly ambiguous. On the other hand,
reserving a separate node for the Appellation allows for assigning the
type "filename" to it. But a filename is not a good identifier anyway.
If the Digital Object is represented by a URI, e.g., a DOI, the
remaining question is whether or not it resolves to, or can
unambiguously be related to, an external content.
If it does, then the identity of this Digital Object should be the
"primitive" one, its binary identity. I.e., a .pdf and a .doc of the
same scientific publication would be different objects; even a .doc
with changes in its embedded metadata would be a different object.
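To make this "primitive" binary identity concrete, here is a minimal
sketch, assuming a cryptographic hash (SHA-256, my choice for
illustration; nothing in the CRM prescribes it) stands in for byte-level
identity. The helper name binary_identity and the sample byte strings
are hypothetical. Changing only the embedded metadata yields a different
identity, exactly as argued above.

```python
import hashlib


def binary_identity(content: bytes) -> str:
    """Hypothetical helper: a content hash standing in for binary identity.

    Two files share this identity only if every single byte matches.
    """
    return "sha256:" + hashlib.sha256(content).hexdigest()


# Same publication text, differing only in embedded metadata (made-up bytes).
doc_v1 = b"...publication text... <meta author='A'>"
doc_v2 = b"...publication text... <meta author='B'>"

print(binary_identity(doc_v1) == binary_identity(doc_v1))  # True
print(binary_identity(doc_v1) == binary_identity(doc_v2))  # False
```

Note that a hash captures only the binary level; it says nothing about
the equivalence class of encodings discussed next.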
If, however, we mean that the ontological identity is, for instance,
that of the equivalence class of possible encodings of one particular
publication following Springer rules or similar, then a URI pointing to
a binary is misleading, because many files can represent the same
publication. The different encodings will both /incorporate/ and
/represent/ the respective publication, but neither property identifies
the content.
Therefore, a variant (not a subproperty) of P190 should do it. Again we
have the problem that we need to form a common superclass with a
Primitive Value.
Perhaps, once we have taken the great step of declaring some Primitive
Values as IsA Appellations, the most elegant solution would be to form
a superclass of E62 String and Digital Object, and raise the range of
P190 to it. This would make clear that E62 String and Digital Object
differ only in whether they are inside or outside the KB proper.
If we do that, the range of P190 will again point to a URI, which, in
this case, must be either the binary or a lower-level representation
than the level of symbolic specificity given for the domain instance.
In any case, we should arrive at a "tangible" binary, and a suitable
type may be useful to distinguish whether the URI is meant to
correspond to a real binary (even if no longer extant!!) or to a
higher level.
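A rough sketch of this proposal, purely as an illustration: the class
names StringContent and DigitalObjectContent, the union
SymbolicContent and the "level" field are all hypothetical and are not
CRM classes or properties. Both kinds of content share one range type
and differ only in whether the content sits inside the KB; the level
marker records whether the URI denotes a real binary (even one no
longer extant) or a higher level of representation.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical values for the distinguishing type suggested above.
ALLOWED_LEVELS = {"binary", "higher_level"}


@dataclass(frozen=True)
class StringContent:
    """Plays the role of E62 String: content held inside the KB."""
    value: str


@dataclass(frozen=True)
class DigitalObjectContent:
    """Content held outside the KB, reached via a URI."""
    uri: str
    level: str  # "binary": a real binary, even if no longer extant

    def __post_init__(self) -> None:
        if self.level not in ALLOWED_LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")


# The hypothetical common superclass forming the raised range of P190.
SymbolicContent = Union[StringContent, DigitalObjectContent]


def is_inside_kb(content: SymbolicContent) -> bool:
    """The only difference between the two kinds: inside or outside the KB."""
    return isinstance(content, StringContent)


print(is_inside_kb(StringContent("report.docx")))  # True
print(is_inside_kb(DigitalObjectContent("https://example.org/blob/1", "binary")))  # False
```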
We should also answer the question of how this translates to analogue
content, because we may copy files manually and re-encode them.
After that, we should think about Propositional Objects represented in
files...
Any thoughts?
Best,
Martin
On 4/15/2020 8:16 PM, George Bruseker wrote:
Dear all,
Here is another humble modelling problem for which I don't feel that
there is a commonly agreed and documented answer, although it is a
common question. How do we connect an actual file with the semantic
network? So here is the scenario.
I have a file: a Word doc, a JPG image, a PowerPoint. I want to
represent it in CIDOC CRM and connect it to the semantic network, and
do so in a way that would be interoperable with all other well-formed
instances of CIDOC CRM. How do I do that?
Part of the answer is clear; part is unclear. Regarding the
representation of the fact that there is a digital object, we have
two choices. If we use pure CRMbase, then we have
E73 p2 has type E55 "Digital Object"
If we use the CRM extensions (CRMdig), then we have
D1 Digital Object
Great. Now in the semantic network we can relate this in all sorts of
standard ways to other entities (p67 refers to, p129 is about, etc.).
We can use a creation event from CRMbase or a digital machine event
from CRMdig to document when the file was created, by whom, etc.
Super. I can use p1 is identified by E41 Appellation to indicate the
name of that digital object (which may differ from the file name) and
give it a type with p2 has type. All standard and wonderful.
But I still have to relate the file itself, the actual digital object
that I want my users to be able to find and manipulate, to the
semantic network somehow.
How do people tend to do that? I have seen many variations but no
common method.
So what is the go-to solution and should it perhaps be documented on
the CIDOC CRM site because it is a really common pattern?
I have seen
the file = E73: just use the file's address as the URN of the semantic
node. But this means your file must be accessible via that URN, which
is often not the case, and anyway you probably want to distinguish the
semantic node which 'stands for' the file from the actual file itself.
I have seen and used E41 Appellation as a pattern. So the D1 or E73 p1
is identified by E41 Appellation p190 has symbolic content df:literal
"file name value goes here". The problem here is that you then also
need to store, somehow, a path by which to reach the file on some file
system.
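For concreteness, a minimal sketch of the Appellation pattern written
out as N-Triples, assuming the CIDOC CRM RDFS namespace and local names
(E73_Information_Object, P1_is_identified_by, E41_Appellation,
P2_has_type, P190_has_symbolic_content). The helper functions and all
example.org URIs are hypothetical, made up for illustration.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Minimal N-Triples literal: escape backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def appellation_triples(obj_uri: str, appellation_uri: str,
                        type_uri: str, filename: str) -> list:
    """Hypothetical helper: E73 p1 is identified by E41 p190 has symbolic content."""
    triples = [
        (f"<{obj_uri}>", f"<{RDF_TYPE}>", f"<{CRM}E73_Information_Object>"),
        (f"<{obj_uri}>", f"<{CRM}P1_is_identified_by>", f"<{appellation_uri}>"),
        (f"<{appellation_uri}>", f"<{RDF_TYPE}>", f"<{CRM}E41_Appellation>"),
        (f"<{appellation_uri}>", f"<{CRM}P2_has_type>", f"<{type_uri}>"),
        (f"<{appellation_uri}>", f"<{CRM}P190_has_symbolic_content>", literal(filename)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in appellation_triples(
    "http://example.org/object/report-1",
    "http://example.org/appellation/report-1-filename",
    "http://example.org/type/filename",
    "report.docx",
):
    print(line)
```

Note that nothing in these triples says where the bytes actually live;
that is precisely the path problem described above.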
I guess another alternative would be to use p190 has symbolic content
and then throw the file in there as a blob. I don't particularly like
this solution, as I would hope to find strings at the end of p190 and
not blobs.
Would maybe a sub property of p190 'is encoded in file' be an option
in order to use the blob solution?
Anyhow, maybe there are already better solutions than the ones I lay
out above, and I would be interested to hear them. Also, I think it
would be great to identify the best practice and put it on the main
site so that people follow this strategy consistently.
Probably my examples hide multiple use cases requiring different
patterns. Anyhow, what do you think?
Best,
George
_______________________________________________
Crm-sig mailing list
[email protected]
http://lists.ics.forth.gr/mailman/listinfo/crm-sig
--
------------------------------------
Dr. Martin Doerr
Honorary Head of the
Center for Cultural Informatics
Information Systems Laboratory
Institute of Computer Science
Foundation for Research and Technology - Hellas (FORTH)
N.Plastira 100, Vassilika Vouton,
GR70013 Heraklion,Crete,Greece
Vox:+30(2810)391625
Email: [email protected]
Web-site: http://www.ics.forth.gr/isl