Dear All,

Let me summarize the discussion about issue 490 between George, Christian-Emil and me, to be discussed in the next meeting:

"How to model a file" may be too vague.

There are three aspects:

A) What constructs are needed in the CRM ontologically to refer to the unique content of a file.

B) What constructs are needed to refer unambiguously to a resource that changes content. This is modeled in CRMpem as "Volatile Dataset", and will not be discussed in this issue.

C) How to connect in a knowledge base to a materialized content description.

About A):

We take a file (see also Persistent Dataset in CRMpem) in the sense of an immaterial E73 Information Object as a unique sequence of symbols that can be machine-encoded, regardless of what groups of bits constitute one of the symbols of interest in this object.
   in the KB: The intended identity can be represented by a URI.

We take a file in the sense of a material copy on a digital medium as a kind of "E25 Human-Made Feature", regardless of whether it is on a *local* installation, in a "*cloud*" cluster of machines, a *LOCKSS* federation of copies, or on a *removable* carrier.

    in the KB: We may refer to the material copy by an *external URL*, or create an *E62 String* in a KB or within an RDF file, or use a platform-internal "*BLOB mechanism*" with whatever kind of identifier the platform uses to refer to the local copy.

Ontologically, it is irrelevant for the intended immaterial content whether the copy is printed or scribbled on paper or stored on a digital medium (or even a Morse sound track), as long as the material form is unambiguous with respect to the intended content. Both paper and digital media can have errors. The CIDOC CRM v7.1 can be printed on paper and, in principle, be re-entered manually into a file without loss.

   in the KB: We may refer to a paper copy or a removable medium by an archival identifier.

About C)

Whether we use an archival identifier for a paper copy or a removable digital medium, or a URL for a file on a machine, in all cases the maintainers of the archive must guarantee that the identifier remains uniquely connected with the content. Otherwise, using a URL in a KB is simply inadequate. The DOI organisation foresees penalties for users that change the content of a URL associated with a DOI. There is no other solution.

DOI *automatically redirects* from the DOI URI to the guaranteed URL.

The property P190 has symbolic content is used to connect a machine-encodable information object to a KB-internal string. *Similarly*, we want to refer to the content of an information object via an *external* copy, digital or not, identified by a *URL or archival identifier*. Therefore we propose the following property:

**New proposal:**

*Pxxx has representative copy*

Domain:

E90 Symbolic Object

Range:

E25 Human-Made Feature

Subproperty of:

E90 Symbolic Object. P128i is carried by (carries): E18 Physical Thing

Quantification:

many to many (0,n:0,n)

Scope note:

This property associates an instance of E90 Symbolic Object with a complete, identifying representation of its content in the form of a sufficiently readable instance of E25 Human-Made Feature, including, in particular, representations on electronic media, regardless of whether they reside internally in clusters of electronic machines, such as in so-called cloud services, or on removable media.

This property only applies to instances of E73 Information Object that can completely be represented by discrete symbols, in contrast to analogue information. The representing object may be more specific than the symbolic level defining the identity condition of the represented. This depends on the type of the information object represented. For instance, if a text has type "Sequence of Modern Greek characters and punctuation marks", it may be represented in a formatted file with particular fonts on a particular machine, meaning however only the sequence of Greek letters. Any additional analogue elements contained in the representing object will not be regarded as part of the represented.

As another example, if the represented object has type "English words sequence", American English or British English spelling variants may be chosen to represent the English word "colour" without defining a different symbolic object.

In a knowledge base, typically, the represented object will appear as a URI without a corresponding file, whereas the representing one may appear as the URL of a binary encoded file existing outside the knowledge base proper, or as the archival identifier of a paper edition. A URL for identifying the copy itself in a knowledge base should only be used as long as the providers support the persistence of that copy under this URL, as is current practice for "Linked Open Data". Associating the referred copy with a checksum in the knowledge base may help safeguard the maintainers against unexpected change of content under this URL. If more than one representative copy is referred to, the maintainers should control their mutual consistency at the symbolic level of the object intended to be represented.

Examples:

Definition of the CIDOC Conceptual Reference Model Version 7.1.1 (E73) /has representative copy/ The content under https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf (E25) on the server of ICS-FORTH in Heraklion, Greece.

[The edition 7.1.1 of the CIDOC CRM is registered under the public URI "https://doi.org/10.26225/FDZH-X261", which redirects users to the representative copy under https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf. ICS-FORTH as an organisation is responsible to the DOI Foundation for the persistence of this content under the respective URL.]
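
The checksum safeguard mentioned in the scope note could look roughly like the following Python sketch. It is only an illustration under assumptions: the function names and the sample content are hypothetical, SHA-256 is an arbitrary choice of checksum, and the bytes are given inline rather than fetched from the URL.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def copy_unchanged(content: bytes, recorded_checksum: str) -> bool:
    """True if the content still matches the checksum stored in the KB."""
    return sha256_hex(content) == recorded_checksum


# Hypothetical content standing in for the file found under the URL.
original = b"content of the representative copy"

# Checksum recorded in the knowledge base when the copy was first referenced.
recorded = sha256_hex(original)

print(copy_unchanged(original, recorded))         # same bytes as recorded
print(copy_unchanged(original + b" ", recorded))  # one extra byte appended
```

Such a check only detects a change of the bits under the URL. It says nothing about whether the copy is still consistent at the symbolic level of the represented object.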

-----------------------------------------------------------------------------------------------------------------------------------------------------

*Note* that the MS Word copy AND the pdf copy of the CRM are regarded as copies of an *identical symbolic content*, the one we are interested in!
A *vocabulary of symbolic levels* is still to be defined!

If an instance of *E73 Information Object* is referred to in a KB via *a (persistent) URL*, I would regard this as a compression of URI - Pxxx - URL. This practice would not allow for the distinction between bitwise identity and a higher symbolic form.
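
To make the difference between the expanded form URI - Pxxx - URL and bitwise identity concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the example.org identifiers, the placeholder property URI, the two inline byte strings standing in for a docx and a pdf copy, and the "word sequence" symbolic level, which is not a defined vocabulary term.

```python
import hashlib
from typing import Tuple

# Hypothetical identifiers, for illustration only.
INFO_OBJECT = "https://example.org/io/crm-7.1.1"  # URI of the E73 Information Object
PXXX = "https://example.org/ns/Pxxx_has_representative_copy"  # placeholder property URI

# Two copies with the same words but different layout (stand-ins for docx and pdf).
copies = {
    "https://example.org/files/crm.docx": b"The CRM  defines\nentities and properties.",
    "https://example.org/files/crm.pdf": b"The CRM defines entities\nand properties.",
}


def bitwise_id(data: bytes) -> str:
    """Identity at the level of bits: a checksum of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def word_sequence(data: bytes) -> Tuple[str, ...]:
    """Identity at an assumed 'word sequence' level: layout and whitespace are ignored."""
    return tuple(data.decode("utf-8").split())


# Expanded form: one information object URI, one Pxxx statement per representative copy.
triples = [(INFO_OBJECT, PXXX, url) for url in copies]

bitwise_ids = {bitwise_id(data) for data in copies.values()}
symbolic_forms = {word_sequence(data) for data in copies.values()}

print(len(triples))         # one statement per copy
print(len(bitwise_ids))     # the copies differ bit by bit
print(len(symbolic_forms))  # but agree at the assumed symbolic level
```

In the compressed practice, each URL would itself stand for the information object, so the two copies would appear as two different objects and their agreement at the symbolic level could not be stated.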

Partners of this homework, please comment if I have missed something!

Best,

Martin





-------- Forwarded Message --------
Subject:        Re: Issue 490: how to model a file [HW reminder]
Date:   Thu, 31 Aug 2023 18:35:38 +0300
From:   Martin Doerr <[email protected]>
To:     [email protected]
CC: George Bruseker <[email protected]>, Christian Emil Ore <[email protected]>



On 8/30/2023 5:17 PM, Athanasios Velios wrote:
Dear Martin,

Of course, I should have thought that the collective physical features in different machines can have one identity. It makes sense.

I think part of the revised scope note might fit better in the RDF guidelines document, but I understand how it would work.

Did you mean to copy George and CEO in your last email?
YES!😁

All the best,

Thanasis

On 29/08/2023 16:55, Martin Doerr wrote:
Dear Thanasi,

Yes, we need to adjust the scope note. My opinion about cloud services is that:

They use a finite set of dedicated machines at any time with distinct ownership. They have one master controller. There are no unregistered copies possibly lingering about. All that does not affect the materiality of the set of copies. They have an internal integrity algorithm. The Human-Made Feature is the total of tracks employed. This is fixed at any point in time algorithmically. We do not model the internals of the cloud service, nor do we have access to them. For us, ontologically, it does not matter how they work internally. They are prone to material failure altogether. A reduced likelihood of failure does not make them immaterial.

It reminds me of the RAID machine installation at the GNM in Nuremberg. They automatically saved on two machines, and gave a reliability of 20,000 years. After 6 months, an air-conditioning failure caused the crash of the whole system.

Would that make sense😁?

*New proposal:*

Pxxx has representative copy

Domain:

E90 Symbolic Object

Range:

E25 Human-Made Feature

Subproperty of:

E90 Symbolic Object. P128i is carried by (carries): E18 Physical Thing

Quantification:

many to many (0,n:0,n)

Scope note:

This property associates an instance of E90 Symbolic Object with a complete, identifying representation of its content in the form of a sufficiently readable instance of E25 Human-Made Feature, including, in particular, representations on electronic media, regardless of whether they reside internally in clusters of electronic machines, such as in so-called cloud services, or on removable media.

This property only applies to instances of E73 Information Object that can completely be represented by discrete symbols, in contrast to analogue information. The representing object may be more specific than the symbolic level defining the identity condition of the represented. This depends on the type of the information object represented. For instance, if a text has type "Sequence of Modern Greek characters and punctuation marks", it may be represented in a formatted file with particular fonts on a particular machine, meaning however only the sequence of Greek letters. Any additional analogue elements contained in the representing object will not be regarded as part of the represented.

As another example, if the represented object has type "English words sequence", American English or British English spelling variants may be chosen to represent the English word "colour" without defining a different symbolic object.

In a knowledge base, typically, the represented object will appear as a URI without a corresponding file, whereas the representing one may appear as the URL of a binary encoded file existing outside the knowledge base proper, or as the archival identifier of a paper edition. A URL for identifying the copy itself in a knowledge base should only be used as long as the providers support the persistence of that copy under this URL, as is current practice for "Linked Open Data". Associating the referred copy with a checksum in the knowledge base may help safeguard the maintainers against unexpected change of content under this URL. If more than one representative copy is referred to, the maintainers should control their mutual consistency at the symbolic level of the object intended to be represented.

Examples:

Definition of the CIDOC Conceptual Reference Model Version 7.1.1 (E73) /has representative copy/ The content under https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf (E25) on the server of ICS-FORTH in Heraklion, Greece.

[The edition 7.1.1 of the CIDOC CRM is registered under the public URI "https://doi.org/10.26225/FDZH-X261", which redirects users to the representative copy under https://cidoc-crm.org/sites/default/files/cidoc_crm_v.7.1.1_0.pdf. ICS-FORTH as an organisation is responsible to the DOI Foundation for the persistence of this content under the respective URL.]



---------------------------------------------------------------------------------------------------

Comments?

Best,

Martin


On 8/28/2023 11:48 PM, Athanasios Velios wrote:
Don't we need to revise the scope note? Also, I still do not follow the solution when it comes to files from cloud services which may serve the same file from different machines. The Human-Made Feature in one machine is different to that of another machine. Yet we are saying that they will both have the same URI?

T.



On 28/08/2023 20:45, Martin Doerr wrote:

Does that summarize where we have arrived?
Yes, I think so.

So I still propose the Pxxx has representative content, now from a /Symbolic Object to a Human-Made Feature/, for the external reference. For instance, CIDOC CRM version 7.1 should/could have one URI, with a type still to be defined (something like word and graph-based), with both the docx and the pdf being representative contents. DOI avoids such definitions of symbolic level, and points to the pdf binary.

Best,

Martin

-- ------------------------------------
  Dr. Martin Doerr
                 Honorary Head of the
  Center for Cultural Informatics
    Information Systems Laboratory
  Institute of Computer Science
  Foundation for Research and Technology - Hellas (FORTH)
                     N.Plastira 100, Vassilika Vouton,
  GR70013 Heraklion,Crete,Greece
    Vox:+30(2810)391625
Email:[email protected] Web-site:http://www.ics.forth.gr/isl



_______________________________________________
Crm-sig mailing list
[email protected]
http://lists.ics.forth.gr/mailman/listinfo/crm-sig
