Re: [Crm-sig] ISSUE Label-free RDF classes PLEASE VOTE

Christian-Emil Smith Ore Tue, 24 Jul 2018 14:21:51 +0300



Dear all,


The debate following Martin's proposal demonstrates the problem connected to 
PIDs and URIs. When the idea of LOD/Semantic Web was introduced, one seems to 
have had the idea that there would be one single universal identifier for an 
item. In the last 18 years it has been clearly demonstrated that this is not 
possible in general.  One will always need synonym mechanisms.  One of the more 
successful identifier schemes is the ISBN, which is a purely numeric 
standard.  Another small standard is the paper sheet sizes.



In systematic biology (at least in botany) one uses Latin names supplied with a 
complex but precise way to formulate name changes, including the name of the 
person responsible. For most non-native Latin speakers these Latin/Greek names 
function as labels with no internal meaning.  What is the meaning of “Rubus 
chamaemorus” and, if you decipher it, will you know what it actually denotes? 
Still the system works well.



For non-native English speakers, English names such as the labels in the CRM, 
the element names in TEI or the terms used in IIIF function as similar labels. 
We do not bother very much about the English nuances. They could as well have 
been in Latin.  With all respect, this observation seems to be surprising to 
many native English speakers (or to native speakers of the language of any 
standard).



In my opinion the numeric labels should be the authoritative ones and the more 
verbose ones in some native language (English, French, Norwegian, Chinese) are 
synonyms. A language-neutral standard is the best. So if the voting is still 
open I will vote yes.








Best,
Christian-Emil






________________________________
From: Crm-sig <[email protected]> on behalf of George Bruseker 
<[email protected]>
Sent: 24 July 2018 11:40
To: Francesco Beretta
Cc: [email protected]
Subject: Re: [Crm-sig] ISSUE Label-free RDF classes PLEASE VOTE

Dear all,

I do not want to be obstructionist to progress on a pragmatic issue but I feel 
that we should pause the vote process.

It seems to me that both sides have very good points and we need to find a 
means to reconcile these as much as possible. Let me try to abbreviate the main 
aspects of the points made so far:

As Rob points out, with the labels embedded in the class and property names, we 
have a readable RDF and a single RDF of reference. These are fundamental 
attributes we should be looking to support.

With a label-less entity/property version, we would have greater independence 
from labels, which strengthens robustness against label updates and provides 
linguistic neutrality. On the other hand, it means having two versions of CRM 
around, leading to potential interoperability problems and extra overhead. It 
also makes the plain RDF unreadable in any sense except to the versed few.

Melanie’s point of the cost of change resulting from updates to the standard is 
a very fundamental argument and I think a big aspect we have to bear in mind, 
with which I believe Rob would concur. If changes to labels cost user 
communities significant time and money, this is a big problem to CRM 
sustainability. That being said, if the SIG has historically been conservative 
about label changes, perhaps it is not as big an issue as we think.

Before we move forward with creating a version like this, I suggest that we 
need to check how many times we have changed class or property names in the 
past to see how big an issue this is. From the point of view of update 
robustness/cost to users/community (though not linguistic flexibility) this is 
the major issue.

If we decide to make such a version, I would think we would want to ensure that 
we have the correct mechanisms in place for ensuring the management of the 
versions and the resolution and persistence of the URIs as per Richard and 
Francesco’s suggestion. Along those lines, I believe Francesco’s comments on 
the URI service and group ontology development are fruitful. Indeed, joining 
them to Rob’s extended questions about previous version names etc., much could 
be addressed through a robust service for URI resolution of CRM entities and 
properties.

Feel free to disagree with this summary if I have missed or misinterpreted your 
points. I think we all share the same aim of robust interoperability on the 
data level and just have to find the right balance. I would propose that we 
check the label changes and discuss how to robustly support CRM URI resolution 
before proceeding to create new RDF versions.

Best,

George


On Jul 23, 2018, at 8:11 PM, Francesco Beretta 
<[email protected]<mailto:[email protected]>> 
wrote:


Dear all,

I also vote YES.

Furthermore, I'd also like to stress the importance of distinguishing between 
the identifier, which must be stable during the whole life of a class or 
property, and the label(s), which can be multiple, multilingual and evolve, as 
everyone knows.

The meaning of the class or property, as it was already stressed on this list, 
is provided by the scope note and, in fact, by the scope note AND the version 
(or namespace) of CRM. Strictly speaking it's always about a class in a 
specific CRM version: "This is the scope note of E59 Primitive Value of the 
CIDOC CRM version 6." (cf. note 7 in the document under discussion).


Labels are often confusing. Therefore, it is not in my opinion just for 
"convenience of implementation" (as the new document states) that the RDF 
serialisation should define "number-only classes and properties"; it is 
something fundamental. In my opinion, the alphanumeric form E7 should be the 
preferred one in the URI, and of course the URIs with labels, insofar as they 
were used in earlier versions, should be maintained as a conditio sine qua non 
of interoperability.
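
To illustrate keeping both URI forms usable, here is a minimal Python sketch 
that maps either form to the number-only identifier. The labelled local-name 
pattern (code, underscore, label, e.g. "E7_Activity", with an optional "i" 
suffix for inverse properties) and the function name are assumptions made for 
this example, not something defined in this thread.

```python
import re

# Namespace as cited in this thread.
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"

# Assumed local-name pattern: "E7", "E7_Activity", "P2_has_type", "P2i_is_type_of".
_CODE = re.compile(r"^([EP]\d+i?)(?:_.*)?$")


def stable_identifier(uri):
    """Map a labelled or number-only CRM URI to its number-only form."""
    if not uri.startswith(CRM_NS):
        raise ValueError(f"not a CRM URI: {uri}")
    match = _CODE.match(uri[len(CRM_NS):])
    if match is None:
        raise ValueError(f"no class or property code in: {uri}")
    return CRM_NS + match.group(1)
```

With such a mapping, data written against a labelled URI from an earlier 
version and data written against the number-only URI can be compared on the 
same key.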

At the same time, the human should always be provided with an easy way of 
retrieving the label(s) for his/her convenience. This is not to be provided, in 
my opinion, by the RDF serialisation as a static file (which will of course 
contain the labels) but by a dereferencing service implemented as a web service 
to which you can send the URI of the class, or property, and receive a web page 
for the human to read, like this http://ontologies.dataforhistory.org/class/7 
but devoted to the whole CIDOC CRM community and dereferencing the specified 
identifiers, like the Agent<http://dbpedia.org/ontology/Agent> class in the 
DBPedia ontology.

The dereferencing page by DBPedia of Agent shows, in my opinion, the limits of 
an identification for the class provided by the label: the label in the URI 
will remain forever even if a better one is found for the class, problems could 
arise with disambiguation (at least in the human mind, not for the machine), 
etc. On this same page<http://dbpedia.org/ontology/Agent>, the property 
owl:equivalentClass shows the solution by Wikidata mentioned by Melanie, which 
is evidently more robust: https://www.wikidata.org/wiki/Q24229398. Of course 
this solution, like the http://www.cidoc-crm.org/cidoc-crm/E7 URI form, needs 
double dereferencing: for the human, and for the machine in the form of a data 
stream.

Therefore, in my opinion, in the context of the semantic web the issue of the 
ongoing discussion is much more about having a URI dereferencing service than 
about adding labels to URI specifications in static documents. REF documents 
are useful for collective memory and experts, but in everyday life web services 
are more effective and useful: just write the URI, and you'll get the answer in 
tenths of a second.

In this same context, the CRM version number should also always be provided 
in the URI, e.g. http://www.cidoc-crm.org/cidoc-crm/6.2/E7, because the scope 
note and labels depend on the version; they are not absolute over the whole 
class (or property) history. A URL redirection could then easily lead to the 
page "http://www.cidoc-crm.org/Entity/E7-Activity/Version-6.2.1", providing at 
the same time HTML for me to read and RDF data (in XML, JSON or whatever else) 
for consumption by the machine.
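
As a rough sketch of this idea in Python: one helper builds a versioned URI in 
the form proposed above, and another decides from the HTTP Accept header 
whether to answer with HTML or with RDF. The function names and the list of 
RDF media types are assumptions for illustration; the Accept handling is 
deliberately simplified and ignores q-values, so it is not a full 
implementation of HTTP content negotiation.

```python
# Media types treated as "RDF for the machine" (an illustrative selection).
RDF_TYPES = ("application/rdf+xml", "application/ld+json", "text/turtle")


def versioned_uri(version, code, base="http://www.cidoc-crm.org/cidoc-crm/"):
    """Build a URI carrying the CRM version, e.g. .../cidoc-crm/6.2/E7."""
    return f"{base}{version}/{code}"


def choose_representation(accept_header):
    """Return the first RDF media type named in the Accept header, else HTML.

    Simplification: q-values and wildcards are ignored.
    """
    offered = [part.split(";")[0].strip().lower()
               for part in accept_header.split(",")]
    for media_type in offered:
        if media_type in RDF_TYPES:
            return media_type
    return "text/html"
```

A browser, which does not ask for any of the RDF types, would get the HTML 
page, while a client asking for "text/turtle" would get data for the same URI.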

The same principle should be applied to CRM extensions, e.g. 
http://www.cidoc-crm.org/crm-geo/1.2/SP2.

In my opinion, this point should be treated as part of the discussion we 
started at the last SIG in Lyon about improving the management of CRM versions 
and extensions, and we should find in future a more dynamic, web-based way of 
managing versions and dereferencing. And discussions... ;-)

All the best

Francesco






On 23.07.18 at 18:02, Detlev Balzer wrote:

I also vote YES.

Where natural-language names are desired for readability, why not allow for any 
number of non-normative label sets? This would put Chinese or Armenian class 
and property names on a par with the English ones, without compromising 
interoperability as long as the distinction between URI and label is kept in 
mind.
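
A minimal Python sketch of such non-normative label sets: labels are looked up 
by language and identifier, and the identifier itself is returned when no 
label exists, so no language is privileged. The data structure, the function 
name and the sample labels are illustrative assumptions; the French entry is a 
plain translation made up for this example, not taken from an official label 
set.

```python
# Illustrative label sets keyed by language code, then by CRM identifier.
LABEL_SETS = {
    "en": {"E7": "Activity"},
    "fr": {"E7": "Activité"},  # illustrative translation only
}


def label_for(code, lang, label_sets=LABEL_SETS):
    """Return the label for `code` in `lang`, or the bare code if none exists."""
    return label_sets.get(lang, {}).get(code, code)
```

Because the identifier is the key, adding or correcting a label set never 
touches the URIs used in the data.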

Best,
Detlev

On 19.07.2018 at 18:39, Martin Doerr wrote:


Dear All,

The current text "Expressing the CIDOC Conceptual Reference Model in RDF" 
(https://docs.google.com/document/d/1zCGZ4iBzekcEYo4Dy0hI8CrZ7dTkMD2rJaxavtEOET0/edit?ts=5b50b922)

contains the phrase:
"In addition, for convenience of implementation we have defined number-only 
classes and properties e.g. “E63” or “P2”, and declared each of them to be 
equivalent to the corresponding full form"

In the past, this option was provided and widely rejected by users. I do not 
know of any installation using it.

It was proposed again because CRM-SIG reserves the right to change labels 
without changing the code ("E63", "P2" etc.), in cases when the meaning is 
preserved but the existing label causes confusion and can be replaced by a more 
fitting or at least less confusing one. These changes are very rare, and 
explicit in the amendment of the respective version.
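
For illustration, the equivalence declaration quoted above could be generated 
along the following lines. This is only a sketch in Python, not the RDF 
actually published by the SIG: the helper name, the full-form local names 
passed in (such as "E63_Beginning_of_Existence") and the choice of 
owl:equivalentClass / owl:equivalentProperty as the equivalence predicates 
are assumptions for the example.

```python
# Namespace as cited in this thread.
CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def equivalence_triples(full_names):
    """Return N-Triples lines declaring each number-only name equivalent
    to its full (labelled) form."""
    lines = []
    for full in full_names:
        code = full.split("_", 1)[0]  # "E63_Beginning_of_Existence" -> "E63"
        # Codes starting with "E" are classes, all others are treated as properties.
        predicate = OWL_NS + ("equivalentClass" if code.startswith("E")
                              else "equivalentProperty")
        lines.append(f"<{CRM_NS}{code}> <{predicate}> <{CRM_NS}{full}> .")
    return lines
```

If a label is later changed, only the right-hand side of such a statement 
changes; the number-only name stays the same.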

Those of you who support:
"In addition, for convenience of implementation we have defined number-only 
classes and properties e.g. “E63” or “P2”, and declared each of them to be 
equivalent to the corresponding full form"
please vote *YES*.

Those of you who support:
"The English label is part of the definition of the RDF classes and properties. 
Number-only classes and properties e.g. “E63” or “P2”, are not provided". Other 
means of supporting migration between label versions to be discussed.

please vote *NO*.

(Those who believe the issue is not sufficiently formulated please vote 
"VETO". One or more "VETO" will stop the e-mail vote as a whole and postpone it 
to the next physical meeting)

Best

Martin

--
--------------------------------------------------------------
 Dr. Martin Doerr              |  Vox:+30(2810)391625        |
 Research Director             |  Fax:+30(2810)391638        |
                               |  Email: [email protected]   |
                                                             |
               Center for Cultural Informatics               |
               Information Systems Laboratory                |
                Institute of Computer Science                |
   Foundation for Research and Technology - Hellas (FORTH)   |
                                                             |
               N.Plastira 100, Vassilika Vouton,             |
                GR70013 Heraklion,Crete,Greece               |
                                                             |
             Web-site: http://www.ics.forth.gr/isl           |
--------------------------------------------------------------



_______________________________________________
Crm-sig mailing list
[email protected]<mailto:[email protected]>
http://lists.ics.forth.gr/mailman/listinfo/crm-sig


