Dear Martin,

Here is a list of some articles on persistent identifiers. Please feel free to add more to the list.

DOI is now a draft ISO standard.
http://www.doi.org/about_the_doi.html#standards

IDF. DOI system and Internet Identifier specification.
http://www.doi.org/factsheets/DOIIdentifierSpecs.html

Paskin, N. (2006). "Naming and Meaning: Key to The Management of Intellectual Property in Digital Media", in The Europe-China Conference on Intellectual Property in Digital Media (IPDM06), Shanghai, October 2006.
http://www.doi.org/topics/060922IPDM_China_Paskin_preprint.pdf

Paskin, N. Persistent id presentation at IDF.
http://www.doi.org/doi_presentations/persistent_identifier_slides.zip

Kunze, John A. (2003). Towards Electronic Persistence Using ARK Identifiers. California Digital Library.
http://www.cdlib.org/inside/diglib/ark/arkcdl.pdf

Connolly, D. (2005). Untangle URIs, URLs, and URNs: Naming and the problem of persistence.
http://www.ibm.com/developerworks/xml/library/x-urlni.html

These discuss the pros/cons of different PID schemes.

Yan

-----Original Message-----
From: martin [mailto:[email protected]]
Sent: Wednesday, December 17, 2008 2:44 AM
To: Han, Yan
Cc: Vadim Soshkin; crm-sig; [email protected]
Subject: Re: [Crm-sig] RDFS class identifiers

Dear Yan Han,

Could you cite some of these papers?

Yes, I mentioned aacr2 because it prescribes the typical author encoding done by libraries. We could create a URN symbolizing first that we talk about an Actor in the sense of the CRM, and then that we use an encoding scheme transliterated from an AACR2 form, as an encoding scheme reference, under the scope of CRM Actor. Just an idea.

The question for any such scheme that has a minimal chance of creating false identification is what it costs to recover from the situation, and how. A priori, I would not exclude such a solution, as long as we have a (global) recovery mechanism to propose. Identification is a pure question of optimization: non-recognized duplicates against false identification.
Currently, using words only for access, we are swamped with false identification and non-recognized duplicates. Then, all of a sudden, we cannot live with a one-per-million false identification?

By the way, VIAF should be able to cite the precise number of such false identifications. With 9 million author names, the sample is statistically relevant.

Best,

Martin

Han, Yan wrote:
> There are multiple identifier schemes existing, and I do not think that it
> must be in URL/URN. There are papers discussing the pros/cons of these
> schemes.
>
> I agree that identifiers can be encoded in any language. Using Unicode as the
> underlying scheme should be able to handle this. (DOI is using Unicode to
> handle this. I am in the DOI ISO group).
>
> Aacr2: I think it is the cataloging manual. Anglo-American Cataloguing Rules,
> Second Edition
>
> I saw a real example of two different authors who have exactly the same
> name and were born on the same date. I remember that somewhere people are
> thinking about how to address this authority file issue.
>
> Regarding institution-based unique identifiers, I am in a NISO working group
> to address institutional identifiers. The work group is supposed to have
> something ready in 2009.
>
> Yan Han
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Vadim Soshkin
> Sent: Tuesday, December 16, 2008 3:57 PM
> To: Martin Doer
> Cc: crm-sig
> Subject: Re: [Crm-sig] RDFS class identifiers
>
> I agree that museums have to use available common identifiers such as ULAN to
> identify instances of data.
> But I am not sure we have to encode this in URL/URN.
> I see a few problems in the proposed schema:
>
> 1. Experts' opinions on artists' dates (for artists outside of a common
> authority).
> 2. Date modifications for active artists: (1945 - ...) becomes (1945 - 2010)
> 3. Encoding artist names from non-ASCII countries.
>
> I would prefer to have institution-based unique identifiers + a global
> institution identifier.
>
> BTW What does 'aacr2' stand for?
>
> Vadim
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of martin
> Sent: Tuesday, December 16, 2008 2:37 PM
> To: Maximilian Schich
> Cc: crm-sig
> Subject: Re: [Crm-sig] RDFS class identifiers
>
>
> I agree. The point is very simple:
>
> There will be a long tail of URNs anyhow. If every local database creates its
> own identifier, the list will be much, much longer.
>
> For guys like Picasso,
> referring either to VIAF or to ULAN would be currently a very sensible choice.
> (viaf.org : "Picasso, Pablo, ‡d 1881-1973" or "DNB|118594206")
> The likelihood of the two would be very high. That makes the world very small.
> Alternatively, we could create a normalized access point "Picasso, Pablo
> (1881-1973)",
> such as : urn:crm_actor:aacr2:picasso.pablo/1881-1973
>
> Do you like it?
>
> I don't know how many people have exactly the same birth and death dates and
> names.
>
> Best,
>
> Martin
>
> Maximilian Schich wrote:
>> (posted in this thread for continuity - also relevant for URI policies)
>>
>> Dear All,
>>
>> I agree with Martin: There should be a URN or something equivalent for
>> Picasso in ULAN.
>>
>> However, we should not underestimate the long tail phenomenon:
>>
>> * There will be loads of URNs for some single guys (like Picasso).
>>   Indeed the co-reference of all those Picasso-Identifiers will be
>>   hard to resolve. (I would bet there will not only be a long tail
>>   of URN frequency, i.e. how many URNs a Person has, but even a long
>>   tail of normalization, i.e. in the distribution how often specific
>>   URNs are used for a person).
>> * On the other hand there will be a huge load of people in the long
>>   tail without any URN in norm-data sources like ULAN (think of 'the
>>   guy, who did the non-art sculpture in my schoolyard' or 'the guy who
>>   paints sheep from Naples, but isn't the guy who paints sheep from
>>   Naples').
>>
>> As far as we know, there is no way to avoid the long tail!
>>
>> As a consequence, everybody has (to be able) to generate unique identifiers.
>>
>> Kind regards, max.
>>
>>
>> On 16.12.2008 13:23 Uhr, martin wrote:
>>> Dear All,
>>>
>>> In my opinion, Pablo Picasso should be represented by a URN. I'd
>>> expect from the Getty
>>> a proposal how to write URNs for persons identified in ULAN. See
>>> discussion about URNs.
>>>
>>> Best,
>>>
>>> Martin
>>>
>>> Maximilian Schich wrote:
>>>> I think we should encourage the owners of databases to use their
>>>> existing 'database record numbers' in conjunction with an
>>>> identifier for their Institution as IDs for every conceivable instance.
>>>>
>>>> Of course for 'Pablo Picasso' we would have a number of IDs:
>>>> an AKL number, another ULAN number, an ID from his city's birth
>>>> registry, a record number in every private database, and probably an
>>>> ID in the future all-encompassing database (like e.g.
>>>> http://en.wikipedia.org/w/index.php?oldid=257931703 for
>>>> http://en.wikipedia.org/wiki/Pablo_Picasso ).
>>>>
>>>> The String 'Pablo Picasso' is one of the worst IDs, as there might be
>>>> multiple language versions and different name formats. E.g. in
>>>> the ISI Web of Science the (ambiguous) ID would be 'P Picasso'; many
>>>> people simply call him 'Picasso'; and his birth name is 'Pablo Diego
>>>> José Francisco de Paula Juan Nepomuceno María de los Remedios
>>>> Cipriano de la Santísima Trinidad Martyr Patricio Clito Ruíz y
>>>> Picasso' - (not a joke!).
>>>>
>>>> How to normalize the IDs is another question. As real data usually
>>>> comes in long tails, norm data is of limited help.
>>>>
>>>> Best wishes, max.
>>>>
>>>> Dr. des. Maximilian Schich M.A.
>>>> adr.: Westendstrasse 80 | D-80339 München | Germany
>>>> tel.: +49-179-6678041 | skype: maximilian.schich
>>>> mail: [email protected] | home: www.schich.info
>>>>
>>>> CONFIDENTIALITY NOTICE: This e-mail message including attachments, if
>>>> any, is intended only for the person or entity to which it is addressed
>>>> and may contain confidential and/or privileged material. Any
>>>> unauthorized review, use, disclosure or distribution is prohibited. If
>>>> you are not the intended recipient, please contact the sender by reply
>>>> e-mail and destroy all copies of the original message. Thank you.
>>>>
>>>>
>>>> On 15.12.2008 16:20 Uhr, Vadim Soshkin wrote:
>>>>> I agree with the approach of moving English terms from class and
>>>>> property identifiers to rdfs:label.
>>>>> Why are users' instance identifiers different? What identifier are
>>>>> you proposing for 'Pablo Picasso'?
>>>>>
>>>>> Best regards
>>>>>
>>>>> Vadim
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]]On Behalf Of Maximilian
>>>>> Schich
>>>>> Sent: Saturday, December 13, 2008 6:05 AM
>>>>> To: [email protected]
>>>>> Cc: 'crm-sig'
>>>>> Subject: Re: [Crm-sig] RDFS class identifiers
>>>>>
>>>>> "I want the version that has the class (E) or property (P)
>>>>> number plus the text in the label and just the class (E) or property
>>>>> (P) number in the ID."
>>>>>
>>>>> me too! This clarifies that the node with the ID 'E21' indeed
>>>>> represents a CIDOC-CRM concept like 'E21_Person' and not the word
>>>>> 'Person'. However we should clarify to the users that they
>>>>> should not use a similar strategy in their rdf instances: The
>>>>> person 'Pablo Picasso' should not have an ID like '1495r3' and a
>>>>> label/appellation like '1495r3_Pablo_Picasso'. This seems logical
>>>>> from our point of view, but users may be tempted to do so.
>>>>>
>>>>> Can't we leave out * and #...?
>>>>>
>>>>> Kind regards,
>>>>> max.
>>>>>
>>>>> Dr. des. Maximilian Schich M.A.
>>>>>
>>>>>
>>>>> On 13.12.2008 8:32 Uhr, Stephen Stead wrote:
>>>>>> I want the version that has the class (E) or property (P)
>>>>>> number plus the text in the label and just the class (E) or
>>>>>> property (P) number in the ID.
>>>>>> Rgds
>>>>>> SdS
>>>>>>
>>>>>> Stephen Stead
>>>>>> Tel +44 20 8668 3075 Mob +44 7802 755 013
>>>>>> E-mail [email protected]
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Vladimir Ivanov
>>>>>> Sent: 13 December 2008 07:15
>>>>>> To: martin
>>>>>> Cc: crm-sig
>>>>>> Subject: Re: [Crm-sig] RDFS class identifiers
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I agree with Nick.
>>>>>> This approach realises the statement that
>>>>>> CRM is not about (Entity and Property) names
>>>>>> but about (common, language independent) concepts.
>>>>>>
>>>>>> It also helps to manage multilingual versions of the CRM when
>>>>>> we have EXX in scope notes and can extend it with "full name"
>>>>>> in a certain language.
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> <rdfs:Class rdf:ID="E21_">
>>>>>> <rdfs:label xml:lang="en">Person</rdfs:label>
>>>>>> <rdfs:comment xml:lang="en">[English text]... E21_ [English
>>>>>> text].......</rdfs:comment>.
>>>>>> ...
>>>>>> <rdfs:label xml:lang="ru">Личность</rdfs:label>
>>>>>> <rdfs:comment xml:lang="ru">[Russian text]... E21_ [Russian
>>>>>> text]...</rdfs:comment>.
>>>>>> ----------------
>>>>>>
>>>>>> But natural language descriptions with codes and names are
>>>>>> simpler than descriptions with codes only!
>>>>>>
>>>>>> Dear Martin,
>>>>>> I am afraid that "stars" (or any other symbol) in
>>>>>> xml attributes may lead to some problems:
>>>>>>
>>>>>> 1. <rdfs:label xml:lang="*en*">
>>>>>> Some systems do not recognize *en* as English (en).
>>>>>>
>>>>>> 2. <rdfs:subClassOf rdf:resource="*#E21*" />
>>>>>> and <rdfs:Class rdf:ID="*E21*">
>>>>>> refer to different entities.
>>>>>>
>>>>>> Maybe, we should write <rdfs:subClassOf rdf:resource="#*E21*" /> ?
>>>>>>
>>>>>> Best regards,
>>>>>> Vladimir
>>>>>>
>>>>>> 2008/12/12 martin <[email protected]>:
>>>>>>> Dear Nick,
>>>>>>>
>>>>>>> I support this proposal as an issue.
>>>>>>>
>>>>>>> However, I'd prefer this form:
>>>>>>>
>>>>>>> <rdfs:Class rdf:ID="*E21*">
>>>>>>>   <rdfs:label xml:lang="*en*">*E21 Person*</rdfs:label>
>>>>>>>   <rdfs:label xml:lang="*fr*">*E21 Personne*</rdfs:label>
>>>>>>>   <rdfs:label xml:lang="*gr*">*E21 Πρόσωπο*</rdfs:label>
>>>>>>>   <rdfs:subClassOf rdf:resource="*#E20*" />
>>>>>>>   <rdfs:subClassOf rdf:resource="*#E39*" />
>>>>>>> </rdfs:Class>
>>>>>>>
>>>>>>> Opinions?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>> Nicholas Crofts wrote:
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I've been doing some work recently using the CRM rdfs.
>>>>>>>> http://cidoc.ics.forth.gr/rdfs/cidoc_v4.2.rdfs
>>>>>>>>
>>>>>>>> The naming convention adopted for the class and property
>>>>>>>> identifiers strikes me as inconvenient in some respects.
>>>>>>>> Currently, the names used for the class and property
>>>>>>>> identifiers contain both the CRM code and the English label.
>>>>>>>>
>>>>>>>> 1. If the labels get changed at any time in the future, the
>>>>>>>> identifiers are broken
>>>>>>>> 2. Non-English speakers are put at a disadvantage
>>>>>>>> 3. The rdf syntax is more verbose than necessary ... this may
>>>>>>>> sound trivial but that overhead can be huge when migrating large
>>>>>>>> datasets.
>>>>>>>> 4. The names have been mangled with underscores to make them
>>>>>>>> respect xml/rdf syntax.
>>>>>>>>
>>>>>>>> I would suggest using just the codes (i.e. E1, P2, etc.) as
>>>>>>>> class identifiers and including the names (in various languages) as
>>>>>>>> rdfs:labels.
>>>>>>>>
>>>>>>>> The result would look something like this:
>>>>>>>>
>>>>>>>> <rdfs:Class rdf:ID="*E21*">
>>>>>>>>   <rdfs:label xml:lang="*en*">*Person*</rdfs:label>
>>>>>>>>   <rdfs:label xml:lang="*fr*">*Personne*</rdfs:label>
>>>>>>>>   <rdfs:label xml:lang="*gr*">*Πρόσωπο*</rdfs:label>
>>>>>>>>   <rdfs:subClassOf rdf:resource="*#E20*" />
>>>>>>>>   <rdfs:subClassOf rdf:resource="*#E39*" />
>>>>>>>> </rdfs:Class>
>>>>>>>>
>>>>>>>> Rather than this:
>>>>>>>>
>>>>>>>> <rdfs:Class rdf:ID="*E21.Person*">
>>>>>>>>   <rdfs:subClassOf rdf:resource="*#E20.Biological_Object*" />
>>>>>>>>   <rdfs:subClassOf rdf:resource="*#E39.Actor*" />
>>>>>>>> </rdfs:Class>
>>>>>>>>
>>>>>>>> (NB I've removed the rdfs:comments for clarity)
>>>>>>>>
>>>>>>>> It would be nice, of course, to be able to have both forms
>>>>>>>> and define equivalence relationships between them.
>>>>>>>> This could perhaps be done with the rdfs:isDefinedBy
>>>>>>>> property? But I'm not sure that it's meant for this.
>>>>>>>>
>>>>>>>> Best wishes
>>>>>>>>
>>>>>>>> Nick Crofts
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Crm-sig mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.ics.forth.gr/mailman/listinfo/crm-sig
>>>>>>> --
>>>>>>>
>>>>>>> --------------------------------------------------------------
>>>>>>>  Dr. Martin Doerr              |  Vox:+30(2810)391625        |
>>>>>>>  Principle Researcher          |  Fax:+30(2810)391638        |
>>>>>>>                                |  Email: [email protected]   |
>>>>>>>                                                              |
>>>>>>>                Center for Cultural Informatics               |
>>>>>>>                Information Systems Laboratory                |
>>>>>>>                 Institute of Computer Science                |
>>>>>>>    Foundation for Research and Technology - Hellas (FORTH)   |
>>>>>>>                                                              |
>>>>>>>  Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
>>>>>>>                                                              |
>>>>>>>          Web-site: http://www.ics.forth.gr/isl               |
>>>>>>> --------------------------------------------------------------
>>>>>>>
>> Dr. des. Maximilian Schich M.A.
>>
>

--
--------------------------------------------------------------
 Dr. Martin Doerr              |  Vox:+30(2810)391625        |
 Principle Researcher          |  Fax:+30(2810)391638        |
                               |  Email: [email protected]   |
                                                             |
               Center for Cultural Informatics               |
               Information Systems Laboratory                |
                Institute of Computer Science                |
   Foundation for Research and Technology - Hellas (FORTH)   |
                                                             |
 Vassilika Vouton,P.O.Box1385,GR71110 Heraklion,Crete,Greece |
                                                             |
         Web-site: http://www.ics.forth.gr/isl               |
--------------------------------------------------------------
