Hello,

I am forwarding this email to several other people who I think may be
interested in the topic of Authority Controls and Author Name
Registries.  I would like to see whether they have useful feedback, so
that the architectural design behind such an activity takes the
correct direction.

First, an introduction for those not previously involved in this
thread. This thread discusses ideas about how to add associations
between Authors and Organizations to the metadata of DSpace items, and
includes a technical discussion about extending the DSpace SQL
metadatavalue table with rows that act as relationships between
metadata fields.  I am somewhat concerned about that suggestion, and I
wish to put forward an alternative approach, along with the
recommendation that such a development direction for DSpace needs
considerably more guidance from the community.  It is of paramount
importance that we bring together a set of guidelines for how we
should and should not approach the encoding of authorities and
external relationships in DSpace item metadata.  We need such
guidelines to ensure that the direction of DSpace development is
informed by the strategies and approaches emerging in the larger
Digital Library, Archiving and Digital Publication communities that
use DSpace as a tool.

That said, I strongly caution against extending or repurposing the
fields in the metadatavalue table.  Firstly, the additions that Larry
Stone made were meant to qualitatively describe the action of
attaching the authority-controlled value to the Item metadata field,
not to describe what is a very separate relationship that exists
outside the item.  The "confidence" field is specific to that
assignment, as is the label/id; as such, all are captured in the
record attached to the Item.  However, the relationship between an
organization and an author is not specific to either attribute's
assignment to the item.  There will be many items published by an
author associated with an organization, and thus such entities and
their relationships exist independently of the item itself.  As such,
encoding the relationship between the author and their organization in
the Item itself is incorrect; it should be maintained externally.

In DSpace 2.0, where we will have an Entity-driven model, Authors and
Organizations would be first-class objects themselves, with attributes
and relationships, and these would be maintained separately from the
assignment itself.  In DSpace 1.x, I would only recommend customizing
DSpace to support this if you cannot find a decent name authority
control to support this capability.  This brings to mind the OCLC LC
Name Authority:

http://www.oclc.org/research/activities/past/orprojects/authority/default.htm
http://www.oclc.org/support/documentation/connexion/client/authorities/create/

If this is not tractable, then I would recommend creating an extension
to DSpace 1.x that is separately encapsulated, rather than introducing
further complexity into the existing DSpace model that we are
struggling to clean up today: a wholly separate data model for
Authors/Organizations that would store the relationships you are
discussing.  Firstly, it would be much better normalized.  Secondly, it
would be dedicated to supporting that specific use case.

I would propose something along the following lines:

1.) Create separate tables to maintain the authors and organizations
wholly independent from the metadatavalue table. I would create a
service in DSpace to maintain the data expressed within that model.

Entity tables
authors (id, firstname, lastname, email, ...)
organizations (id, orgname, description, url, contact, ....)

Relation tables
authors2organizations (id, authorid, organizationid)

2.) Expose this author/organization data model through an
implementation of the Authority Control provider to support assigning
authors and organizations to DSpace items.

3.) Create Author and Organization objects, map them to the database
tables, and create edit interfaces to support the management of
authors and organizations.
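The entity and relation tables from step 1, together with the service and objects from step 3, might look roughly like the following minimal Java sketch. It assumes an in-memory store standing in for the SQL tables, and every name in it (`Author`, `Organization`, `AuthorOrganizationService`, `link`, `organizationsOf`) is hypothetical rather than an existing DSpace API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a row of the proposed "authors" table.
final class Author {
    final int id;
    final String firstname;
    final String lastname;
    final String email;

    Author(int id, String firstname, String lastname, String email) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
        this.email = email;
    }
}

// Hypothetical stand-in for a row of the proposed "organizations" table.
final class Organization {
    final int id;
    final String orgname;
    final String url;

    Organization(int id, String orgname, String url) {
        this.id = id;
        this.orgname = orgname;
        this.url = url;
    }
}

// Hypothetical service owning both entity tables and the
// authors2organizations relation, entirely outside metadatavalue.
final class AuthorOrganizationService {
    private final Map<Integer, Author> authors = new HashMap<>();
    private final Map<Integer, Organization> organizations = new HashMap<>();
    // authors2organizations: authorId -> organizationIds, in insertion order
    private final Map<Integer, Set<Integer>> authors2organizations = new HashMap<>();

    void addAuthor(Author a) { authors.put(a.id, a); }

    void addOrganization(Organization o) { organizations.put(o.id, o); }

    // Equivalent of inserting one authors2organizations row (duplicates ignored).
    void link(int authorId, int organizationId) {
        if (!authors.containsKey(authorId) || !organizations.containsKey(organizationId)) {
            throw new IllegalArgumentException("unknown author or organization");
        }
        authors2organizations
            .computeIfAbsent(authorId, k -> new LinkedHashSet<>())
            .add(organizationId);
    }

    // All organizations linked to one author; empty if there are none.
    List<Organization> organizationsOf(int authorId) {
        List<Organization> result = new ArrayList<>();
        for (int orgId : authors2organizations.getOrDefault(authorId, Collections.emptySet())) {
            result.add(organizations.get(orgId));
        }
        return result;
    }
}
```

Because an author is one record linked to any number of organizations, an author with two affiliations stays a single entity, and the metadatavalue table is never involved.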

This approach would extend DSpace to support new DSpace object types
that would be more appropriate for describing authors and
organizations in greater detail and associating them with Items.
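For step 2, one possible shape is a thin adapter that lets the submission and edit forms look up authors in that registry. The `SimpleAuthority` interface below is a hypothetical simplification made up for this sketch; it is not the real DSpace authority-control interface, whose exact method signatures should be checked in the DSpace documentation. The `person_` key prefix follows the scheme-prefix convention described in the quoted messages below.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified authority interface: suggest keys for typed
// text, and resolve a key to a display label. NOT the DSpace API.
interface SimpleAuthority {
    List<String> suggest(String text, int limit);
    String label(String key);
}

// Hypothetical provider backed by an author registry.
final class AuthorRegistryAuthority implements SimpleAuthority {
    private final Map<String, String> labelsByKey = new LinkedHashMap<>();

    void register(int authorId, String displayName) {
        labelsByKey.put("person_" + authorId, displayName);
    }

    // Keys whose label contains the text (case-insensitive), at most `limit`.
    @Override
    public List<String> suggest(String text, int limit) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> e : labelsByKey.entrySet()) {
            if (keys.size() >= limit) {
                break;
            }
            if (e.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                keys.add(e.getKey());
            }
        }
        return keys;
    }

    // Display label for a stored key; null when the key is unknown.
    @Override
    public String label(String key) {
        return labelsByKey.get(key);
    }
}
```

Only the returned key (and the label for display) would be stored on the item; everything else about the author stays in the registry.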

I have spoken with a number of organizations that share this need, and
I think their use case would be solved by an approach such as this.
Likewise, it would provide a means to create "Author Profiles" within
DSpace that would be sufficient for further associations with other
naming services, much like the associations you see today between a
"Google Profile", "Twitter", "Facebook", "Blog", etc.
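An "Author Profile" of this kind could be as simple as the local author record plus a map of identifiers in other services. The sketch below is illustrative only: the class name, the service names and the identifier values are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "Author Profile": the local author key plus links to the
// same person in other naming services.
final class AuthorProfile {
    final String localKey; // e.g. the authority key stored on items
    private final Map<String, String> externalIds = new TreeMap<>();

    AuthorProfile(String localKey) {
        this.localKey = localKey;
    }

    // Associate (or replace) this author's identifier in another service.
    AuthorProfile link(String service, String identifier) {
        externalIds.put(service, identifier);
        return this;
    }

    // Identifier in the given service, or null if not linked.
    String idIn(String service) {
        return externalIds.get(service);
    }

    // Read-only view of all links, sorted by service name.
    Map<String, String> allLinks() {
        return Collections.unmodifiableMap(externalIds);
    }
}
```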

I think this is the correct direction forward for approaching the
project.  Enough organizations want this capability that, if it were
brought into the community as a project with some up-front design
direction led by the DuraSpace community and by the external
stakeholders who want better author/organization authority support in
DSpace, we would end up with an exemplary case study of how DSpace
customization projects should be guided in future DSpace versions.

Sincerely,
Mark

--
Mark R. Diggory
Head of U.S. Operations - @mire

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther

On Sun, May 9, 2010 at 8:58 AM, Andrea Bollini <boll...@cilea.it> wrote:
>
> Hi all,
>
> Christophe Dupriez wrote:
>
> Hi again Mateusz!
>
> Let's try to wrap up:
>
> For each field, DSpace 1.6 Authority Control stores:
> 1) The text value that is indexed by Lucene
> 2) The ID that is managed in SQL (new column added to metadatavalue table)
>
>
> The authority key (ID) is a String, so you can easily wrap multiple
> authorities behind a single authority plugin and add the "authority
> schema" (e.g. person_) as a prefix.
> The authority control system is also aware of the collection where the
> item is included, so you can use this information to query only the
> right authority in most cases.
>
> 3) A number (trust grade for the alignment of the ID with the text
> value) (also a new column).
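A minimal sketch of the prefix idea above, assuming one wrapper plugin that routes keys such as `person_1234` to the right source and uses the item's collection to narrow lookups. All names are hypothetical, and the sketch assumes scheme names never contain an underscore (local ids may).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical wrapper: several authorities behind one entry point,
// distinguished by a scheme prefix in the key.
final class PrefixedAuthorityRouter {
    // scheme -> (local id -> label); stand-ins for real authority sources
    private final Map<String, Map<String, String>> sources = new HashMap<>();
    // collection id -> schemes to query for items in that collection
    private final Map<Integer, List<String>> schemesByCollection = new HashMap<>();

    void addEntry(String scheme, String localId, String label) {
        sources.computeIfAbsent(scheme, s -> new HashMap<>()).put(localId, label);
    }

    void restrictCollection(int collectionId, List<String> schemes) {
        schemesByCollection.put(collectionId, new ArrayList<>(schemes));
    }

    // Split "person_1234" at the first underscore into {"person", "1234"}.
    static String[] splitKey(String key) {
        int sep = key.indexOf('_');
        if (sep <= 0) {
            throw new IllegalArgumentException("key has no scheme prefix: " + key);
        }
        return new String[] { key.substring(0, sep), key.substring(sep + 1) };
    }

    // Label for a full key, or null if the scheme or id is unknown.
    String label(String key) {
        String[] parts = splitKey(key);
        return sources.getOrDefault(parts[0], Collections.emptyMap()).get(parts[1]);
    }

    // Sorted keys whose label starts with the text (case-sensitive), searching
    // only the schemes configured for the collection, or all schemes otherwise.
    List<String> suggest(String text, int collectionId) {
        List<String> schemes = schemesByCollection.getOrDefault(
                collectionId, new ArrayList<>(sources.keySet()));
        List<String> keys = new ArrayList<>();
        for (String scheme : schemes) {
            for (Map.Entry<String, String> e
                    : sources.getOrDefault(scheme, Collections.emptyMap()).entrySet()) {
                if (e.getValue().startsWith(text)) {
                    keys.add(scheme + "_" + e.getKey());
                }
            }
        }
        Collections.sort(keys);
        return keys;
    }
}
```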
>
>
> You can manage variants and translations by implementing the
> org.dspace.content.authority.AuthorityVariantsSupport
> interface. We have used this in the Hub <http://hub.hku.hk> to provide
> the ability to insert the same researcher in more than one place in the
> browse index (Chinese name, English name, etc.); see for example
>
> Researcher: http://hub.hku.hk/rp/rp00056
>
> in the browse system as "Bacon-Shone"
> http://hub.hku.hk/browse?type=author&order=ASC&rpp=100&starts_with=Bacon-Shone
>
> in the browse system as "白景崇"
> http://hub.hku.hk/browse?type=author&order=ASC&rpp=100&starts_with=白景崇
>
>
> Please note also that the authority key is indexed as an untokenized
> field in the Lucene document
> http://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace-api/src/main/java/org/dspace/search/DSIndexer.java?r=HEAD
>
>     doc.add(new Field(indexConfigArr[i].indexName + "_authority",
>                       mydc[j].authority,
>                       Field.Store.NO,
>                       Field.Index.UN_TOKENIZED));
>
> and all the related "text data" (primary form and variants) are indexed in 
> the normal way.
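The effect of indexing the key untokenized can be shown without Lucene. The toy index below only models the behavior and is not Lucene's API: an untokenized value is kept as one exact term, while tokenized text is split into lower-cased words, so a search on the authority field matches the full key and nothing else.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Library-free model of the two indexing modes (NOT Lucene itself).
final class ToyFieldIndex {
    // field name -> set of indexed terms
    private final Map<String, Set<String>> terms = new HashMap<>();

    // Untokenized: the whole value becomes a single exact term.
    void addUntokenized(String field, String value) {
        terms.computeIfAbsent(field, f -> new HashSet<>()).add(value);
    }

    // Tokenized: lower-cased runs of letters/digits become separate terms.
    void addTokenized(String field, String text) {
        Set<String> set = terms.computeIfAbsent(field, f -> new HashSet<>());
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty()) {
                set.add(word);
            }
        }
    }

    // Exact term lookup, like a term query on one field.
    boolean matches(String field, String term) {
        return terms.getOrDefault(field, new HashSet<>()).contains(term);
    }
}
```

Had the key been tokenized, `person_1234` would have been split into `person` and `1234`, and a lookup by the full key would no longer be an exact match.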
>
>
> Mateusz, I definitely agree with Christophe: you should use the
> authority control framework to achieve your goals without hacking the
> DSpace data model.
> The information that you want to store is mostly related to entities
> that don't need to go in DSpace.
> You can build external systems, nicely integrated with DSpace, to store
> and manage this information; this is the approach that we used to
> manage the ResearcherPage @ HKU project.
> The affiliation of an author is, most of the time, metadata of the
> author, not metadata of the item...
> IMHO you should build an external system where each author has one or
> more affiliations, keeping date range information and other details
> about each position.
> The DSpace Item only needs to store a reference to the author record;
> using the authority key, you are able to query the "affiliation system"
> to retrieve and display all the needed information.
> Please note that the above examples apply exactly this strategy: the
> Browse system was not hacked in the Hub; we only improved the "view" so
> that the ResearcherPage system is queried and extra information is
> shown to the user.
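A sketch of such an external affiliation system, assuming each position is stored with a start date and an optional end date and looked up by the author's authority key (for example with the item's issue date). Class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical external affiliation registry: one author key, many
// positions with date ranges. Items would store only the author key.
final class AffiliationRegistry {
    static final class Position {
        final String authorKey;
        final String organization;
        final LocalDate from;
        final LocalDate to; // null = still current

        Position(String authorKey, String organization, LocalDate from, LocalDate to) {
            this.authorKey = authorKey;
            this.organization = organization;
            this.from = from;
            this.to = to;
        }

        // True if the day falls inside [from, to], both ends inclusive.
        boolean covers(LocalDate day) {
            return !day.isBefore(from) && (to == null || !day.isAfter(to));
        }
    }

    private final List<Position> positions = new ArrayList<>();

    void add(String authorKey, String organization, LocalDate from, LocalDate to) {
        positions.add(new Position(authorKey, organization, from, to));
    }

    // Organizations the author belonged to on the given day, in insertion order.
    List<String> affiliationsOn(String authorKey, LocalDate day) {
        List<String> result = new ArrayList<>();
        for (Position p : positions) {
            if (p.authorKey.equals(authorKey) && p.covers(day)) {
                result.add(p.organization);
            }
        }
        return result;
    }
}
```

Overlapping positions are allowed, so an author with two simultaneous affiliations, or a sequence of affiliations over time, is still one author key.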
>
> The most complex goal to achieve is enabling browsing and searching
> for information included in the "external system".  You should try to
> extend the DSpace browse/search system to be more pluggable during the
> indexing phase, so that it can also index information not directly
> included in the item.
>
> Best,
> Andrea Bollini
>
> My "quick and dirty" proposal is to piggyback the "indice" linking
> authors to their institution on the trust grade field,
> and to adapt Authority Control Plugins to your needs without touching
> DSpace internals.
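Christophe's "quick and dirty" packing could be sketched as below. This assumes that every base confidence value is a non-negative multiple of 100 and that negative values (such as an "unset" marker) are left alone; check that assumption against the confidence constants of your DSpace version. The class name and the 0..99 index range are hypothetical choices for this sketch.

```java
// Hypothetical helper: keep the affiliation index (1..99, 0 = none) in the
// last two digits of the confidence value.
// ASSUMPTION: base confidences are non-negative multiples of 100; negative
// stored values are passed through untouched.
final class ConfidenceWithIndex {
    private ConfidenceWithIndex() {}

    static int encode(int baseConfidence, int affiliationIndex) {
        if (baseConfidence < 0 || baseConfidence % 100 != 0) {
            throw new IllegalArgumentException("base must be a non-negative multiple of 100");
        }
        if (affiliationIndex < 0 || affiliationIndex > 99) {
            throw new IllegalArgumentException("affiliation index must be in 0..99");
        }
        return baseConfidence + affiliationIndex;
    }

    // Strip the index to recover the original confidence level.
    static int baseConfidence(int stored) {
        return stored < 0 ? stored : stored - stored % 100;
    }

    // 0 means "no linked affiliation".
    static int affiliationIndex(int stored) {
        return stored < 0 ? 0 : stored % 100;
    }
}
```

Any code that compares a stored confidence with the standard levels would have to call `baseConfidence` first, which is why the plan quoted further down includes extending the Authority Control classes to extract the index from the grade.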
>
> Larry and Andreas, please correct me if I am wrong!!!
> - - - - - - - - - - - -
> The ASKOSI Authority Control system I presented in Göteborg
> (http://gupea.ub.gu.se/dspace/handle/2077/21341) only stores the ID (but
> with the possibility to add coded qualifiers as prefixes and suffixes)
> in the DSpace field text value.
> To be exact, the ID is prefixed by its Scheme identifier. You would have
> "person_1234" and not 1234 alone, in order to support multiple authority
> lists for the same field (a frequent situation).
>
> The ID is translated dynamically at indexation time to index:
> * the ID
> * all the words in all translations and synonyms of the corresponding labels
> * the IDs of all generic terms (thesaurus, institutions hierarchies) to
> allow searches encompassing a concept and all its specifics
>
> The ID is translated dynamically at display time with the best label
> available for the user language.
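The index-time expansion described above could be sketched like this. It is an illustration of the idea, not the ASKOSI code: the vocabulary is an in-memory stand-in, the `thes_` identifiers are made up, and labels are split on whitespace only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical SKOS-like vocabulary used to expand a stored ID at indexing time.
final class ConceptExpander {
    private final Map<String, List<String>> labels = new HashMap<>();  // id -> labels/synonyms
    private final Map<String, List<String>> broader = new HashMap<>(); // id -> broader ids

    void addLabel(String id, String label) {
        labels.computeIfAbsent(id, k -> new ArrayList<>()).add(label);
    }

    void addBroader(String id, String broaderId) {
        broader.computeIfAbsent(id, k -> new ArrayList<>()).add(broaderId);
    }

    // Terms to index for one stored ID: the ID itself, every word of every
    // label, and the IDs of all broader (generic) concepts, transitively.
    Set<String> indexTerms(String id) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(id);
        for (String label : labels.getOrDefault(id, Collections.emptyList())) {
            for (String w : label.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!w.isEmpty()) {
                    terms.add(w);
                }
            }
        }
        // Depth-first walk up the hierarchy; `seen` also guards against cycles.
        Set<String> seen = new LinkedHashSet<>();
        List<String> stack = new ArrayList<>(broader.getOrDefault(id, Collections.emptyList()));
        while (!stack.isEmpty()) {
            String b = stack.remove(stack.size() - 1);
            if (seen.add(b)) {
                stack.addAll(broader.getOrDefault(b, Collections.emptyList()));
            }
        }
        terms.addAll(seen);
        return terms;
    }
}
```

A record indexed with a narrow concept can then be found by a search on any of its generic concepts, which is the "Mazurka is a musical genre" behavior described later in this thread.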
>
> The metadata text value is translated into an ID at data import time.
>
> The update form and the search forms make extensive use of Autocomplete
> (or a menu when the Authority List contains fewer than 20 items).
> - - - - - - - - - - - -
> Have a nice day!
>
> Christophe
>
>
> Mateusz Neumann wrote:
>
>
> On Fri, 2010-05-07 at 08:50 +0200, Christophe Dupriez wrote:
>
>
>
> Hi Mateusz!
>
> You plan the following fields:
>
>       * dc.contributor.author_id - id of an author in external system,
>       * dc.contributor.organization - author's affiliation,
>         dc.contributor.organization_id - id of an affiliation in
>         external system,
>       * dc.identifier.bib_id - id of an Item in external bibliographical
>         system
>
> Beware that using different properties for multiple "sub-objects" in a
> flat model like DSpace brings the problem of keeping field occurrences
> synchronized during updates (and display).
>
>
>
> Do you mean I must keep consistency between dc.contributor.organization
> ("University of Berne") and dc.contributor.organization-id (1234)?  That
> is a good point.  Thank you.
>
>
>
>
> People are used to papers with little indices after the authors' names
> (affiliations being prefixed by those indices); maybe that is a solution
> for you?
>
> You could then use the standard DSpace 1.6 Authority control:
> * dc.contributor.author controlled by an Authority source (you then
> already have a field part for the name (Lucene indexation) and a part
> for the author id).
>   You also have a numerical qualifier used today to grade the authority
> quality. You could add 1, 2, 3, 4, 5, etc. to this qualifier to indicate
> the index of the linked affiliation.
>
>
>
> I am not sure if I got you right...  Should dc.contributor.author have
> value "Albert Einstein" or "Albert Einstein (1234)"?  If latter, Lucene
> must parse dc.contributor.author field before indexing and work on
> "Albert Einstein" solely.  Right?
>
>
>
>
> * dc.contributor.affiliation controlled by an Authority source for
> institutions. The name will be indexed by Lucene, the Id will be stored,
> and you can add an index to the "grade", the index referred to by
> affiliated authors.
>
>
>
> Same question as above: dc.contributor.organization (or
> dc.contributor.affiliation) would be (in your example) "University of
> Zurich" or "(1234) University of Zurich"?
>
>
>
>
> You will then have to:
> 1) create Authority Control Plugins for your Authorities (authors and
> institutions)
> 2) extend the Authority Control Classes to extract the index from the
> grade and make it a separate field (whilst not changing the database itself)
> 3) modify the item display and update to allow modification of the index
>
> Larry Stone(1) and Andreas Bellini(2) are the authors of this part of
> the code...
> (1) MIT
> (2) CILEA
>
>
> Have a nice day!
>
> Christophe
>
>
>
> thanks a lot for all your ideas!
>
> --
> Mateusz
>
>
>
>
>
> Mateusz Neumann wrote:
>
>
>
> On Thu, 2010-05-06 at 12:04 +0200, Christophe Dupriez wrote:
>
>
>
>
> Hi again Mateusz!
>
>
>
>
> Good evening Christophe
>
>
>
>
>
> If you want to keep more than the current affiliation of the author, I
> see two possible solutions:
>
> 1) You add an intermediate object to keep information about when the
> author worked at a given institution and with which responsibilities.
>     Such a structure is too heavy to be represented nicely by relations
> between flat catalographic records (you need a FRBRized library
> management application).
>     But it can certainly be implemented as an SQL or RDF based addon to
> DSpace.
>
>
>
>
> That is what we are heading towards.  As I wrote earlier today, we
> need a quite sophisticated reporting tool based on bibliographical
> data.  We are still investigating the best approach, but it seems we
> will end up with a slightly modified DSpace and another
> bibliographical/reporting system linked by some nice interface.
>
>
>
>
>
> 2) Using WindMusic-like Authority control, you add a reference to the
> institution next to the author:
>     records can then be searched by authors and by institutions (and by
> institutions containing institutes).
>    For instance, in my coding system, this could be written in
> dc.contributor.author (or another field of your choice):
>           author_ person_1234 _deceased doctorant_ institute_345
>    Meaning that "John Smith (person #1234) (now deceased) authored the
> document when he was a doctoral student at the University of XYZ (code 345)".
>    In some catalog or database (DSpace or other) accessible on the web,
> person "1234" is defined and the user can find more information there
> about the person.
>    The same goes for institutions (University of XYZ).
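A parser for the coded value above could look like this, assuming the grammar the example suggests: a token ending in `_` is a prefix qualifier for the next ID, a token starting with `_` is a suffix qualifier for the previous ID, and any other token containing `_` is a `scheme_code` ID. That grammar and the class name are assumptions of this sketch, not a specification of Christophe's system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for values like
// "author_ person_1234 _deceased doctorant_ institute_345".
final class CodedValueParser {
    static final class CodedId {
        final String id;
        final List<String> prefixes = new ArrayList<>();
        final List<String> suffixes = new ArrayList<>();

        CodedId(String id) { this.id = id; }
    }

    static List<CodedId> parse(String value) {
        List<CodedId> ids = new ArrayList<>();
        if (value.trim().isEmpty()) {
            return ids;
        }
        List<String> pendingPrefixes = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (token.endsWith("_")) {
                // "author_" qualifies the next ID
                pendingPrefixes.add(token.substring(0, token.length() - 1));
            } else if (token.startsWith("_")) {
                // "_deceased" qualifies the previous ID
                if (ids.isEmpty()) {
                    throw new IllegalArgumentException("suffix before any ID: " + token);
                }
                ids.get(ids.size() - 1).suffixes.add(token.substring(1));
            } else if (token.indexOf('_') > 0) {
                // "person_1234" is an ID; attach any pending prefixes
                CodedId c = new CodedId(token);
                c.prefixes.addAll(pendingPrefixes);
                pendingPrefixes.clear();
                ids.add(c);
            } else {
                throw new IllegalArgumentException("unrecognised token: " + token);
            }
        }
        return ids;
    }
}
```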
>
>
>
>
> I think of adding four properties:
>       * dc.contributor.author_id - id of an author in external system,
>       * dc.contributor.organization - author's affiliation,
>         dc.contributor.organization_id - id of an affiliation in
>         external system,
>       * dc.identifier.bib_id - id of an Item in external bibliographical
>         system
>
> Of course there would be possibility to search/browse by Affiliations
> (similarly to "Issue Dates", "Authors", "Titles", "Subjects").  And
> those '*_id' fields will be used in interface with an external
> bibliographical system.
>
>
>
>
>
>    In Lucene (helped with Ajax Autocomplete to select persons or
> institutions), you can then search using the person name (translated
> into its code) and/or institutions.
>    You can even further restrict the search using the prefixes (author,
> doctorant) or the suffixes (deceased).
>
>
>
>
> That is nice.  I will think about it.
>
>
>
>
>
>    Codes are indexed but also their translations and all their synonyms.
>
>    I use this at the Belgium Poison Centre to represent PubMed MeSH
> indexing (which uses hierarchical "qualifiers" next to each MeSH term).
>
> But, I agree with you that an information structure is not a complete
> application!
> For a complete existing "Author page" management system, something like
> what HKU has implemented may be even nearer to your needs.
>
> Good luck!
>
>
>
>
> Thank you very much!
>
> --
> Mateusz
>
>
>
>
>
>
> Christophe
>
>
> Mateusz Neumann wrote:
>
>
>
>
> Hello Christophe
>
> On Thu, 2010-05-06 at 08:52 +0200, Christophe Dupriez wrote:
>
>
>
>
>
> Good day Mateusz!
>
> In the SKOSified world, hierarchies are all around:
> http://www.w3.org/2004/02/skos/
> My proposal for authority control in DSpace is based on the SKOS
> standard design.
>
> DSpace manages "human-targeted" catalogues of simple and "flat" objects.
> But if fields of those objects can contain links (relations with other
> objects) open to human exploration but also to automated management,
> then DSpace is not flat anymore.
>
> Bibliographic Records can relate to Authors, which can relate to
> Institutes, which can relate to Institutions, etc.
>
>
>
>
>
> We have a proverb in Poland, "the devil is hidden in the details", which
> means the real problems begin when you dig deeper.  There are several
> issues that might be difficult to address using your approach:
>       * an author might have several affiliations (for example "University
>         of Berne", which she/he uses in articles on mathematics, and
>         "University of Zurich" for ones concerning physics)
>       * an author might change affiliation (several times), let us say to
>         the "Institute for Advanced Study", and still be the same
>         "Albert Einstein".
>
> If I understand it right, using the DSpace catalogue hierarchy approach,
> we would end up with three different Albert Einsteins, which is
> something we must avoid.
>
>
>
>
>
>
> So, I use DSpace to manage objects and I use my SKOS API to manage
> values of objects fields.
> Those values can be an id of an object (a relation).
> Those ids being coded as words (SKOSscheme_codeInScheme), Lucene word
> search ensures efficient retrieval along links.
> And if the SKOS authority list is coming from a DSpace collection, the
> loop is closed.
>
>
>
>
>
> That is quite similar to our plan.  We want to make this "recurrent
> table" searchable.  When indexing a new Item we will look for
> additional metadata (configured somewhere) and tell Lucene to take
> special care of it.  Then in Manakin we plan to display a nice
> ajax-expandable tree of metadata (for affiliations: University ->
> Department -> ...) that would link to specific queries (for example
> "((affiliation:Institute for Advanced Study))").
>
>
>
>
>
>
> In WindMusic, you find:
> * The DSpace collections being documents but also Authority list
> (authors, publishers, keywords in hierarchies, collections, orchestras):
> http://www.windmusic.org/dspace/community-list
> * A Mazurka: http://www.windmusic.org/dspace/handle/68502/35027
> * Mazurka as a subject:
> http://www.windmusic.org/dspace/handle/68502/22050?searchname=lorthes_183
> * Mazurka is a musical genre:
> http://www.windmusic.org/dspace/handle/68502/22050?searchname=lorthes_183
>   The record can be retrieved by the subject "Mazurka" but also by all
> its generics, "Musical Genre" and "Music".
> * The SKOS view of Authors records make their nationality a
> "broadMatch": you can therefore find all records indexed by an author of
> a given nationality
>    A search for music by a Polish author:
> http://www.windmusic.org/dspace/simple-search?query=country%3Acountry_PL
>    The same principle can be used to search for all the documents
> written by somebody belonging to a given institute/institution (whatever
> the depth of the hierarchy)
>
>
>
>
>
> Well, it seems it is quite the same approach we are developing :)  It
> is reassuring to see you have deployed something so similar.
>
>
>
>
>
>
> If your aims are strictly to manage authors and institutes, you may also
> ask Andreas Bellini to explain the work he did with David Palmer at
> Hong Kong University (see below)
>
>
>
>
>
> We want to build a solution that would work as a basic Institutional
> Repository AND would enable us to create nice reports on how much
> someone has written in 2009 or how many publications came from the
> Department of Physics at University of Warsaw in 2010, etc.  So it might
> be somewhat similar to what Andreas Bellini has achieved.  Thanks for
> forwarding his email.
>
> Cheers!
>
> --
> Mateusz
>
> Cheers!
>
> Christophe
>
> Message to the DSpace General list on December 18th, 2009:
>
> The University of Hong Kong wishes to announce HKU ResearcherPages for
> each of its many authors, now appearing in The HKU Scholars Hub, the
> institutional repository of HKU.  Three examples:
>
> http://hub.hku.hk/rp/rp00023        Prof Samaranayake, Dean of Dentistry
>
> http://hub.hku.hk/rp/rp00056        Prof Bacon-Shone, Associate Dean of
> Social Sciences
>
> http://hub.hku.hk/rp/rp00060        Prof Tam, Pro-Vice Chancellor (Research)
>
>
>
> This work is the result of a successful collaboration between HKU and
> CILEA (AePIC Team).  Much of the code developed for this project has
> been included in the forthcoming version 1.6 of DSpace, which will soon
> be released to the community.
>
> http://www.cilea.it/
>
>
>
> Highlights:
>
>
>
> * Author-centric bibliometrics from Scopus – the results of an on-going
> massive bibliometric rectification project, between Elsevier and HKU.
> [Pls note the +/- expand/collapse box for bibliometrics in pages
> above].  This is in preparation for our annual Performance Reviews, and
> our impending Research Assessment Exercise.
>
>
>
> * Author-centric bibliometrics from ResearcherID.com (Web of Science) –
> the results of an on-going large scale institutional upload of
> publication lists for each HKU author to RID.  One example,
>
> http://www.researcherid.com/rid/C-4405-2009
>
>
>
> * Unique identifier for each HKU researcher.  In URLs above, “rp00023”,
> “rp00056”, and “rp00060” are examples of this.
>
>
>
> * Integration with HKU’s Media Directory, to show subjects on which each
> researcher can speak to, or write for the media, and in which
> languages.  The Hub is now an expert finder, for those in gov’t &
> industry wishing to find specialists for consultancies, contract
> research, etc.  Pls note facets by which RPs can be retrieved,
>
> http://hub.hku.hk/rp/search.htm
>
>
>
> * Authority control; disambiguation of like named individuals, linkage
> from variant names to the established heading, synonymy between
> established headings in different vernacular scripts.  Examples,
>
> http://hub.hku.hk/browse?type=author&order=ASC&rpp=100&starts_with=tam+p
>
> http://hub.hku.hk/browse?type=author&order=ASC&rpp=100&starts_with=%E8%AD%9A%E5%AE%B6%E9%9B%AF
>
>
>
> * Article level metrics from Scopus, Web of Science, and Google
> Scholar.  In example below, pls scroll down to red buttons,
>
> http://hub.hku.hk/handle/123456789/43518
>
>
>
> Further description:
>
>
>
> * Presentation given at the Dec 2-4 Digital Repository Federation
> International Conference (DRFIC 2009), Tokyo:
>
> http://hub.hku.hk/handle/123456789/56562
>
> * Presentation given at the Nov 18-20 Pacific Rim Digital Library
> Association (PRDLA 2009), Auckland:
>
> http://prdla.ucmercedlibrary.info/?s=critical
>
> * Thomson Reuters’ Customer Profile and Case Study
>
> http://wokinfo.com/benefits/testimonials/palmer/
>
> * Thomson Reuters’ “Intelligent Information for Life” article
>
> http://intelligentinformationforlife.com/palmer/
>
> * HKU press release
>
> http://www.hku.hk/press/news_detail_6081.html
>
>
>
> The next round of development begins soon.
>
>
>
> David Palmer
>
> Systems Librarian
>
> Technical Services Support Team Leader
>
> Scholarly Communications Unit Head
>
> The University of Hong Kong Libraries
>
> Pokfulam Road
>
> Hong Kong
>
> tel. +852 2859 7004
>
>
>
>
>
> Mateusz Neumann wrote:
>
>
>
>
>
> Hello Christophe
>
> On Tue, 2010-05-04 at 23:52 +0200, Christophe Dupriez wrote:
>
>
>
>
>
>
> Good evening Mateusz!
>
> You may want to look at the WindMusic presentation in Göteborg.
> http://gupea.ub.gu.se/dspace/handle/2077/21341
> http://www.windmusic.org
>
> In WindMusic, Authors are stored in a DSpace collection (so they are
> managed with the regular DSpace UI)
> And they are used for search and update as an authority control list.
>
> A dynamic SQL source allows different sources to be accessed
> dynamically (with strong caching), so any accessible database can be
> used as an authority source.
>
> Multiple authorities for a field are supported (as is the option of "free"
> uncontrolled content): it is often necessary to "chain" authorities, as an
> Author (a Subject, a Journal...) can be in a local application, in the
> institutional repository or in an external repository,
> each authority source with its independent access method...
>
>
>
>
>
>
> Thanks a lot for sharing the ideas.  It is a big pleasure to see
> something working :)  But I think your solution would not be enough for
> our sophisticated demands.  I think I would rather stay on the path we
> have already been thinking of, maybe "widening" it a little bit as Mark
> Diggory has suggested.
>
> There would be a new "recurrent table" (where records can point to
> other "parent" records in this table, enabling creation of a tree-like
> structure).  Records of this table would define the affiliation structure
> (University -> Department -> Institute -> ...).  An Item (or an Entity
> in general, as Mark has suggested) would point to that table, defining
> for example the author's affiliation.
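A self-referencing ("recurrent") table of this kind can be walked upward to display a path and downward to collect a unit with all its sub-units. The in-memory sketch below assumes rows of the form (id, name, parentId) with no cycles; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the self-referencing affiliation table.
// Each row is (id, name, parentId); parentId is null at the root.
// ASSUMPTION: the data contains no cycles.
final class AffiliationTree {
    private final Map<Integer, String> names = new HashMap<>();
    private final Map<Integer, Integer> parentOf = new HashMap<>();
    private final Map<Integer, List<Integer>> childrenOf = new HashMap<>();

    void add(int id, String name, Integer parentId) {
        names.put(id, name);
        if (parentId != null) {
            parentOf.put(id, parentId);
            childrenOf.computeIfAbsent(parentId, k -> new ArrayList<>()).add(id);
        }
    }

    // "University -> Department -> Institute" path from the root to this unit.
    String path(int id) {
        List<String> parts = new ArrayList<>();
        Integer current = id;
        while (current != null) {
            parts.add(names.get(current));
            current = parentOf.get(current);
        }
        Collections.reverse(parts);
        return String.join(" -> ", parts);
    }

    // The unit and every unit below it, e.g. to count a department's
    // publications including those of its institutes.
    List<Integer> subtree(int id) {
        List<Integer> result = new ArrayList<>();
        List<Integer> stack = new ArrayList<>();
        stack.add(id);
        while (!stack.isEmpty()) {
            int current = stack.remove(stack.size() - 1);
            result.add(current);
            stack.addAll(childrenOf.getOrDefault(current, Collections.emptyList()));
        }
        return result;
    }
}
```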
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Dspace-devel mailing list
> Dspace-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dspace-devel
>
>
> --
> Dott. Andrea Bollini
> Project Manager, IT Architect & Systems Integrator
> Sezione Servizi per le Biblioteche e l'Editoria Elettronica
> CILEA, http://www.cilea.it
> tel. +39 06-59292853
> cel. +39 348-8277525
>