Thanks Claudia, Mark and Andrea for your comments.

It makes intuitive sense to me to avoid HTML “pollution” within the metadata 
fields. But it still raises the issue of what to do about special characters in 
metadata fields.

The field + formatted field idea seems ok, but I fear it will be a bit of a 
data management nightmare.

Thanks again for your thoughts on this.

Gary


Gary Browne | Technical Manager, Developments
Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au


From: Andrea Schweer <schw...@waikato.ac.nz>
Date: Tuesday, 15 August 2017 at 7:26 am
To: "Mark H. Wood" <mwoodiu...@gmail.com>, DSpace Technical Support 
<dspace-tech@googlegroups.com>, Gary Browne <gary.bro...@sydney.edu.au>
Subject: Re: [dspace-tech] Special characters in metadata

Hi Gary, all,
On 08/15/2017 02:03 AM, Mark H. Wood wrote:
On Sunday, August 13, 2017 at 9:26:56 PM UTC-4, Gary Browne wrote:
This leads me to a more general question of how people handle special 
characters in the metadata, generally speaking?

Is this usually accomplished using Unicode, or are there hacks to allow HTML (I 
presume including HTML in metadata values is generally frowned upon)?


They must be using Unicode.  Only a few fields are equipped to render HTML *as* 
HTML.  I haven't checked, but I think we'd find that all of these are fields 
such as abstract which are displayed as block elements, not inline fields like 
title and author.

It's pretty easy to make DSpace (XMLUI) render HTML as HTML. Look at how the 
introductory text for collection pages is rendered; it's really just a matter 
of using copy-of not value-of in the XSL crosswalk.
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-xmlui-mirage2/src/main/webapp/xsl/aspect/artifactbrowser/collection-view.xsl#L58
However, I'd be very careful with this; you wouldn't want to allow just about 
anything and risk showing malicious content on your item pages. Plus of course, 
Mark's comment on harvesters:


And those HTML-enabled fields raise another question:  what are harvesters to 
make of metadata which are sprinkled with HTML?  Even if the harvesting site is 
using the data for display, it may not be taking any trouble to render embedded 
HTML.  If the harvesting site wants plain text (e.g. for searching), what will 
it do with the HTML pollution?

The best way I can think of (this has already been suggested to the U Sydney 
folks on a different mailing list by someone else) is to have two parallel 
fields: one for the "formatted" version, one for plain text. Then you can 
expose the plain text one to harvesters / search indexing and use the formatted 
one for item pages in your repository. The challenge will be keeping the two in 
synch -- I guess you could instruct repository admin staff to only edit the 
formatted version, and write a curation task that strips the formatting and 
puts the remainder into the plain text version. Or of course keep the two 
values in synch manually.
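
To make the "strip the formatting" step concrete, here is a minimal sketch in Python. It is not a DSpace curation task and does not use any DSpace API; it only illustrates how the plain-text value could be derived from the formatted one. The function name strip_formatting and the list of block-level tags are assumptions made for illustration. Character references such as &eacute; are decoded to Unicode along the way, so the plain-text field ends up holding real characters rather than markup.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects text nodes and discards all tags."""

    # Assumed list of tags whose boundaries should turn into whitespace,
    # so that "<p>a</p><p>b</p>" becomes "a b" rather than "ab".
    BLOCK_TAGS = {"p", "br", "div", "li"}

    def __init__(self):
        # convert_charrefs=True decodes &amp;, &eacute;, &#233; etc. to Unicode.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed tags.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def strip_formatting(formatted_value):
    """Return a plain-text version of an HTML-formatted metadata value.

    Hypothetical helper: the contents of <script> or <style> elements
    would be kept as text, so this is not a sanitizer.
    """
    parser = _TextExtractor()
    parser.feed(formatted_value)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

For example, strip_formatting("<p>H<sub>2</sub>O &amp; caf&eacute;</p>") gives "H2O & café". Note that sub/superscript meaning is lost in the plain version, which is one reason to keep the formatted field as the one staff edit.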

cheers,
Andrea



--
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120
