Thanks Claudia, Mark and Andrea for your comments. It makes intuitive sense to me to avoid HTML “pollution” within the metadata fields. But it still raises the issue of what to do about special characters in metadata fields.
The field + formatted field idea seems ok, but I fear it will be a bit of a data management nightmare.

Thanks again for your thoughts on this.

Gary

Gary Browne | Technical Manager, Developments Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au

From: Andrea Schweer <schw...@waikato.ac.nz>
Date: Tuesday, 15 August 2017 at 7:26 am
To: "Mark H. Wood" <mwoodiu...@gmail.com>, DSpace Technical Support <dspace-tech@googlegroups.com>, Gary Browne <gary.bro...@sydney.edu.au>
Subject: Re: [dspace-tech] Special characters in metadata

Hi Gary, all,

On 08/15/2017 02:03 AM, Mark H. Wood wrote:
> On Sunday, August 13, 2017 at 9:26:56 PM UTC-4, Gary Browne wrote:
>> This leads me to a more general question of how people handle special characters in the metadata, generally speaking? Is this usually accomplished using Unicode, or are there hacks to allow HTML (I presume including HTML in metadata values is generally frowned upon)?
>
> They must be using Unicode. Only a few fields are equipped to render HTML *as* HTML. I haven't checked, but I think we'd find that all of these are fields such as abstract which are displayed as block elements, not inline fields like title and author.

It's pretty easy to make DSpace (XMLUI) render HTML as HTML. Look at how the introductory text for collection pages is rendered; it's really just a matter of using copy-of not value-of in the XSL crosswalk.
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-xmlui-mirage2/src/main/webapp/xsl/aspect/artifactbrowser/collection-view.xsl#L58

However, I'd be very careful with this; you wouldn't want to allow just about anything and risk showing malicious content on your item pages. Plus of course, Mark's comment on harvesters:

> And those HTML-enabled fields raise another question: what are harvesters to make of metadata which are sprinkled with HTML? Even if the harvesting site is using the data for display, it may not be taking any trouble to render embedded HTML. If the harvesting site wants plain text (e.g. for searching), what will it do with the HTML pollution?

The best way I can think of (this has already been suggested to the U Sydney folks on a different mailing list by someone else) is to have two parallel fields: one for the "formatted" version, one for plain text. Then you can expose the plain text one to harvesters / search indexing and use the formatted one for item pages in your repository.

The challenge will be keeping the two in synch -- I guess you could instruct repository admin staff to only edit the formatted version, and write a curation task that strips the formatting and puts the remainder into the plain text version. Or of course keep the two values in synch manually.

cheers,
Andrea

-- 
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120
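
For illustration only, here is a rough sketch of the "strip the formatting" step suggested above, written in plain Python rather than as an actual DSpace curation task (a real task would be written against DSpace's own curation framework, which this does not attempt to show). The item is modelled as a simple dict of field name to list of values, and the field names in the usage note and the helper names `strip_formatting` and `sync_plain_field` are hypothetical. It drops tags, turns HTML entities into Unicode characters, and writes the result into the plain-text field.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Keeps text content, discards tags (hypothetical helper, not DSpace API)."""

    _SKIPPED = {"script", "style"}  # content of these elements is dropped

    def __init__(self):
        # convert_charrefs=True turns &amp;, &eacute;, &#233; into Unicode
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs left behind by removed block tags
        return " ".join("".join(self._chunks).split())


def strip_formatting(formatted_value: str) -> str:
    """Return the plain-text remainder of an HTML-formatted metadata value."""
    parser = _TextExtractor()
    parser.feed(formatted_value)
    parser.close()  # flush any buffered text
    return parser.text()


def sync_plain_field(item: dict, formatted_field: str, plain_field: str) -> dict:
    """Overwrite plain_field with stripped copies of formatted_field values.

    `item` is an assumed stand-in for a repository item: a dict mapping
    metadata field names to lists of string values.
    """
    item[plain_field] = [strip_formatting(v) for v in item.get(formatted_field, [])]
    return item
```

Usage, with made-up field names: `sync_plain_field(item, "local.abstract.formatted", "dc.description.abstract")`. Note that constructs such as subscripts lose their meaning when flattened this way (`H<sub>2</sub>O` becomes `H2O`), so whether a stripped value is acceptable for harvesters is a judgment call per field.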