Re: [dspace-tech] Special characters in metadata

Andrea Schweer Mon, 14 Aug 2017 15:09:01 -0700

Hi Gary,

Unfortunately, I think the answer is that when repositories were "invented", the stance on special characters in metadata was simply not to use them. The ecosystem around repositories (harvesters etc.) has been built on that assumption, so at this stage workarounds are the best you're going to get.

With the curation task approach, I don't think you're necessarily looking at a nightmare of keeping the fields in synch, just one-off custom development to create the task and hook it up to metadata change events.

cheers,
Andrea

On 08/15/2017 09:52 AM, Gary Browne wrote:

Thanks Claudia, Mark and Andrea for your comments.

It makes intuitive sense to me to avoid HTML “pollution” within the metadata fields. But it still raises the issue of what to do about special characters in metadata fields.

The field + formatted field idea seems ok, but I fear it will be a bit of a data management nightmare.

Thanks again for your thoughts on this.

Gary

Gary Browne | Technical Manager, Developments
Online Services
University of Sydney Library
THE UNIVERSITY OF SYDNEY
Level 1, Fisher Library F03, The University of Sydney NSW 2006
T +61 2 9351 5946 | M +61 405 647 868
E gary.bro...@sydney.edu.au

From: Andrea Schweer <schw...@waikato.ac.nz>
Date: Tuesday, 15 August 2017 at 7:26 am
To: "Mark H. Wood" <mwoodiu...@gmail.com>, DSpace Technical Support <dspace-tech@googlegroups.com>, Gary Browne <gary.bro...@sydney.edu.au>
Subject: Re: [dspace-tech] Special characters in metadata

Hi Gary, all,

On 08/15/2017 02:03 AM, Mark H. Wood wrote:

On Sunday, August 13, 2017 at 9:26:56 PM UTC-4, Gary Browne wrote:

This leads me to a more general question of how people handle special characters in the metadata, generally speaking?

Is this usually accomplished using Unicode, or are there hacks to allow HTML (I presume including HTML in metadata values is generally frowned upon)?

They must be using Unicode. Only a few fields are equipped to render HTML *as* HTML. I haven't checked, but I think we'd find that all of these are fields such as abstract which are displayed as block elements, not inline fields like title and author.
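As a minimal sketch of the Unicode approach: assuming incoming values carry HTML character references such as `&alpha;` or `&#946;` (an assumption, the thread does not say how the values arrive), they can be turned into plain Unicode text before being stored. The helper name `to_unicode` is hypothetical and is not part of DSpace; the example is in Python purely for illustration.

```python
import html
import unicodedata


def to_unicode(value: str) -> str:
    """Replace HTML character references with the Unicode characters they name.

    Hypothetical helper for illustration only; not a DSpace API.
    """
    # html.unescape handles named (&alpha;), decimal (&#946;) and
    # hexadecimal (&#x3b3;) character references.
    text = html.unescape(value)
    # NFC normalization composes base letters with combining marks, so
    # equivalent spellings of the same character compare equal.
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    print(to_unicode("Poincar&eacute; maps and &alpha;-stable laws"))
```

Values stored this way contain no markup, so inline fields like title and author display them without any special rendering.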

It's pretty easy to make DSpace (XMLUI) render HTML as HTML. Look at how the introductory text for collection pages is rendered; it's really just a matter of using copy-of rather than value-of in the XSL crosswalk (copy-of passes the markup nodes through, while value-of outputs only their text content).
https://github.com/DSpace/DSpace/blob/dspace-6_x/dspace-xmlui-mirage2/src/main/webapp/xsl/aspect/artifactbrowser/collection-view.xsl#L58
However, I'd be very careful with this; you wouldn't want to allow just about anything and risk showing malicious content on your item pages. Plus of course, Mark's comment on harvesters:

And those HTML-enabled fields raise another question: what are harvesters to make of metadata which are sprinkled with HTML? Even if the harvesting site is using the data for display, it may not be taking any trouble to render embedded HTML. If the harvesting site wants plain text (e.g. for searching), what will it do with the HTML pollution?

The best way I can think of (this has already been suggested to the U Sydney folks on a different mailing list by someone else) is to have two parallel fields: one for the "formatted" version, one for plain text. Then you can expose the plain text one to harvesters / search indexing and use the formatted one for item pages in your repository. The challenge will be keeping the two in synch -- I guess you could instruct repository admin staff to only edit the formatted version, and write a curation task that strips the formatting and puts the remainder into the plain text version. Or of course keep the two values in synch manually.
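The "strip the formatting and put the remainder into the plain text version" step could look roughly like the sketch below. It is an illustration under stated assumptions, not a DSpace curation task: a real task would be written in Java against the DSpace curation framework, and the field names `dc.description.formatted` and `dc.description`, as well as the helpers `strip_formatting` and `sync_plain_field`, are hypothetical. Metadata is modelled as a simple dict with one value per field.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        # convert_charrefs=True turns references like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def strip_formatting(formatted):
    """Return the plain-text remainder of a formatted (HTML) value."""
    parser = _TextExtractor()
    parser.feed(formatted)
    parser.close()
    return parser.text()


def sync_plain_field(metadata, formatted_field, plain_field):
    """Return a copy of metadata whose plain field is derived from the formatted one.

    Hypothetical helper: staff edit only the formatted field, and the plain
    field is always overwritten from it.
    """
    updated = dict(metadata)
    updated[plain_field] = strip_formatting(metadata.get(formatted_field, ""))
    return updated


if __name__ == "__main__":
    item = {"dc.description.formatted": "<p>CO<sub>2</sub> &amp; H<sub>2</sub>O</p>"}
    print(sync_plain_field(item, "dc.description.formatted", "dc.description"))
```

Because the plain field is always regenerated from the formatted one, the two cannot drift apart as long as only the formatted version is edited. Note that this extractor keeps the text inside every element, including `script` and `style`, so it is not a substitute for sanitizing the formatted value before display.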

cheers,
Andrea
-- 
Dr Andrea Schweer
Lead Software Developer, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
+64-7-837 9120
--
You received this message because you are subscribed to the Google Groups "DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dspace-tech+unsubscr...@googlegroups.com.
To post to this group, send email to dspace-tech@googlegroups.com.
Visit this group at https://groups.google.com/group/dspace-tech.
For more options, visit https://groups.google.com/d/optout.

