Re: [dspace-tech] CDATA use for imports

Mark H. Wood Mon, 18 May 2020 08:18:59 -0700

On Mon, May 18, 2020 at 01:11:17PM +0000, Poulter, Dale wrote:
> We are migrating several items from an older system to DSpace using the 
> simple item import.  As is often the case with older systems,  the data is 
> not as clean as we would like.  As a result, several items fail due to bad 
> HTML (open tags with no closing tags, and a few diacritic issues).  One way to 
> allow the data to migrate is to wrap the text in <![CDATA[[ ....]]> .  
> However, it appears the import ignores anything in the CDATA section.  Is 
> this expected behavior?


I assume that it was a typo, but a CDATA section opens with
"<![CDATA[", not "<![CDATA[[".
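To illustrate the difference, here is a small Python sketch that uses the standard library's xml.etree.ElementTree on a <dcvalue> element of the kind found in a Simple Archive Format dublin_core.xml file. It only shows how a generic XML parser treats the two openers; it does not show what the DSpace importer itself does with CDATA content. The helper name parse_dcvalue is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_dcvalue(xml_text):
    """Hypothetical helper: return the text content of one <dcvalue> element."""
    return ET.fromstring(xml_text).text


# Correct opener: the CDATA section contributes its contents verbatim,
# so the unclosed <b> tag and the bare ampersand parse without error.
good = '<dcvalue element="description" qualifier="none"><![CDATA[<b>R&D notes]]></dcvalue>'

# Doubled bracket: still well-formed XML, but the extra "[" becomes
# part of the value instead of part of the delimiter.
typo = '<dcvalue element="description" qualifier="none"><![CDATA[[<b>R&D notes]]></dcvalue>'

print(repr(parse_dcvalue(good)))  # '<b>R&D notes'
print(repr(parse_dcvalue(typo)))  # '[<b>R&D notes'
```

So the doubled bracket alone would not make the parser skip the section; it would leave a stray "[" at the start of the value.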

Are you talking about the content files or the metadata?  In other words,
could you describe the problem more thoroughly?

A tool like HTML Tidy might help if you are ingesting HTML files.
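HTML Tidy is what would actually repair the files; before running it, it can help to know which files need attention. The following is a rough Python sketch, not part of Tidy or DSpace, that uses the standard library's html.parser to list tags that are opened but never closed. UnclosedTagFinder and find_unclosed_tags are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never take an end tag in HTML, so they are never "unclosed".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Hypothetical helper: records start tags that never get an end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # stack of currently open elements
        self.unclosed = []        # tags skipped over when an outer tag closed
        self.stray_end_tags = []  # end tags with no matching start tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing syntax such as <br/>: nothing is left open

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.stray_end_tags.append(tag)
            return
        # Close back to the matching start tag; anything above it was never closed.
        while True:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.unclosed.append(top)


def find_unclosed_tags(html_text):
    finder = UnclosedTagFinder()
    finder.feed(html_text)
    finder.close()
    # Tags still on the stack at end of input were never closed either.
    return finder.unclosed + finder.open_tags


print(find_unclosed_tags("<p>Hello <b>world</p>"))  # ['b']
print(find_unclosed_tags("<p>ok<br>done</p>"))      # []
```

This is deliberately naive: elements whose end tags are optional in HTML (such as <p> or <li>) will also be reported, so the output is a list of candidates to run through Tidy rather than a verdict.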

For metadata, you should know that only some fields will be
interpreted as HTML, and in those only a subset of HTML is processed.
I have a small and slowly growing set of substitution rules wired into
my batch ingestion process, to take care of things like naked left
brokets ("<" that does not begin a tag) and bare ampersands as in "R&D".
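As an illustration only (these are not the actual rules from that ingestion process), two such substitution rules might look like this as a Python sketch using the standard re module: one escapes bare ampersands, one escapes naked left brokets. SUBSTITUTION_RULES and clean_metadata_value are hypothetical names, not part of DSpace.

```python
import re

# Hypothetical rule set: (compiled pattern, replacement), applied in order.
SUBSTITUTION_RULES = [
    # Bare ampersand: an "&" that does not already begin a character or
    # entity reference such as &amp;, &#233; or &#xE9;.
    (re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"), "&amp;"),
    # Naked left broket: a "<" not followed by something that could start
    # a tag, end tag, comment or processing instruction.
    (re.compile(r"<(?![A-Za-z/!?])"), "&lt;"),
]


def clean_metadata_value(value):
    """Apply each substitution rule in turn to one metadata value."""
    for pattern, replacement in SUBSTITUTION_RULES:
        value = pattern.sub(replacement, value)
    return value


print(clean_metadata_value("R&D report"))               # R&amp;D report
print(clean_metadata_value("dose < 5 mg"))              # dose &lt; 5 mg
print(clean_metadata_value("<i>Title</i> &amp; more"))  # <i>Title</i> &amp; more
```

Regular expressions are a blunt tool here: a "<" followed directly by a letter (as in "a<b") still looks like a tag and is left alone, which is one reason such a rule set tends to grow slowly as new cases turn up.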

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

-- 
All messages to this mailing list should adhere to the DuraSpace Code of 
Conduct: https://duraspace.org/about/policies/code-of-conduct/
--- 
You received this message because you are subscribed to the Google Groups 
"DSpace Technical Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dspace-tech/20200518151820.GC16830%40IUPUI.Edu.
