On Mon, May 18, 2020 at 01:11:17PM +0000, Poulter, Dale wrote:
> We are migrating several items from an older system to DSpace using the
> simple item import. As is often the case with older systems, the data is
> not as clean as we would like. As a result several items fail due to bad
> html (open tags no closing tags, and a few diacritic issues). One way to
> allow the data to migration is to wrap the text in <![CDATA[[ ....]]> .
> However, it appears the import ignores anything in the CDATA section. Is
> this expected behavior?
I assume that it was a typo, but a CDATA section opens with "<![CDATA[", not "<![CDATA[[".

Are you talking about the content files or the metadata? In other words, would you describe the problem more thoroughly?

A tool like HTML Tidy might help if you are ingesting HTML files.

For metadata, you should know that only some fields will be interpreted as HTML, and in those only a subset of HTML is processed. I have a small and slowly growing set of substitution rules wired into my batch ingestion process, to take care of things like naked left brokets (stray "<" characters) and the bare ampersand in "R&D".

-- 
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu
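The substitution rules mentioned above are not shown in the thread. What follows is only a minimal sketch of the idea in Python, assuming metadata values are cleaned before being written into the Simple Archive Format's dublin_core.xml. The function names `clean_metadata_value` and `wrap_cdata` and both regular expressions are hypothetical illustrations, not part of DSpace. The sketch also shows the correct CDATA opener (one bracket). It does not settle whether the importer keeps CDATA content.

```python
import re

# Hypothetical rule: an '&' that does not already begin a character or entity
# reference (e.g. "&amp;", "&#233;", "&#xE9;") is a bare ampersand, as in "R&D".
_BARE_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)")

# Hypothetical rule: a '<' not followed by a letter, '/', '!' or '?' cannot start
# a tag, comment, or CDATA section, so treat it as a naked left broket.
# This is a heuristic: something like "a<b" is left untouched.
_NAKED_LT = re.compile(r"<(?![A-Za-z/!?])")


def clean_metadata_value(text: str) -> str:
    """Escape bare '&' and naked '<' before the value goes into dublin_core.xml."""
    text = _BARE_AMP.sub("&amp;", text)
    return _NAKED_LT.sub("&lt;", text)


def wrap_cdata(text: str) -> str:
    """Wrap text in a CDATA section; the opener is "<![CDATA[" with one bracket.

    A literal "]]>" inside the text would end the section early, so it is split
    across two adjacent CDATA sections.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    print(clean_metadata_value("R&D costs < 5% of budget"))
    # -> R&amp;D costs &lt; 5% of budget
    print(wrap_cdata("<p>unclosed tag"))
    # -> <![CDATA[<p>unclosed tag]]>
```

Escaping values this way keeps the XML well-formed but does nothing about unbalanced HTML inside the value. For HTML content files, a tool like HTML Tidy remains the better fit.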
