Hello

I would find it useful if the URLs of datastreams could be unescaped. E.g. 
if a METS document has a datastream like:

<fileGrp ID="...">
   <file OWNERID="M" ...>
     <FLocat xlink:href="http://example.com/ds.php?arg1=val1&amp;arg2=val2" .../>

the URL would be transformed into:

http://example.com/ds.php?arg1=val1&arg2=val2

Currently, the escaped &amp; is left as it is, so fetching the datastream 
can fail.

I don't know if this is deliberate, to stop people from putting in 
funny URLs, or something that might be modified in future. Might people be 
depending on the current behaviour of URLs not being unescaped?

A solution could be to modify METSFedoraExtDODeserializer.java:

// need this class
import org.apache.commons.lang.StringEscapeUtils;

// and then round about line 534 for METS's FLocat element
String dsLocation = StringEscapeUtils.unescapeXml(grab(a, m_xlink.uri, "href"));

I'm not sure how to integrate another jar (Apache's commons-lang.jar) into 
the Fedora build, so I can't provide a working patch. But putting the URL 
string through unescapeXml should do it.

Swithun.

-- 
The University of St Andrews is a charity registered in Scotland: SC013532

_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
