Hello,

following up on the discussion

https://sourceforge.net/mailarchive/message.php?msg_id=31215737
http://web.archiveorange.com/archive/v/hxciqsTWLSVu2rG3JE47

started by James Leonard Halliday with the subject "text encoding problem with bitstreams in DSpace 3.1 - resolved":

> Hi everyone,
>
> I posted about this a while back, and finally found a workaround
> so I wanted to share. My problem was regarding HTML bitstreams
> in DSpace 3.1 (XMLUI).
>
> In previous versions of DSpace, the encoding for my UTF-8 bitstreams
> worked just fine, but in DSpace 3.1, the encoding for ONLY the bitstreams
> was coming out as ISO-8859 instead. After much searching, I finally found
> a workaround.

First of all, thanks for sharing, Leonard!

DSpace 4.0 rc3 has the same problem, and the same fix helps. In /dspace/webapps/xmlui/WEB-INF/web.xml, replace:

<filter>
  <filter-name>SetCharacterEncoding</filter-name>
  <filter-class>org.dspace.app.xmlui.cocoon.SetCharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
</filter>

with

<filter>
  <filter-name>SetCharacterEncoding</filter-name>
  <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
  <init-param>
    <param-name>encoding</param-name>
    <param-value>UTF-8</param-value>
  </init-param>
  <init-param>
    <param-name>forceEncoding</param-name>
    <param-value>true</param-value>
  </init-param>
</filter>

Answering Mark's questions:

> I'm thinking that our filter as written could never have done what you
> expect, and the effect was produced elsewhere. Our filter only sets
> the request's encoding. Spring's filter is documented to also set the
> response's encoding when forceEncoding=true. Perhaps BitstreamReader
> should just set the encoding on the response?

It seems that Spring's filter not only forces the encoding for text/html, but also converts the file.
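To help read the byte dumps further down, here is a minimal Python sketch of one possible way such a difference can arise: the same UTF-8 bytes are decoded by the client as ISO-8859-1 and then written out again as UTF-8. The function names are made up for illustration, the sample word "Введение" is what the first line of the dumps decodes to, and dropping the C1 control characters is only an assumption about what the text browser does when it renders the page.

```python
def c1_stripped(text: str) -> str:
    """Drop C1 control characters (U+0080..U+009F).

    Assumption: the client discards these when rendering text.
    """
    return "".join(ch for ch in text if not 0x80 <= ord(ch) <= 0x9F)


def misread_as_latin1(utf8_bytes: bytes) -> bytes:
    """Decode UTF-8 bytes as ISO-8859-1, then re-encode the result as UTF-8."""
    as_latin1 = utf8_bytes.decode("iso-8859-1")  # every byte becomes one character
    return c1_stripped(as_latin1).encode("utf-8")


original = "Введение".encode("utf-8")
garbled = misread_as_latin1(original)

print(original.hex(" "))
print(garbled.hex(" "))
```

The first printed line is the 16 bytes d0 92 d0 b2 ... d0 b5, and the second is the 30 bytes c3 90 c3 90 c2 b2 ... c2 b5; compare them with the first text line of the two hexdumps. If this reading is right, the bytes sent by the server need not differ between the two setups (Content-Length is 18139 in both header dumps), and only the declared charset changes how the client decodes them.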
Please check out the results with the default DSpace web.xml (web.xml.old.data and web.xml.old.head) and with web.xml modified as described above (web.xml.new.data and web.xml.new.head):

root@dspace4-test:~# ls -l
total 52
-rw-r--r-- 1 root root 15140 Jan 13 15:06 web.xml.new.data
-rw-r--r-- 1 root root   384 Jan 13 15:06 web.xml.new.head
-rw-r--r-- 1 root root 24656 Jan 13 15:05 web.xml.old.data
-rw-r--r-- 1 root root   389 Jan 13 15:06 web.xml.old.head

The web.xml.*.data files were obtained by running "lynx --dump", and the web.xml.*.head files by running "lynx --head --dump".

As you can see, the headers differ as one would expect:

root@dspace4-test:~# diff -u web.xml.old.head web.xml.new.head
--- web.xml.old.head 2014-01-13 15:06:00.036506000 -0500
+++ web.xml.new.head 2014-01-13 15:06:52.800506000 -0500
@@ -1,14 +1,14 @@
 HTTP/1.1 200 OK
 Server: Apache-Coyote/1.1
-Set-Cookie: JSESSIONID=818F618169946A0770D8DE6A572348E5; Path=/xmlui/; HttpOnly
+Set-Cookie: JSESSIONID=9A3ADC942740B4A31CF0AC971CD4BCBB; Path=/xmlui/; HttpOnly
 X-Cocoon-Version: 2.2.0
 Vary: User-Agent
 Last-Modified: Mon, 13 Jan 2014 19:50:43 GMT
-Expires: Mon, 13 Jan 2014 21:06:00 GMT
-Content-Type: text/html;charset=ISO-8859-1
+Expires: Mon, 13 Jan 2014 21:06:52 GMT
+Content-Type: text/html;charset=UTF-8
 Content-Language: en
 Content-Length: 18139
-Date: Mon, 13 Jan 2014 20:06:00 GMT
+Date: Mon, 13 Jan 2014 20:06:52 GMT
 Connection: close

But the data files also differ (check out the sizes), and so do their first bytes:

root@dspace4-test:~# cat web.xml.new.data | head -n 3 | hd
00000000  d0 92 d0 b2 d0 b5 d0 b4  d0 b5 d0 bd d0 b8 d0 b5  |................|
00000010  0a 0a d0 9f d1 80 d0 be  d0 b2 d0 b5 d1 80 d0 b5  |................|
00000020  d0 bd d0 be 3a 20 32 30  20 d0 bc d0 b0 d1 80 d1  |....: 20 .......|
00000030  82 d0 b0 20 31 39 34 38  20 d0 b3 d0 be d0 b4 d0  |... 1948 .......|
00000040  b0 0a                                             |..|
00000042

root@dspace4-test:~# cat web.xml.old.data | head -n 3 | hd
00000000  c3 90 c3 90 c2 b2 c3 90  c2 b5 c3 90 c2 b4 c3 90  |................|
00000010  c2 b5 c3 90 c2 bd c3 90  c2 b8 c3 90 c2 b5 0a 0a  |................|
00000020  c3 90 c3 91 c3 90 c2 be  c3 90 c2 b2 c3 90 c2 b5  |................|
00000030  c3 91 c3 90 c2 b5 c3 90  c2 bd c3 90 c2 be 3a 20  |..............: |
00000040  32 30 20 c3 90 c2 bc c3  90 c2 b0 c3 91 c3 91 c3  |20 .............|
00000050  90 c2 b0 20 31 39 34 38  20 c3 90 c2 b3 c3 90 c2  |... 1948 .......|
00000060  be c3 90 c2 b4 c3 90 c2  b0 0a                    |..........|
0000006a

Petya.

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel