[Mav-user] xslt+fop+utf-8

Yurii Urazlin Fri, 19 Sep 2003 13:44:12 -0700

Hi there,

I'm having encoding problems when using the xslt and fop transforms together: in the 
resulting PDF some characters are missing (see the attached test.pdf). They are replaced 
by boxes and '?'.


Maybe I did something wrong. Has anyone experienced similar problems? If so, could you 
suggest a solution?


Here is my sample command:
-----
<command name="test">
        <view path="test.jsp">
                <transform path="test.xsl" type="xslt"/>
                <transform type="fop" output="pdf" config="userconfig.xml"/>
        </view>
</command>
-----

test.jsp
-----
<%@ page language="java" contentType="text/html;charset=UTF-8"%>
<root>russian text here, like this: маверИк - это круто.</root>
-----

test.xsl
-----
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" encoding="UTF-8"/>
    <xsl:template match="/">
        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
            <fo:layout-master-set>
                ...
                ...     <xsl:value-of select="."/>
                ...
-----
As you can see, everything is supposed to be in UTF-8.
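One way to confirm that the XSLT step on its own produces correct UTF-8 is to run a stylesheet outside Maverick with the standard JAXP API (`javax.xml.transform`). The sketch below is an assumption-laden diagnostic, not part of Maverick: the class name `XsltEncodingCheck` and the tiny stand-in stylesheet are hypothetical, and only the `xsl:output encoding="UTF-8"` setting mirrors test.xsl. Writing the result to an `OutputStream` means the serializer itself picks the bytes according to `xsl:output`.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical diagnostic: runs an XSLT stylesheet outside Maverick and
// returns the raw output bytes so the XSLT step can be checked in isolation.
final class XsltEncodingCheck {

    // Hypothetical stand-in for test.xsl: copies the text of the input into <out>
    // and declares UTF-8 output, as test.xsl does.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"/\"><out><xsl:value-of select=\".\"/></out></xsl:template>"
      + "</xsl:stylesheet>";

    static byte[] transform(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // An OutputStream (not a Writer) lets xsl:output's encoding decide the bytes.
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws TransformerException {
        // Cyrillic written as Unicode escapes so the source file's encoding doesn't matter.
        String input = "<root>\u043c\u0430\u0432\u0435\u0440\u0418\u043a</root>";
        byte[] bytes = transform(input, STYLESHEET);
        // Decode explicitly as UTF-8, the encoding the stylesheet declared.
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

If the decoded output still contains the Cyrillic text intact, the XSLT step is fine and the loss happens later in the pipeline.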




I tried to find the cause in the code, and here is what I found. FopTransform.java converts 
the string to bytes without specifying an encoding (so the platform default is used, not 
UTF-8), although I think it could pass "UTF-8" as a parameter here:
---
204: byte[] bytes = input.getBytes();
---
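A minimal sketch of the change suggested for line 204, assuming the string handed to FOP is a complete XML document whose declaration says UTF-8. The class and method names are hypothetical; the real FopTransform code is not reproduced here. `StandardCharsets` needs Java 7 or later; on the JDKs of this era the equivalent is `input.getBytes("UTF-8")`, which declares `UnsupportedEncodingException`. Note also that since JDK 18 the default charset is UTF-8, so plain `getBytes()` would not show this bug there.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of line 204 before and after the suggested change.
final class EncodeExplicitly {

    // Before: uses the JVM's default charset (windows-1251 on the reporter's machine).
    static byte[] encodeWithDefault(String input) {
        return input.getBytes();
    }

    // After: always UTF-8, matching encoding="UTF-8" in the XML the XSLT step produced.
    static byte[] encodeUtf8(String input) {
        return input.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String input = "<root>\u043c\u0430\u0432\u0435\u0440\u0418\u043a</root>";
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + encodeWithDefault(input).length);
        System.out.println("UTF-8 bytes:     " + encodeUtf8(input).length);
    }
}
```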
It also looks like getOutputAsString(), as called in FopTransform.java, returns a string 
decoded with the default (not UTF-8) encoding, even though the underlying data was written 
by the xslt transform with UTF-8 output encoding:
---
157: this.go(this.fakeResponse.getOutputAsString());
---
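The decoding side (around line 157) can be sketched the same way. `BufferedResponseSketch` is a hypothetical stand-in for whatever buffering Maverick's `FakeHttpServletResponse` does; its real fields and methods may differ. The only point illustrated is decoding the buffered bytes with the charset the producer used (UTF-8 per `xsl:output`) instead of the platform default. A further option, not something the current code does, would be to hand the buffered bytes to FOP directly and skip the String round trip altogether.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the response buffer between the xslt and fop transforms.
final class BufferedResponseSketch {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // The previous (xslt) transform writes its serialized bytes here.
    void write(byte[] bytes) {
        buffer.write(bytes, 0, bytes.length);
    }

    // Decode with the producer's charset (UTF-8), not the platform default.
    String getOutputAsString() {
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BufferedResponseSketch response = new BufferedResponseSketch();
        response.write("<root>\u043c\u0430\u0432\u0435\u0440\u0418\u043a</root>"
            .getBytes(StandardCharsets.UTF_8));
        System.out.println(response.getOutputAsString());
    }
}
```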


So it looks like this sequence of conversions loses some characters:
--------
~ 157: 
new String(bytes written by the xslt transform, with output encoding UTF-8, to 
FakeHttpServletResponse), decoded with the default, not UTF-8, encoding 
>>
~ 204: 
[that string].getBytes(), again without an encoding, so the bytes presumably come out 
in windows-1251 (my default)
--------
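The two conversions can be reproduced in a few lines. To make the loss happen on any machine, this sketch uses US-ASCII as a stand-in for a default charset that cannot represent the UTF-8 bytes, rather than relying on the platform default; `RoundTrip` is a hypothetical name. A charset that maps every byte value (ISO-8859-1) happens to survive the round trip, while one with unassigned byte values does not. That may explain why only some letters disappear here: in UTF-8, 'И' is encoded as D0 98, and 0x98 is not assigned in the standard windows-1251 table, so a decoder will likely substitute a replacement character for it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Reproduces the described pipeline: UTF-8 bytes decoded with one charset (~157),
// re-encoded with another (~204), then parsed as UTF-8 again.
final class RoundTrip {

    static String roundTrip(String text, Charset decodeAs, Charset encodeAs) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);  // what the xslt step writes
        String intermediate = new String(utf8, decodeAs);      // ~157: getOutputAsString()
        byte[] reencoded = intermediate.getBytes(encodeAs);    // ~204: input.getBytes()
        return new String(reencoded, StandardCharsets.UTF_8);  // what a UTF-8 parser sees
    }

    public static void main(String[] args) {
        String text = "\u043c\u0430\u0432\u0435\u0440\u0418\u043a";
        Charset ascii = StandardCharsets.US_ASCII;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;
        System.out.println("US-ASCII:   " + roundTrip(text, ascii, ascii));   // lossy
        System.out.println("ISO-8859-1: " + roundTrip(text, latin1, latin1)); // every byte maps
        System.out.println("UTF-8:      " + roundTrip(text, utf8, utf8));     // explicit UTF-8
    }
}
```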


Thanks in advance,
Yura.

Attachment: test.pdf
