[ 
https://issues.apache.org/jira/browse/VELOCITY-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016857#comment-16016857
 ] 

James R Doyle commented on VELOCITY-880:
----------------------------------------

Further attempts to get getBinaryStream() to work as the solution yield even 
more intrigue. Most interesting is that the HSQLDB and Oracle drivers have 
quite different semantics.  If we wanted to support BLOB columns, then I think 
we would be OK.
However, the getCharacterStream() approach works with both VARCHAR and CLOB, 
on both Oracle and HSQLDB.
See further below for sample output.
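
A minimal sketch of the character-stream route, using only the JDK so it runs 
without a database. A StringReader stands in for the Reader returned by 
rs.getCharacterStream(templateColumn); the class and the helper names readAll 
and encode are hypothetical and not part of Velocity. The point is that the 
text is handled as characters first and only converted to bytes with an 
explicitly chosen charset.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharacterStreamSketch {

    // Hypothetical helper: drains a Reader into a String.
    // No charset is involved at this stage, only chars.
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Hypothetical helper: converts the characters to bytes in a charset
    // chosen by the caller instead of one implied by the driver.
    public static byte[] encode(Reader reader, Charset charset) {
        return readAll(reader).getBytes(charset);
    }

    public static void main(String[] args) {
        String template = "The Euro Currency Symbol \u20ac is a two-byte UTF-8 character.";
        // Stand-in for rs.getCharacterStream(templateColumn).
        Reader r = new StringReader(template);
        byte[] bytes = encode(r, StandardCharsets.UTF_8);
        String back = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("round trip intact: " + back.equals(template));
    }
}
```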

Your original ask: extracting the raw bytes from getBinaryStream() using Oracle 
shows that the UTF-8 code for the Euro symbol IS present.  Have a look at the 
dump below and observe that 0x20ac is there. However, the conversion problem is 
then going to be another bug, inside ResourceLoader::buildReader.

546865204575726f2043757272656e63792053796d626f6c20ac20697320612074776f2d62797465205554462d38206368617261637465722e00000000000000
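
To make the dump above easier to inspect, here is a small sketch that decodes 
the hex string and reports the first byte outside the ASCII range together 
with its two neighbours. The class and method names are made up for 
illustration. For comparison it also prints the UTF-8 encoding of U+20AC, 
which is the three bytes e2 82 ac.

```java
import java.nio.charset.StandardCharsets;

public class HexDumpInspector {

    // The dump exactly as pasted in this comment.
    public static final String DUMP =
        "546865204575726f2043757272656e63792053796d626f6c20ac20697320612074776f2d62797465205554462d38206368617261637465722e00000000000000";

    // Converts pairs of hex digits into bytes.
    public static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Returns the index of the first byte with the high bit set, or -1 if none.
    public static int firstNonAscii(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if ((bytes[i] & 0x80) != 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] raw = fromHex(DUMP);
        int i = firstNonAscii(raw);
        if (i > 0 && i < raw.length - 1) {
            System.out.printf("offset %d: %02x %02x %02x%n",
                i, raw[i - 1] & 0xFF, raw[i] & 0xFF, raw[i + 1] & 0xFF);
        }
        StringBuilder utf8 = new StringBuilder();
        for (byte b : "\u20ac".getBytes(StandardCharsets.UTF_8)) {
            utf8.append(String.format("%02x ", b & 0xFF));
        }
        System.out.println("UTF-8 for U+20AC: " + utf8.toString().trim());
    }
}
```

Comparing the two printed lines shows whether the bytes coming back from the 
column match the UTF-8 form of the symbol or something shorter.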

<pre>
HSQLDB, VARCHAR, getBinaryStream: Fault due to JDBC driver not supporting getBinaryStream()
================================
incompatible data type in conversion
java.sql.SQLSyntaxErrorException: incompatible data type in conversion
                                  at org.hsqldb.jdbc.JDBCUtil.sqlException(Unknown Source)
                                  .....
                                  at org.apache.velocity.runtime.resource.loader.DataSourceResourceLoader.getResourceReader

Oracle12, VARCHAR, getBinaryStream: Test failure due to charset coercion problem.
==============================
     org.junit.ComparisonFailure: Unicode test failed.
     Expected :The Euro Currency Symbol € is a two-byte UTF-8 character.
     Actual   :The Euro Currency Symbol � is a two-byte UTF-8 character.
</pre>
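
The two symptoms seen in this issue, the replacement character in the Oracle 
output above and the '?' in the original report, are what the JDK itself 
produces when bytes and charset do not line up. The sketch below only 
demonstrates that JDK behaviour; that the drivers hand back bytes like these is 
an assumption on my part, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {

    // Decodes bytes as UTF-8. Malformed sequences become U+FFFD,
    // which is what the String constructor does by specification.
    public static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Forces text through US-ASCII and back. Characters that ASCII
    // cannot represent are replaced with '?' during encoding.
    public static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // A lone 0xAC byte is not a valid UTF-8 sequence.
        String broken = decodeAsUtf8(new byte[] {(byte) 0xAC});
        System.out.printf("lone 0xac decoded as UTF-8 -> U+%04X%n", (int) broken.charAt(0));

        // The Euro sign cannot survive an ASCII-only path.
        String lossy = throughAscii("\u20ac");
        System.out.printf("Euro through US-ASCII     -> U+%04X%n", (int) lossy.charAt(0));

        // A proper three-byte UTF-8 sequence decodes cleanly.
        String ok = decodeAsUtf8("\u20ac".getBytes(StandardCharsets.UTF_8));
        System.out.printf("Euro through UTF-8        -> U+%04X%n", (int) ok.charAt(0));
    }
}
```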

How would you like to proceed? I believe we should take the changes for CLOB, 
because that is the primary use case; VARCHAR should also work as a 
requirement. The JavaDoc changes for CLOB should also be made, because the 
JavaDoc is how we expect people to learn to use this resource loader, and 
which databases really support this 'TEXT' column type anyway?   Should we 
move to ApacheDB as the reference database for embedded unit tests while we 
are at it?    This resource loader does not work, and I'm sure people are 
either abandoning the approach altogether (which is sad) or building 
workarounds like I did, if they are able to. 

> DataSourceResourceLoader corrupts UTF-8 encoded characters in template
> ----------------------------------------------------------------------
>
>                 Key: VELOCITY-880
>                 URL: https://issues.apache.org/jira/browse/VELOCITY-880
>             Project: Velocity
>          Issue Type: Bug
>    Affects Versions: 2.1.x
>         Environment: Oracle12c and HSQLDB 2.3.4, JDK 1.8
>            Reporter: James R Doyle
>         Attachments: velocity-880.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A long-standing bug in the DataSourceResourceLoader corrupts UTF-8 
> templates retrieved from the database.  The Unit Test suite for this resource 
> loader has deficiencies that hide the bug. 
> The cause of the problem is this:
> {code}
> InputStream rawStream = rs.getAsciiStream(templateColumn);
> {code}
> The resolution of the problem is simply:
> {code}
> Reader r = rs.getCharacterStream(templateColumn);
> InputStream rawStream = null;
> try {
>     // Re-encode the character data using the configured encoding.
>     rawStream = IOUtils.toInputStream(IOUtils.toString(r), encoding);
> } catch (IOException ioe) {
>     // Exception is ignored here; rawStream stays null in that case.
> }
> {code}
> Once done, the following test failure vanishes:
>         org.junit.ComparisonFailure: Unicode test failed.  
>         Expected :The Euro Currency Symbol € is a two-byte UTF-8 encoded 
> character.
>         Actual   :The Euro Currency Symbol ? is a two-byte UTF-8 encoded 
> character.
> The bug was verified and the fix was tested against Oracle12c and HSQLDB 
> 2.3.4 using a CLOB column to store the template data.
> The Unit Tests for this resource loader need attention.
> Please see VELOCITY-599: a long-standing problem that was erroneously 
> marked as resolved but has remained in the codebase for a long time.


