[ https://issues.apache.org/jira/browse/CAMEL-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291186#comment-14291186 ]
Volodymyr Sobotovych commented on CAMEL-8191:
---------------------------------------------
I also noticed an inaccuracy in the description of the "charset" option in the
documentation (http://camel.apache.org/file2.html):
Camel 2.9.3: this option is used to specify the encoding of the file, _and
camel will set the Exchange property with Exchange.CHARSET_NAME with the value
of this option_. You can use this on the consumer, to specify the encodings of
the files, which allow Camel to know the charset it should load the file
content in case the file content is being accessed. Likewise when writing a
file, you can use this option to specify which charset to write the file as
well. See further below for a examples and more important details.
The inaccurate part is marked in _italics_ above. None of the endpoints (file,
ftp, sftp) sets Exchange.CHARSET_NAME, as the output of this test shows:
{code}
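// Imports assume Camel 2.x with the camel-test module (JUnit 4).
import org.apache.camel.Exchange;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.test.junit4.CamelTestSupport;
import org.junit.Test;

public class FileEncodingTest extends CamelTestSupport {

    @Test
    public void testFileEncoding() {
        template.sendBody("direct:in", "Hi there");
    }

    @Override
    protected RouteBuilder createRouteBuilder() throws Exception {
        return new RouteBuilder() {
            @Override
            public void configure() throws Exception {
                from("direct:in")
                    // (1) Before the file producer: the header is not set yet.
                    .log("Charset name header (1): ${header.CamelCharsetName}")
                    .to("file://output.txt?charset=iso-8859-1")
                    // (2) After the file producer: logged again to see whether the endpoint set it.
                    .log("Charset name header (2): ${header.CamelCharsetName}")
                    // (3) After setting the header explicitly.
                    .setHeader(Exchange.CHARSET_NAME, constant("iso-8859-1"))
                    .log("Charset name header (3): ${header.CamelCharsetName}");
            }
        };
    }
}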
{code}
{code}
[ main] route1 INFO Charset name header (1):
[ main] SendProcessor DEBUG >>>> Endpoint[file://output.txt?charset=iso-8859-1] Exchange[Message: Hi there]
[ main] FileOperations DEBUG Using Reader to write file: output.txt/ID-wheleph-Lenovo-G570-42931-1422203242220-0-1 with charset: iso-8859-1
[ main] GenericFileProducer DEBUG Wrote [output.txt/ID-wheleph-Lenovo-G570-42931-1422203242220-0-1] to [Endpoint[file://output.txt?charset=iso-8859-1]]
[ main] route1 INFO Charset name header (2):
[ main] route1 INFO Charset name header (3): iso-8859-1
{code}
> Charset is ignored for SFTP producer endpoints
> ----------------------------------------------
>
> Key: CAMEL-8191
> URL: https://issues.apache.org/jira/browse/CAMEL-8191
> Project: Camel
> Issue Type: Improvement
> Components: camel-ftp
> Affects Versions: 2.12.3, 2.14.1
> Environment: vso@vso-desktop:/tmp$ uname -a
> Linux vso-desktop 3.13.0-43-generic #72~precise1-Ubuntu SMP Tue Dec 9
> 12:14:18 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> vso@vso-desktop:/tmp$ java -version
> java version "1.7.0_65"
> OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.12.04.1)
> OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
> vso@vso-desktop:/tmp$ ssh -v
> OpenSSH_5.9p1 Debian-5ubuntu1.4, OpenSSL 1.0.1 14 Mar 2012
> Reporter: Volodymyr Sobotovych
> Labels: charset, sftp
> Fix For: 2.14.2, 2.15.0
>
> Attachments: CAMEL-8191.patch
>
>
> For SFTP producer endpoints, the "charset" option is ignored and the output
> file is created using the platform-default charset (usually UTF-8).
> The following simple Spring context illustrates the issue:
> {code}
> <beans xmlns="http://www.springframework.org/schema/beans"
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:schemaLocation="
> http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
> http://camel.apache.org/schema/spring
> http://camel.apache.org/schema/spring/camel-spring.xsd">
> <camelContext xmlns="http://camel.apache.org/schema/spring">
> <route>
> <from uri="stream:in?promptMessage=Enter something:" />
> <to
> uri="sftp://localhost:22/vso/sandbox?charset=ISO-8859-1&amp;username=fake_sftp_user&amp;password=qwerty"/>
> </route>
> </camelContext>
> </beans>
> {code}
> This context defines a route that transfers the string entered by the user
> via SFTP. If the user enters "Müller", I see a 7-byte message in the output
> directory (because "ü" is represented using 2 bytes in UTF-8), whereas it
> should be a 6-byte message if the file were encoded in ISO-8859-1.
> This problem affects only SFTP endpoints. File and FTP endpoints handle the
> "charset" option correctly.
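
The byte counts quoted in the description can be checked with the JDK alone. The following is a minimal sketch, not part of the original report; the class name CharsetByteCount and the helper encodedLength are made up for illustration. It encodes "Müller" with both charsets and prints how many bytes each encoding produces.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, used only to illustrate the byte counts above.
public class CharsetByteCount {

    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "\u00fc" is "ü"; the escape keeps the example independent of the
        // encoding of this source file.
        String name = "M\u00fcller";
        System.out.println("UTF-8:      " + encodedLength(name, StandardCharsets.UTF_8) + " bytes");
        System.out.println("ISO-8859-1: " + encodedLength(name, StandardCharsets.ISO_8859_1) + " bytes");
    }
}
```

Comparing the size of the file written by an endpoint against these two values is a quick way to tell which charset was actually applied.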