[
https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489504#comment-13489504
]
Nitin Verma commented on FLUME-1676:
------------------------------------
Hi Mike,
I did some testing on constructing Java strings from ISO-8859-1 bytes. Because a
Java String converts the given bytes to UTF-16, the conversion is lossy if the
charset is not the correct one. (When no charset is given, the platform default
is used, which is UTF-8 here.)
For Flume, we should ingest and egest bytes via strings using an explicit
charset, so that the channel gets the same bytes the user's source had, and
likewise the sink.
string = new String(bytes, charset);
string.getBytes(charset);
TODO: I will run similar tests on streams.
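A minimal sketch of what such a stream test might look like, assuming the bytes arrive through an InputStream and are decoded with an InputStreamReader. The class name StreamCharsetSketch and the helper readAll are hypothetical and are not part of Flume.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StreamCharsetSketch {

    // Hypothetical helper: decodes the whole input with the given charset.
    static String readAll(byte[] input, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40};

        // Matching charset on the way in and out: the bytes survive the round trip.
        String ok = readAll(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(bytes, ok.getBytes(StandardCharsets.ISO_8859_1))); // prints true

        // Wrong charset: malformed input is replaced with U+FFFD, so the bytes are lost.
        String bad = readAll(bytes, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(bytes, bad.getBytes(StandardCharsets.UTF_8))); // prints false
    }
}
```

The reader is given the charset explicitly, so the result does not depend on the platform default.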
Java Test Code
{code:java}
package edu.nitin.testcodes;

import java.nio.charset.Charset;

import org.testng.annotations.Test;

public class CharsetTest {

    @Test
    public void testCharset() {
        // 0xC2 and 0xE6 are valid ISO-8859-1 characters but not a valid UTF-8 sequence.
        final byte[] bytes = new byte[]{(byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40};
        final Charset charset = Charset.forName("ISO-8859-1");

        System.out.println("Input bytes");
        print(bytes);

        System.out.println("ingest using charset");
        {
            final String string = new String(bytes, charset);
            System.out.println(string);
            print(string.getBytes());        // platform default charset
            print(string.getBytes(charset)); // same charset as used for decoding
        }

        System.out.println("ingest without using charset");
        {
            final String string = new String(bytes); // platform default charset
            System.out.println(string);
            print(string.getBytes());
            print(string.getBytes(charset));
        }
    }

    // Prints each byte as two uppercase hex digits.
    private void print(final byte[] bytes) {
        for (byte b : bytes) {
            System.out.printf(" %02X", b);
        }
        System.out.println();
    }
}
{code}
Output
{code}
Input bytes
40 C2 E6 40
ingest using charset
@Âæ@
40 C3 82 C3 A6 40
40 C2 E6 40
ingest without using charset
@��
40 EF BF BD EF BF BD
40 3F 3F
{code}
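In the second case above the decoding fails silently: invalid input is replaced rather than reported. As a hedged sketch, a CharsetDecoder configured with CodingErrorAction.REPORT can surface the mismatch instead. The class name StrictDecodeSketch and the helper decodesCleanly are hypothetical and only illustrate the idea.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecodeSketch {

    // Hypothetical helper: true if every byte decodes without replacement.
    static boolean decodesCleanly(byte[] input, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            // The input is not valid in this charset.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40};
        System.out.println(decodesCleanly(bytes, StandardCharsets.ISO_8859_1)); // prints true
        System.out.println(decodesCleanly(bytes, StandardCharsets.UTF_8));      // prints false
    }
}
```

Note that a clean decode does not prove the charset is the right one. ISO-8859-1 accepts every byte value, so this check only catches input that is invalid in the chosen charset.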
> ExecSource should provide a configurable charset
> ------------------------------------------------
>
> Key: FLUME-1676
> URL: https://issues.apache.org/jira/browse/FLUME-1676
> Project: Flume
> Issue Type: Bug
> Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng
> version
> Flume 1.4.0-SNAPSHOT
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 831a86fc5501a8624b184ea65e53749df31692b8
> Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
> From source with checksum 98685e32b9e500a2305f538b4468faaa
> Reporter: Suresh Saggar
>
> The character set is currently not configurable in the exec source -
> http://flume.apache.org/FlumeUserGuide.html#exec-source
> File -
> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
> Can somebody please expose the ability to specify character set in the exec
> source?