[
https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489544#comment-13489544
]
Nitin Verma commented on FLUME-1676:
------------------------------------
Hi Mike,
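InputStreamReader needs to be given the charset explicitly. Without it, it decodes with the platform default charset, and readLine returns corrupted data for any bytes that are not valid in that charset. Passing the same charset to both the reader and getBytes avoids that:
bufferedReader = new BufferedReader(new InputStreamReader(byteArrayInputStream, charset));
bufferedReader.readLine().getBytes(charset);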
{code:java}
package edu.nitin.testcodes;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

import org.testng.annotations.Test;

public class CharsetStreamTest {

    @Test
    public void testCharset() throws IOException {
        // Five newline-separated records. 0xC2 and 0xE6 are outside ASCII,
        // so how they decode depends on the charset given to the reader.
        final byte[] bytes = new byte[]{
            (byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x41, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x42, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x43, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x44, (byte) 0xC2, (byte) 0xE6, (byte) 0x40
        };
        final Charset charset = Charset.forName("ISO-8859-1");

        System.out.println("Input bytes");
        print(bytes);

        // Case 1: the reader decodes with the same charset used to re-encode.
        System.out.println("ingest using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            final BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(byteArrayInputStream, charset));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

        // Case 2: no charset given, so InputStreamReader falls back to the
        // platform default charset.
        System.out.println("ingest without using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
            final BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(byteArrayInputStream));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }
    }

    // Prints each byte as two uppercase hex digits, space-separated.
    private void print(final byte[] bytes) {
        for (byte b : bytes) {
            System.out.printf(" %02X", b);
        }
        System.out.println();
    }
}
{code}
{code}
Input bytes
40 C2 E6 40 0A 41 C2 E6 40 0A 42 C2 E6 40 0A 43 C2 E6 40 0A 44 C2 E6 40
ingest using charset
40 C2 E6 40
41 C2 E6 40
42 C2 E6 40
43 C2 E6 40
44 C2 E6 40
ingest without using charset
40 3F 3F 40
41 3F 3F 40
42 3F 3F 40
43 3F 3F 40
44 3F 3F
{code}
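For the configurable part, here is a minimal sketch of how a charset name taken from configuration could be resolved and then used for both decoding and re-encoding. It assumes the name arrives as a plain string. `resolveCharset` and `readLines` are hypothetical helpers for illustration only, not part of ExecSource or any Flume API, and the UTF-8 fallback is likewise just an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class CharsetRoundTrip {

    // Hypothetical helper: turn a configured charset name into a Charset,
    // using the given fallback when the setting is missing or blank.
    // Charset.forName throws for unknown or illegal names.
    static Charset resolveCharset(String configuredName, Charset fallback) {
        if (configuredName == null || configuredName.trim().isEmpty()) {
            return fallback;
        }
        return Charset.forName(configuredName.trim());
    }

    // Hypothetical helper: read lines with an explicit charset and
    // re-encode each line with that same charset.
    static List<byte[]> readLines(byte[] input, Charset charset) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(input), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.getBytes(charset));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = {
            0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n',
            0x41, (byte) 0xC2, (byte) 0xE6, 0x40
        };
        // "ISO-8859-1" stands in for a value read from configuration.
        Charset charset = resolveCharset("ISO-8859-1", StandardCharsets.UTF_8);
        for (byte[] line : readLines(input, charset)) {
            StringBuilder hex = new StringBuilder();
            for (byte b : line) {
                hex.append(String.format(" %02X", b));
            }
            System.out.println(hex);
        }
    }
}
```

ISO-8859-1 maps every byte value to exactly one character, so the two lines come back byte-for-byte identical here. With a multi-byte charset such as UTF-8, the same input is malformed and would not survive the round trip.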
> ExecSource should provide a configurable charset
> ------------------------------------------------
>
> Key: FLUME-1676
> URL: https://issues.apache.org/jira/browse/FLUME-1676
> Project: Flume
> Issue Type: Bug
> Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
> Flume 1.4.0-SNAPSHOT
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 831a86fc5501a8624b184ea65e53749df31692b8
> Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
> From source with checksum 98685e32b9e500a2305f538b4468faaa
> Reporter: Suresh Saggar
>
> The character set is currently not configurable in the exec source -
> http://flume.apache.org/FlumeUserGuide.html#exec-source
> File -
> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
> Can somebody please expose the ability to specify character set in the exec
> source?