[ 
https://issues.apache.org/jira/browse/FLUME-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489544#comment-13489544
 ] 

Nitin Verma commented on FLUME-1676:
------------------------------------

Hi Mike,

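InputStreamReader needs to be given the charset explicitly; otherwise it decodes with the default charset and readLine returns corrupted text. The same charset must then be used to turn each line back into bytes: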

bufferedReader = new BufferedReader(new InputStreamReader(byteArrayInputStream, charset));
bufferedReader.readLine().getBytes(charset);
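
As a minimal sketch of how the charset could be taken from configuration, assuming it arrives as a plain string: the class name CharsetLineReader, the method readLines, and the UTF-8 fallback are all hypothetical and are not taken from Flume's ExecSource.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume: reads lines with a charset chosen by name.
public class CharsetLineReader {

    // charsetName would come from configuration; the UTF-8 fallback is an assumption.
    static List<String> readLines(InputStream in, String charsetName) throws IOException {
        final Charset charset = Charset.forName(charsetName == null ? "UTF-8" : charsetName);
        final List<String> lines = new ArrayList<>();
        // Passing the charset explicitly keeps decoding independent of the JVM default.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        final byte[] bytes = {
            0x40, (byte) 0xC2, (byte) 0xE6, 0x40, '\n',
            0x41, (byte) 0xC2, (byte) 0xE6, 0x40
        };
        final Charset latin1 = Charset.forName("ISO-8859-1");
        for (String line : readLines(new ByteArrayInputStream(bytes), "ISO-8859-1")) {
            // Encode with the same charset to get the original bytes back.
            for (byte b : line.getBytes(latin1)) {
                System.out.printf("  %02X", b);
            }
            System.out.println();
        }
    }
}
```

Charset.forName throws an unchecked exception for an unknown or illegal name, so a real source would probably want to validate the configured value at startup.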


{code:java}
package edu.nitin.testcodes;

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import org.testng.annotations.Test;

public class CharsetStreamTest {

    @Test
    public void testCharset() throws IOException {
        final byte[] bytes = new byte[]{
            (byte) 0x40, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x41, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x42, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x43, (byte) 0xC2, (byte) 0xE6, (byte) 0x40, (byte) '\n',
            (byte) 0x44, (byte) 0xC2, (byte) 0xE6, (byte) 0x40
        };

        final Charset charset = Charset.forName("ISO-8859-1");
        System.out.println("Input bytes");
        print(bytes);

        System.out.println("ingest using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);

            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream, charset));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

        System.out.println("ingest without using charset");
        {
            final ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);

            final BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(byteArrayInputStream));
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                print(line.getBytes(charset));
            }
        }

    }

    private void print(final byte[] bytes) {
        for (byte b : bytes) {
            System.out.printf("  %02X", b);
        }
        System.out.println();
    }
}
{code}

{code}
Input bytes
  40  C2  E6  40  0A  41  C2  E6  40  0A  42  C2  E6  40  0A  43  C2  E6  40  0A  44  C2  E6  40
ingest using charset
  40  C2  E6  40
  41  C2  E6  40
  42  C2  E6  40
  43  C2  E6  40
  44  C2  E6  40
ingest without using charset
  40  3F  3F  40
  41  3F  3F  40
  42  3F  3F  40
  43  3F  3F  40
  44  3F  3F
{code}
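
The 3F bytes are consistent with the default charset on the test machine being UTF-8, although that is an assumption and the run above does not record it. Under that assumption, C2 and E6 are each malformed UTF-8 lead bytes here, so they decode to U+FFFD, which ISO-8859-1 cannot represent and therefore encodes as '?' (0x3F). A small sketch that makes the decoding charset explicit instead of relying on the default (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decode bytes with one charset, re-encode with another.
public class CharsetRoundTrip {

    static byte[] roundTrip(byte[] input, Charset decodeWith, Charset encodeWith) {
        // Malformed input becomes U+FFFD on decode; unmappable chars become '?' on encode.
        return new String(input, decodeWith).getBytes(encodeWith);
    }

    static String hex(byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // First line of the test input above, including its newline.
        final byte[] line = {0x40, (byte) 0xC2, (byte) 0xE6, 0x40, 0x0A};

        // Matching charsets on both sides: the bytes survive unchanged.
        System.out.println(hex(roundTrip(line,
                StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));

        // Decoding as UTF-8 (the assumed default) loses the two high bytes.
        System.out.println(hex(roundTrip(line,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));
    }
}
```

ISO-8859-1 maps every byte value to exactly one character, so decoding and re-encoding with it is lossless for arbitrary bytes; that is not true of UTF-8.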
                
> ExecSource should provide a configurable charset
> ------------------------------------------------
>
>                 Key: FLUME-1676
>                 URL: https://issues.apache.org/jira/browse/FLUME-1676
>             Project: Flume
>          Issue Type: Bug
>         Environment: :~/apache-flume-1.4.0-SNAPSHOT/conf# ../bin/flume-ng version
> Flume 1.4.0-SNAPSHOT
> Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
> Revision: 831a86fc5501a8624b184ea65e53749df31692b8
> Compiled by jenkins on Tue Oct 30 03:18:08 UTC 2012
> From source with checksum 98685e32b9e500a2305f538b4468faaa
>            Reporter: Suresh Saggar
>
> The character set is currently not configurable in the exec source:
> http://flume.apache.org/FlumeUserGuide.html#exec-source
> File:
> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
> Can somebody please expose the ability to specify the character set in the exec source?

