[
https://issues.apache.org/jira/browse/FLUME-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819929#comment-13819929
]
Alexandre Dutra edited comment on FLUME-2215 at 11/12/13 2:11 PM:
------------------------------------------------------------------
I brought this bug to the dev list a few days ago, see
[here|http://mail-archives.apache.org/mod_mbox/flume-dev/201311.mbox/%3CCAHe0eKa7WNUi0o0WKq_xdqG3pbCbY3ZWp3uSW977sJxG0U277A%40mail.gmail.com%3E].
I'm attaching a patch that solves the problem by decoding the surrogate pair in
a single pass.
The only drawback I can think of arises if the agent crashes while the
reader is in the middle of a surrogate pair and an attempt is made to resume
reading from the last recorded position. In that case, the low surrogate
would be lost and the consumer would end up with a corrupted char stream (a lone
high surrogate).
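For readers unfamiliar with the underlying behavior, here is a minimal standalone sketch (not the attached patch, and not Flume code) of why a one-char output buffer is not enough for a character outside the BMP. The class name SurrogatePairSketch and its helper methods are hypothetical and exist only for illustration; the code point U+1F600 is just an arbitrary supplementary character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume or of the attached patch.
public class SurrogatePairSketch {

    // Encodes a single code point as UTF-8 bytes.
    static byte[] utf8Of(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the given UTF-8 bytes into 'out' with a fresh decoder and
    // returns the CoderResult reported by the decoder.
    static CoderResult decode(byte[] utf8, CharBuffer out) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.decode(ByteBuffer.wrap(utf8), out, true);
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so Java represents it as two chars.
        byte[] bytes = utf8Of(0x1F600);

        // A one-char buffer cannot hold the surrogate pair.
        CharBuffer one = CharBuffer.allocate(1);
        CoderResult r1 = decode(bytes, one);
        System.out.println("capacity 1 overflow: " + r1.isOverflow());

        // A two-char buffer receives both surrogates in a single pass.
        CharBuffer two = CharBuffer.allocate(2);
        CoderResult r2 = decode(bytes, two);
        two.flip();
        char high = two.get(0);
        char low = two.get(1);
        System.out.println("capacity 2 underflow: " + r2.isUnderflow());
        System.out.println("code point: U+"
                + Integer.toHexString(Character.toCodePoint(high, low)).toUpperCase());
    }
}
```

With a two-char buffer the decoder hands back the high and low surrogate together, which is the situation the crash scenario above refers to: a reader that returns them one at a time has to remember the second one between calls.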
was (Author: adutra):
I brought this bug to the dev list a few days ago, see
[here|http://mail-archives.apache.org/mod_mbox/flume-dev/201311.mbox/%3CCAHe0eKa7WNUi0o0WKq_xdqG3pbCbY3ZWp3uSW977sJxG0U277A%40mail.gmail.com%3E].
I'm attaching a patch that solves the problem by decoding the surrogate pair in
a single pass.
The only drawback I could think of would be if the agent crashes while the
reader is the middle of a surrogate pair and an attempt is made to resume
reading from last recorded position. Should this case happen, the low surrogate
would be lost and the consumer would end up with a corrupted char stream (high
surrogate alone).
> ResettableFileInputStream can't support ucs-4 character
> --------------------------------------------------------
>
> Key: FLUME-2215
> URL: https://issues.apache.org/jira/browse/FLUME-2215
> Project: Flume
> Issue Type: Bug
> Affects Versions: v1.5.0
> Reporter: syntony liu
> Priority: Critical
> Attachments: FLUME-2215-0.patch
>
>
> ResettableFileInputStream.java:readChar() does not handle UCS-4 characters. Such
> a character needs 2 chars in charBuf, which causes an unexpected termination.
> A temporary solution:
> if (res.isOverflow() && !charBuf.hasRemaining()) {
>   // Workaround: consume the character's bytes and return a placeholder.
>   logger.warn("decoder ucs-4 at position: {}", buf.position());
>   tmpBuf.clear();
>   res = decoder.decode(buf, tmpBuf, isEndOfInput);
>   incrPosition(buf.position() - start, false);
>   return '?';
> }
--
This message was sent by Atlassian JIRA
(v6.1#6144)