[
https://issues.apache.org/jira/browse/CAMEL-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Claus Ibsen reassigned CAMEL-14521:
-----------------------------------
Assignee: Claus Ibsen
> Unicode problem in Bindy component for fixed length data
> ---------------------------------------------------------
>
> Key: CAMEL-14521
> URL: https://issues.apache.org/jira/browse/CAMEL-14521
> Project: Camel
> Issue Type: Improvement
> Components: camel-bindy
> Environment: JDK: openjdk-8-jdk Version 8u242-b08-0ubuntu3~18.04 on
> Ubuntu 18.04 amd64
> The ICU4J library was used to process Unicode correctly; see the dependencies
> in the POM.
> Reporter: Michael Greulich
> Assignee: Claus Ibsen
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.1.0
>
>
>
> Hi,
> AFAIK all versions of Camel are affected by the following bug: Camel counts
> the chars in the fixed-length data format incorrectly.
> Unicode is a bit tricky when it comes to counting the length of a string,
> especially since Java uses UTF-16 internally, which means that, depending on
> the code point, one character takes 1 or 2 (Java) chars. Bindy seems to use
> substring internally to select fields, and it counts chars the way Java does.
> This means the length of a string is the number of chars, i.e. UTF-16 code
> units (a surrogate pair counts as two), rather than the number of code points,
> which is the common denominator (e.g. see the definition of string length in
> XML Schema). And when one takes combining chars into account (one "base char"
> plus 0 - n combining chars are perceived as one "char" by users), it becomes
> even more of a problem.
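A minimal sketch of the difference between the two counts (class and method names are hypothetical, not Bindy's code): `String.length()` counts UTF-16 code units, while `String.codePointCount` counts code points, so a character outside the Basic Multilingual Plane such as U+1F600 adds 2 to the former and 1 to the latter.

```java
public class UnicodeLengths {
    // Number of UTF-16 code units; this is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair is counted once.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String smiley = "a\uD83D\uDE00b"; // 'a', U+1F600 GRINNING FACE, 'b'
        System.out.println(charCount(smiley));      // 4
        System.out.println(codePointCount(smiley)); // 3
    }
}
```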
> The fixed-length data format depends entirely on counting chars correctly,
> which makes it unusable if the chars are not counted correctly, since it
> cannot recover the "columns" to the right of a miscounted field.
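To illustrate how the columns drift, here is a sketch assuming a hypothetical record with two 3-character fields, where the first field contains an emoji. Cutting by char offsets (as `substring` does) shifts every following field by one, while converting code point offsets with `String.offsetByCodePoints` keeps the second field aligned. This is only an illustration of code-point-based slicing, not the fix in the pull request.

```java
public class FixedLengthFields {
    // Cuts a field using Java char (UTF-16 code unit) offsets, as plain substring does.
    static String fieldByChars(String record, int start, int width) {
        return record.substring(start, start + width);
    }

    // Cuts a field using code point offsets, converted to char indices first.
    static String fieldByCodePoints(String record, int start, int width) {
        int begin = record.offsetByCodePoints(0, start);
        int end = record.offsetByCodePoints(begin, width);
        return record.substring(begin, end);
    }

    public static void main(String[] args) {
        // Hypothetical record: field 1 = 'a', U+1F600, 'b'; field 2 = "xyz".
        String record = "a\uD83D\uDE00bxyz";
        System.out.println(fieldByChars(record, 3, 3));      // bxy (shifted)
        System.out.println(fieldByCodePoints(record, 3, 3)); // xyz
    }
}
```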
> See also the mailing list at
> [http://mail-archives.apache.org/mod_mbox/camel-users/202001.mbox/browser]
> As suggested, I created a pull request, since this may be of interest to the
> community. The ICU4J library was used to process Unicode correctly, because
> the functionality built into the Java API is too old to handle modern emoji
> (skin colour, hair, sex) correctly. Please check the license.
> Pull-request: [https://github.com/apache/camel/pull/3552]
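Counting user-perceived characters (grapheme clusters) rather than code points can be sketched with `java.text.BreakIterator.getCharacterInstance()`; the class and method names below are hypothetical. The example uses a base letter plus a combining accent, which the JDK iterator groups into one cluster. As the report notes, the older JDK rules may not group emoji modifier and ZWJ sequences correctly; ICU4J's `com.ibm.icu.text.BreakIterator` offers the same `getCharacterInstance`/`setText`/`first`/`next` methods, so swapping the import is one possible route, which is an assumption and not verified against the pull request.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class GraphemeSplit {
    // Splits text into user-perceived characters using the JDK character BreakIterator.
    static List<String> graphemes(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "e\u0301x"; // 'e' + COMBINING ACUTE ACCENT, then 'x'
        System.out.println(text.codePointCount(0, text.length())); // 3
        System.out.println(graphemes(text).size());                // 2
    }
}
```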
>