[ https://issues.apache.org/jira/browse/JENA-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434810#comment-17434810 ]
ASF subversion and git services commented on JENA-2186:
-------------------------------------------------------
Commit 54c69cce87e96fdc285f8747d61df4db18f21bba in jena's branch
refs/heads/main from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=54c69cc ]
JENA-2186: Nodec space allocation for FFFD
> Write U+FFFD as Unicode escape
> ------------------------------
>
> Key: JENA-2186
> URL: https://issues.apache.org/jira/browse/JENA-2186
> Project: Apache Jena
> Issue Type: Improvement
> Affects Versions: Jena 4.2.0
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 4.3.0
>
>
> U+FFFD (Unicode replacement character) arises when there is an encoding
> mismatch between the input bytes and UTF-8 (see the [wikipedia
> article|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character]).
> The tokenizer for Turtle, N-Triples, etc. raises a warning when a literal U+FFFD
> is encountered, to notify users and applications of potential problems.
> The tokenizer does not warn if the character is written intentionally in the
> input stream as {{\uFFFD}} (6 characters).
> The writer should use this Unicode escape form so that the character U+FFFD is
> written and read back in without a warning.
>
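The behaviour requested in the description can be illustrated with a minimal sketch. This is not the code from the commit and not Jena's writer API: the class name ReplacementCharEscaper and the method escapeReplacementChar are hypothetical, and the sketch only handles U+FFFD, leaving all other escaping to whatever the real writer already does.

```java
public class ReplacementCharEscaper {
    // The Unicode replacement character, built from its code point so that
    // this source file contains no literal U+FFFD.
    private static final char REPLACEMENT = (char) 0xFFFD;

    /**
     * Hypothetical helper (an assumption for illustration, not Jena's API):
     * returns the input with every literal U+FFFD replaced by the
     * six-character escape sequence backslash, 'u', 'F', 'F', 'F', 'D'.
     */
    public static String escapeReplacementChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == REPLACEMENT) {
                // Write the escape form instead of the raw character.
                sb.append("\\uFFFD");
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, a string written this way contains no raw U+FFFD, so a tokenizer that warns only on the literal character would read it back without a warning.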
--
This message was sent by Atlassian Jira
(v8.3.4#803005)