[ 
https://issues.apache.org/jira/browse/JENA-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17434811#comment-17434811
 ] 

ASF subversion and git services commented on JENA-2186:
-------------------------------------------------------

Commit cffd3cec2c249475f8e8bb3ac2018c56fe82c9fd in jena's branch 
refs/heads/main from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=cffd3ce ]

Merge pull request #1093 from afs/fix-TDB1-FFFD

JENA-2186: Nodec space allocation for FFFD

> Write U+FFFD as Unicode escape
> ------------------------------
>
>                 Key: JENA-2186
>                 URL: https://issues.apache.org/jira/browse/JENA-2186
>             Project: Apache Jena
>          Issue Type: Improvement
>    Affects Versions: Jena 4.2.0
>            Reporter: Andy Seaborne
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.3.0
>
>
> U+FFFD (Unicode replacement character) arises when there is an encoding 
> mismatch between the input bytes and UTF-8 (see the [wikipedia 
> article|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character]).
> The tokenizer for Turtle/N-Triples etc. raises a warning when a literal U+FFFD
> is encountered, to notify users/applications of potential problems.
> The tokenizer does not warn if it is written intentionally in the input 
> stream as {{\uFFFD}} (6 characters).
> The writer should use this Unicode escape form so that character U+FFFD is
> written and read in again without warning.
>  
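The round trip described in the issue can be illustrated with a minimal sketch. This is not Jena's writer or tokenizer code. The class `FffdEscapeSketch` and the methods `escapeForOutput` and `readLexical` are hypothetical names used only for illustration. The "tokenizer" handles nothing except a simplified `\uXXXX` escape and assumes the input is well formed. The sketch shows two things. A raw U+FFFD in the input triggers a warning. Once the writer has emitted the six-character escape instead, reading the output back produces the same string with no warning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the JENA-2186 idea; not Jena's implementation. */
public class FffdEscapeSketch {

    static final char REPLACEMENT_CHAR = '\uFFFD';

    /** Hypothetical writer step: emit U+FFFD as a six-character escape, everything else as-is. */
    static String escapeForOutput(String lexical) {
        StringBuilder sb = new StringBuilder(lexical.length());
        for (int i = 0; i < lexical.length(); i++) {
            char c = lexical.charAt(i);
            if (c == REPLACEMENT_CHAR) {
                sb.append("\\uFFFD"); // backslash, 'u', then four hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Hypothetical, simplified tokenizer step: decodes 4-hex-digit escapes and
     * records a warning only for a raw (unescaped) U+FFFD. Assumes well-formed
     * escapes; malformed hex digits would throw NumberFormatException.
     */
    static String readLexical(String input, List<String> warnings) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 6 <= input.length() && input.charAt(i + 1) == 'u') {
                int cp = Integer.parseInt(input.substring(i + 2, i + 6), 16);
                sb.append((char) cp); // written intentionally as an escape: no warning
                i += 6;
            } else {
                if (c == REPLACEMENT_CHAR) {
                    warnings.add("Literal U+FFFD at offset " + i);
                }
                sb.append(c);
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "bad" + REPLACEMENT_CHAR + "data";

        List<String> rawWarnings = new ArrayList<>();
        readLexical(original, rawWarnings);
        System.out.println("Raw input warnings: " + rawWarnings.size());

        String written = escapeForOutput(original);
        List<String> warnings = new ArrayList<>();
        String readBack = readLexical(written, warnings);
        System.out.println("Written form: " + written);
        System.out.println("Round trip equal: " + readBack.equals(original));
        System.out.println("Warnings after round trip: " + warnings.size());
    }
}
```

Running `main` should report one warning for the raw input, zero warnings after the escaped round trip, and an equal string.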



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
