[
https://issues.apache.org/jira/browse/ANY23-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281437#comment-13281437
]
Andy Seaborne commented on ANY23-99:
------------------------------------
For parsing, a single parser can handle both the ASCII and UTF-8 versions if it
handles UTF-8 (ASCII is a strict subset of UTF-8). In my experience, N-Triples
data containing non-ASCII characters does occur in practice, because the ASCII
restriction isn't universally known. N-Quads is probably the same.
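
As a small illustration of the subset point, here is a sketch (the class and
method names are made up for this example, not Any23 code) that decodes both a
pure-ASCII line and a line containing non-ASCII through the same UTF-8 reader:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one UTF-8 reader for both ASCII and UTF-8 input.
final class Utf8ReaderDemo {

    // Decode the bytes as UTF-8 and return the resulting text.
    static String decode(byte[] bytes) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String ascii = "<http://example.org/s> <http://example.org/p> \"cafe\" .";
        String nonAscii = "<http://example.org/s> <http://example.org/p> \"caf\u00e9\" .";
        // ASCII bytes are already valid UTF-8, so both lines round-trip.
        System.out.println(decode(ascii.getBytes(StandardCharsets.US_ASCII)).equals(ascii));
        System.out.println(decode(nonAscii.getBytes(StandardCharsets.UTF_8)).equals(nonAscii));
    }
}
```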
OutputStreamWriter(OutputStream,ASCII) does have one consequence - should
non-ASCII for some reason be fed into such a writer, the output is wrong (the
default is to replace each such character with a "?", i.e. silent corruption
of the data). By the time the writer sees non-ASCII in the stream it's too late.
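
To make the "?" behaviour concrete, here is a minimal sketch (hypothetical class
name, not the Any23 code). The first method uses the charset-based constructor
and loses data silently; the second is one possible alternative, assuming you
would rather fail loudly, using a CharsetEncoder set to REPORT:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing lossy and strict US-ASCII writers.
final class AsciiWriterDemo {

    // Charset-based constructor: unmappable characters are replaced with '?'.
    static String writeLossy(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.US_ASCII)) {
            w.write(text);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Encoder-based constructor: unmappable characters raise an IOException
    // (a CharacterCodingException) instead of being replaced.
    static String writeStrict(String text) throws IOException {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, encoder)) {
            w.write(text);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeLossy("caf\u00e9")); // prints caf?
        try {
            writeStrict("caf\u00e9");
        } catch (IOException e) {
            System.out.println("strict writer rejected non-ASCII input");
        }
    }
}
```

Neither variant repairs the data; the strict one only turns the corruption into
a visible error.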
So changing the OutputStreamWriter is fine - relying on the platform default is
never good. But if there is a problem, it's in the code sending the data to
the writer, and the writer can't fix it up (unless we have a writer that
generates the \u escapes itself).
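
For what "generates the \u itself" could look like, here is a rough sketch of
an escaping helper. The class name is hypothetical and the rules are my reading
of the N-Triples string escaping conventions (printable ASCII passes through,
everything else becomes a numeric escape), so check it against the spec before
relying on it:

```java
// Hypothetical helper: escapes a literal's text so the result is pure ASCII.
final class NQuadsEscaper {

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (cp >= 0x20 && cp <= 0x7E) {
                        // Printable ASCII is written as-is.
                        sb.append((char) cp);
                    } else if (cp <= 0xFFFF) {
                        // BMP characters: backslash, lower-case u, four hex digits.
                        sb.append(String.format("\\u%04X", cp));
                    } else {
                        // Supplementary characters: backslash, upper-case U, eight hex digits.
                        sb.append(String.format("\\U%08X", cp));
                    }
            }
        }
        return sb.toString();
    }
}
```

With this, escape("café") gives caf\u00E9, which an ASCII OutputStreamWriter
can then write without loss. The escaping has to happen before the text reaches
the writer.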
From a code inspection, the comments should be fixed. A strict fix is to
encode them, a lax fix is to output UTF-8 and leave comments as written for
convenience of reading them again.
The most common use of N-Quads I see is as DB dumps - no comments.
> NQuadsWriter should force ASCII in OutputStream constructor
> -----------------------------------------------------------
>
> Key: ANY23-99
> URL: https://issues.apache.org/jira/browse/ANY23-99
> Project: Apache Any23
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.0
> Reporter: Peter Ansell
>
> The NQuads specification states that all NQuads documents must be ASCII
> encoded. [1] The current NQuadsWriter(OutputStream) constructor does not
> enforce this when creating the OutputStreamWriter to wrap the given
> OutputStream. If it is not enforced, then the user's locale will be used to
> create the OutputStreamWriter, which may not enforce US-ASCII.
> Patch is to replace the constructor with:
> this( new OutputStreamWriter(os, Charset.forName("US-ASCII")) );
> [1] http://sw.deri.org/2008/07/n-quads/#mediatype