[
https://issues.apache.org/jira/browse/CAMEL-8905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605749#comment-14605749
]
Franz Forsthofer commented on CAMEL-8905:
-----------------------------------------
The IP clearance is fine. The code is just an implementation of the encoding
part of spec RFC 4627.
But I am wondering whether it is really a good idea to restrict the encoding of
JSON documents to Unicode, as specified in RFC 4627. There is another JSON
specification, ECMA-404:
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf
That spec does not mention encoding at all.
Instead of automatically determining the charset from the JSON document, one
could get the encoding via the method
org.apache.camel.util.IOHelper.getCharsetName(Exchange), which determines the
encoding from the message header Exchange.CHARSET_NAME or the exchange property
Exchange.CHARSET_NAME. But with this solution the header or property must be
set somehow.
What is your opinion on this?
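
For illustration, the header/property lookup could look roughly like the sketch
below. It uses plain maps in place of Camel's message headers and exchange
properties so that it runs without Camel on the classpath. The class name, the
key string "CamelCharsetName" (assumed to be the value of
Exchange.CHARSET_NAME), the header-before-property order and the fallback
argument are all assumptions, not the actual IOHelper implementation.

```java
import java.util.Map;

// Hypothetical helper: resolves a charset name from header/property maps.
final class CharsetNameLookup {

    // Assumption: this mirrors the value of Camel's Exchange.CHARSET_NAME.
    static final String CHARSET_NAME = "CamelCharsetName";

    private CharsetNameLookup() {
    }

    // Returns the header value if present, else the property value,
    // else the supplied fallback (for example an auto-detected charset).
    static String resolve(Map<String, Object> headers,
                          Map<String, Object> properties,
                          String fallback) {
        Object value = headers.get(CHARSET_NAME);
        if (value == null) {
            value = properties.get(CHARSET_NAME);
        }
        return value != null ? value.toString() : fallback;
    }
}
```

If neither the header nor the property is set, the caller still needs some
fallback, which is exactly the open point raised above.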
> encoding problems in jsonpath
> -----------------------------
>
> Key: CAMEL-8905
> URL: https://issues.apache.org/jira/browse/CAMEL-8905
> Project: Camel
> Issue Type: Bug
> Components: camel-jsonpath
> Affects Versions: 2.15.2
> Reporter: Franz Forsthofer
> Fix For: 2.16.0, 2.15.3
>
> Attachments: 0001-jasonpath-automatic-encoding-detection.patch,
> booksUTF16BE.json, booksUTF16LE.json, jsonUCS2BigEndianWithBOM.txt,
> jsonUCS2BigEndianWithoutBOM.txt, jsonUCS2LittleEndianWithBom.txt,
> jsonUCS2LittleEndianWithoutBOM.txt, jsonUTF32BEWithBOM.txt,
> jsonUTF32BEWithoutBOM.txt, jsonUTF32LEWithBOM.txt, jsonUTF32LEWithoutBOM.txt
>
>
> I detected three different encoding problems in jsonpath:
> - if jsonpath is called with an input stream whose encoding differs from
> the default encoding (given by Charset.defaultCharset()), then jsonpath
> still uses the default encoding. Error location in JsonPathEngine:
>     else if (json instanceof InputStream) {
>         InputStream is = (InputStream) json;
>         return path.read(is, Charset.defaultCharset().displayName(), configuration);
>     }
>
> - if jsonpath is called with a JSON file whose encoding is different from
> UTF-8, then jsonpath still parses the document as UTF-8. Error location in
> JsonPathEngine:
>     else if (json instanceof File) {
>         File file = (File) json;
>         return path.read(file, configuration);
>     }
> path.read(file, configuration) always uses UTF-8.
> - if jsonpath is called with a URL pointing to a JSON document whose
> encoding is different from UTF-8, then jsonpath still parses the document
> as UTF-8. Error location in JsonPathEngine:
>     else if (json instanceof URL) {
>         URL url = (URL) json;
>         return path.read(url, configuration);
>     }
> path.read(url, configuration) always uses UTF-8.
> My proposed solution is to determine the encoding of the JSON document
> automatically according to the specification RFC 4627
> (https://www.ietf.org/rfc/rfc4627.txt; see section 3, Encoding) and then call
> the method path.read(jsonDocument, foundEncoding, configuration) with the
> detected encoding. See the attached patch.
> Actually I can commit the patch myself. However, I would like somebody who
> is more familiar with jsonpath than I am to review my patch.
> So please tell me whether my patch can be accepted. I will then either do
> the actual commit or discard the patch.
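
As a rough illustration of the detection described in RFC 4627, section 3: the
first two characters of a JSON text are always ASCII, so the pattern of zero
bytes in the first four octets indicates the encoding. The sketch below is not
the attached patch; the class and method names are hypothetical, and it does
not handle a byte order mark, which RFC 4627 does not cover.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the RFC 4627 section 3 octet patterns.
final class Rfc4627Charset {

    private Rfc4627Charset() {
    }

    // Guesses the charset from the first four octets of a JSON document:
    //   00 00 00 xx  UTF-32BE      xx 00 00 00  UTF-32LE
    //   00 xx 00 xx  UTF-16BE      xx 00 xx 00  UTF-16LE
    //   anything else (or fewer than four octets): UTF-8
    static Charset detect(byte[] head) {
        if (head == null || head.length < 4) {
            return StandardCharsets.UTF_8;
        }
        boolean z0 = head[0] == 0;
        boolean z1 = head[1] == 0;
        boolean z2 = head[2] == 0;
        boolean z3 = head[3] == 0;

        if (z0 && z1 && z2 && !z3) {
            return Charset.forName("UTF-32BE");
        }
        if (z0 && !z1 && z2 && !z3) {
            return StandardCharsets.UTF_16BE;
        }
        if (!z0 && z1 && z2 && z3) {
            return Charset.forName("UTF-32LE");
        }
        if (!z0 && z1 && !z2 && z3) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8;
    }
}
```

The name of the detected charset (Charset.name()) could then be passed as the
encoding argument of path.read(...), as in the InputStream branch quoted above.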