[
https://issues.apache.org/jira/browse/CAMEL-11846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245555#comment-16245555
]
Robert Half edited comment on CAMEL-11846 at 11/9/17 12:23 PM:
---------------------------------------------------------------
Hi Viral,
First, a workaround: I wrap the input in a BufferedInputStream so that I can
reset it later (no need to open the file twice). I pass that InputStream to an
XMLStreamReader, which reports the encoding once it has read the XML prolog.
Then I set that encoding for Camel in the Exchange.CHARSET_NAME header:
{code:java}
EncodingUtil.DetectedEncodingStream detectedEncodingStream =
        EncodingUtil.detectEncoding(inputStream, new StaxConverter().getInputFactory());
inputStream = detectedEncodingStream.inputStream;
exchange.getIn().setHeader(Exchange.CHARSET_NAME, detectedEncodingStream.encoding);
{code}
{code:java}
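import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingUtil {

    /** The rewound stream together with the encoding found in its XML prolog. */
    public static class DetectedEncodingStream {
        public final InputStream inputStream;
        public final String encoding;

        public DetectedEncodingStream(InputStream inputStream, String encoding) {
            this.inputStream = inputStream;
            this.encoding = encoding;
        }
    }

    // Bytes the parser may consume while reading the prolog before reset() fails.
    private static final int MAX_REWINDABLE_STREAM_BUFFER = 2 * 4096;

    public static DetectedEncodingStream detectEncoding(InputStream inputStream,
                                                        XMLInputFactory xmlInputFactory) {
        final BufferedInputStream bufferedInputStream =
                new BufferedInputStream(inputStream, MAX_REWINDABLE_STREAM_BUFFER);
        bufferedInputStream.mark(MAX_REWINDABLE_STREAM_BUFFER);

        String encoding;
        XMLStreamReader xmlStreamReader = null;
        try {
            // Creating the reader makes the parser read the XML declaration.
            xmlStreamReader = xmlInputFactory.createXMLStreamReader(bufferedInputStream);
            // Read the declared encoding while the reader is still open.
            encoding = xmlStreamReader.getCharacterEncodingScheme();
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        } finally {
            try {
                // The reader is null if createXMLStreamReader failed.
                if (xmlStreamReader != null) {
                    // Per the StAX contract this does not close the underlying stream.
                    xmlStreamReader.close();
                }
            } catch (XMLStreamException e) {
                throw new RuntimeException("Failed to close XMLStreamReader", e);
            } finally {
                try {
                    // Rewind so the caller reads the document from its first byte.
                    bufferedInputStream.reset();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        if (encoding == null) {
            // No encoding declared in the prolog: fall back to the XML default.
            encoding = StandardCharsets.UTF_8.name();
        }
        return new DetectedEncodingStream(bufferedInputStream, encoding);
    }
}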
{code}
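To try the mark/parse/reset idea in isolation, here is a condensed, self-contained sketch. DetectEncodingDemo and detectDeclaredEncoding are hypothetical names, the 8192-byte mark limit is an assumption, and it relies on whatever StAX implementation XMLInputFactory.newInstance() picks up (the JDK's built-in one by default). It detects the declared encoding of an in-memory UTF-16BE document and checks that the rewound stream still decodes to the original text.

{code:java}
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DetectEncodingDemo {

    // Hypothetical condensed variant of EncodingUtil.detectEncoding: mark the
    // buffered stream, let the parser read the prolog, read the declared
    // encoding, then rewind the stream to its first byte.
    static String detectDeclaredEncoding(BufferedInputStream in, XMLInputFactory factory)
            throws XMLStreamException, IOException {
        in.mark(8192); // assumed upper bound on what the parser reads up front
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            String declared = reader.getCharacterEncodingScheme();
            // null means the prolog declares no encoding; fall back to UTF-8
            return declared != null ? declared : StandardCharsets.UTF_8.name();
        } finally {
            reader.close(); // per the StAX contract, leaves the underlying stream open
            in.reset();     // rewind so the caller sees the whole document again
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><root>gr\u00fc\u00dfe</root>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_16BE); // encoder writes no BOM

        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(bytes), 8192);
        String encoding = detectDeclaredEncoding(in, XMLInputFactory.newInstance());
        System.out.println("detected: " + encoding);

        // After reset() the stream starts at byte 0 again, so decoding it with
        // the detected charset should reproduce the original document.
        String roundTrip = new String(in.readAllBytes(), encoding);
        System.out.println("round trip ok: " + roundTrip.equals(xml));
    }
}
{code}

One limitation carries over from the workaround: getCharacterEncodingScheme() returns only the encoding named in the XML declaration and returns null when none is declared, so a UTF-16 document with a byte-order mark but no encoding declaration would fall back to UTF-8. XMLStreamReader#getEncoding(), which returns the input encoding if it is known, might be a better fallback there.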
was (Author: antidote2): the same comment, except that the first snippet was not wrapped in {code:java}.
> xtokenize and apply xslt to a string does not work with UTF-16BE
> -----------------------------------------------------------------
>
> Key: CAMEL-11846
> URL: https://issues.apache.org/jira/browse/CAMEL-11846
> Project: Camel
> Issue Type: Bug
> Components: camel-core
> Affects Versions: 2.17.5
> Reporter: Robert Half
>
> In XML, the encoding is often declared in the <?xml ...?> declaration. In
> general, you cannot read that declaration if you don't know the encoding, but
> XML parsers can detect several encodings, which allows them to read it. With
> that information they can read the whole file without knowing the "charset"
> in the first place.
> xtokenize and xslt use XMLInputFactory#createXMLStreamReader(Reader). By
> passing a Reader, Camel signals that it already knows the encoding, so the
> XML parser does not detect it.
> Camel also sets the charset to UTF-8 if none is provided in a header, which
> makes the underlying Reader fail when reading UTF-16.
> Using XMLInputFactory#createXMLStreamReader(InputStream) inside
> XMLTokenExpressionIterator works (tried in a patch), but the subsequent xslt
> step fails again because it also uses a Reader.
> See the Stack Overflow question for reference:
> [https://stackoverflow.com/questions/46322376/apache-camel-to-handle-encoding-declared-in-xml-file]
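To make the Reader-versus-InputStream point concrete outside Camel, here is a minimal sketch using only the JDK's StAX and JAXP APIs. The class and method names are hypothetical, and this is not Camel's XMLTokenExpressionIterator or XSLT code. It parses an in-memory UTF-16BE document from raw bytes, through a Reader that assumes UTF-8 (mirroring Camel's default), and through a Reader with the right charset, and it also runs a trivial XSLT over a StreamSource built from the raw bytes.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReaderVsStreamDemo {

    static final String TEXT = "gr\u00fc\u00dfe";
    // UTF-16BE bytes without a BOM; the parser can still sniff "<?" as 00 3C 00 3F
    static final byte[] DOC = ("<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><root>" + TEXT + "</root>")
            .getBytes(StandardCharsets.UTF_16BE);

    /** StAX over raw bytes: the parser detects the encoding itself. Null on parse failure. */
    static String staxFromStream(byte[] doc) {
        try {
            return rootText(XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(doc)));
        } catch (XMLStreamException e) {
            return null;
        }
    }

    /** StAX over a Reader: the bytes are decoded with an assumed charset first. Null on failure. */
    static String staxFromReader(byte[] doc, Charset assumed) {
        try {
            return rootText(XMLInputFactory.newInstance().createXMLStreamReader(
                    new InputStreamReader(new ByteArrayInputStream(doc), assumed)));
        } catch (XMLStreamException e) {
            return null;
        }
    }

    // Returns the text content of the first element, closing the reader afterwards.
    private static String rootText(XMLStreamReader reader) throws XMLStreamException {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    /** XSLT over raw bytes: StreamSource(InputStream) leaves encoding detection to the parser. */
    static String xsltFromStream(byte[] doc) throws TransformerException {
        String xsl = "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/\"><xsl:value-of select=\"/root\"/></xsl:template>"
                + "</xsl:stylesheet>";
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new ByteArrayInputStream(doc)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("stream:        " + TEXT.equals(staxFromStream(DOC)));
        System.out.println("UTF-8 reader:  " + TEXT.equals(staxFromReader(DOC, StandardCharsets.UTF_8)));
        System.out.println("UTF-16BE rdr:  " + TEXT.equals(staxFromReader(DOC, StandardCharsets.UTF_16BE)));
        System.out.println("xslt stream:   " + TEXT.equals(xsltFromStream(DOC)));
    }
}
{code}

With the default JDK implementations, one would expect the byte-stream variants and the correctly decoded Reader to return the root text, while the Reader that assumes UTF-8 sees NUL characters before every ASCII character and either fails with an XMLStreamException or yields garbled text. The design point is that a Reader is only safe when the charset is actually known; otherwise handing the parser the bytes lets it apply its own detection.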
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)