[ https://issues.apache.org/jira/browse/HDFS-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Colin Patrick McCabe updated HDFS-4235:
---------------------------------------
Attachment: HDFS-4235.001.patch
Here's a patch, including unit tests that demonstrate the problem.
Basically, the issue is that certain code points are illegal in XML but legal
in the file names HDFS accepts. The patch squares that circle by name-mangling
such paths when they are written to XML.
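
To illustrate the idea, here is a minimal sketch of reversible name mangling.
It is not taken from the attached patch: the class name XmlNameMangler, the
method names, and the \hhhh; escape format are all assumptions made for this
example. The legality check follows the Char production of XML 1.0. The
backslash is escaped too, so that unmangling is unambiguous.

```java
// Hypothetical sketch, not the code from HDFS-4235.001.patch.
final class XmlNameMangler {
    private XmlNameMangler() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces illegal code points, and the backslash itself, with \hhhh; */
    static String mangle(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\\' || !isLegalXmlChar(cp)) {
                // Everything escaped here is at most U+FFFF, so 4 hex digits suffice.
                out.append(String.format("\\%04x;", cp));
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    /** Reverses mangle(); rejects an escape with no terminating ';'. */
    static String unmangle(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '\\') {
                out.append(c);
                i++;
                continue;
            }
            int end = s.indexOf(';', i);
            if (end < 0) {
                throw new IllegalArgumentException("unterminated escape at " + i);
            }
            out.appendCodePoint(Integer.parseInt(s.substring(i + 1, end), 16));
            i = end + 1;
        }
        return out.toString();
    }
}
```

With this scheme a path containing U+0001 would be written to the XML as
a\0001;b, and a reader of the XML would call unmangle() to recover the
original name. Ordinary names pass through unchanged.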
> when outputting XML, OfflineEditsViewer can't handle some edits containing
> non-ASCII strings
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-4235
> URL: https://issues.apache.org/jira/browse/HDFS-4235
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4235.001.patch
>
>
> It seems that when outputting XML, OfflineEditsViewer can't handle some edits
> containing non-ASCII strings.
> Example:
> {code}
> cmccabe@keter:/h> ./bin/hdfs oev -i ~/Downloads/current2/edits -o /tmp/u.xml
>
> 17:11:24,662 ERROR OfflineEditsBinaryLoader:82 - Got IOException at position 10593
> Encountered exception. Exiting: SAX error: The character '�' is an invalid XML character
> java.io.IOException: SAX error: The character '�' is an invalid XML character
>     at org.apache.hadoop.hdfs.tools.offlineEditsViewer.XmlEditsVisitor.visitOp(XmlEditsVisitor.java:119)
>     at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsBinaryLoader.loadEdits(OfflineEditsBinaryLoader.java:78)
>     at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.go(OfflineEditsViewer.java:142)
>     at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.run(OfflineEditsViewer.java:228)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer.main(OfflineEditsViewer.java:237)
> {code}
> Most likely, a filename is not being escaped or re-encoded before it is
> written into the XML. The other processors (stats, binary) don't have
> this problem, so it is purely an XML encoding issue.