[ https://issues.apache.org/jira/browse/RAT-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135430#comment-14135430 ]
Sebb commented on RAT-81:
-------------------------

Also, I'm not sure that the test files do have any errors in them.

> MalformedInputException thrown when RAT tries reading file
> ----------------------------------------------------------
>
>                 Key: RAT-81
>                 URL: https://issues.apache.org/jira/browse/RAT-81
>             Project: Apache Rat
>          Issue Type: Bug
>          Components: engine
>    Affects Versions: 0.6, 0.7
>         Environment: Linux (Ubuntu) on x86, running with "default" file encoding set to UTF-8
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 0.8
>
>
> To reproduce, set the platform default locale to something that indicates
> UTF-8 file encoding.
> This causes code in (for example) org.apache.rat.document.impl.FileDocument,
> which returns a FileReader, to set up RAT with a reader that uses the
> platform default character encoding (in this case UTF-8).
> If the file being processed is not in this encoding, the reader may
> encounter byte sequences that are "invalid" UTF-8, which causes it to throw
> a MalformedInputException.
> One case we found:
> The file being examined had invalid UTF-8 encodings. First, Rat ran the
> BinaryGuesser, but that returned false because it attempted to read the
> first 100 or so chars and got a "MalformedInputException" instead, so the
> try/catch block just ended up returning "false" (not binary). Then the
> HeaderChecker tried to read the file to check the header and got this same
> exception, but this time it made RAT fail.
> Here's the last part of the stack trace:
> Caused by: org.apache.rat.report.RatReportFailedException: Analysis failed
>       at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:66)
>       at org.apache.rat.mp.FilesReportable.run(FilesReportable.java:69)
>       at org.apache.rat.Report.report(Report.java:292)
>       at org.apache.rat.Report.report(Report.java:272)
>       at org.apache.rat.mp.AbstractRatMojo.createReport(AbstractRatMojo.java:341)
>       ... 23 more
> Caused by: org.apache.rat.document.RatDocumentAnalysisException: Cannot analyse header
>       at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:54)
>       at org.apache.rat.document.impl.util.DocumentAnalyserMultiplexer.analyse(DocumentAnalyserMultiplexer.java:37)
>       at org.apache.rat.document.impl.util.ConditionalAnalyser.matches(ConditionalAnalyser.java:44)
>       at org.apache.rat.document.impl.util.ConditionalAnalyser.analyse(ConditionalAnalyser.java:50)
>       at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:64)
>       ... 27 more
> Caused by: org.apache.rat.analysis.RatHeaderAnalysisException: Cannot read header for /home/tgoetz/tmp/uimaj-2.3.1/uimaj-core/src/test/resources/pearTests/encodingTests/UTF16_with_signature.xml
>       at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:96)
>       at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:50)
>       ... 31 more
> Caused by: sun.io.MalformedInputException
>       at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:294)
>       at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:316)
>       at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:366)
>       at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:252)
>       at java.io.InputStreamReader.read(InputStreamReader.java:212)
>       at java.io.BufferedReader.fill(BufferedReader.java:157)
>       at java.io.BufferedReader.readLine(BufferedReader.java:320)
>       at java.io.BufferedReader.readLine(BufferedReader.java:383)
>       at org.apache.rat.report.analyser.HeaderCheckWorker.readLine(HeaderCheckWorker.java:111)
>       at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:89)
>       ... 32 more
> Work-around: mark these files for explicit exclusion.
> Fix: change the BinaryGuesser to read the files in binary (not assuming any
> character encoding) and operate with that data.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
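
For illustration, the fix proposed in the quoted report (inspect raw bytes instead of decoded characters) might look roughly like the sketch below. This is not Rat's actual BinaryGuesser: the class name, the 100-byte sample size, and the 25% threshold are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not Rat's real BinaryGuesser.
// It never decodes characters, so no charset-related exception can occur.
class ByteLevelBinaryCheck {
    // Assumed sample size; the report only says "the first 100 or so chars".
    static final int SAMPLE_SIZE = 100;
    // Assumed threshold: more than a quarter suspicious bytes means binary.
    static final double BINARY_RATIO = 0.25;

    // Reads up to SAMPLE_SIZE raw bytes from the stream and classifies them.
    static boolean looksBinary(InputStream in) throws IOException {
        byte[] sample = new byte[SAMPLE_SIZE];
        int n = 0;
        int read;
        while (n < sample.length
                && (read = in.read(sample, n, sample.length - n)) != -1) {
            n += read;
        }
        return looksBinary(sample, n);
    }

    // Counts control bytes (below 0x20) other than tab, LF, CR and form feed.
    static boolean looksBinary(byte[] sample, int length) {
        if (length == 0) {
            return false; // an empty file is treated as not binary
        }
        int suspicious = 0;
        for (int i = 0; i < length; i++) {
            int b = sample[i] & 0xFF; // view the byte as unsigned
            boolean ordinaryWhitespace =
                    b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !ordinaryWhitespace) {
                suspicious++;
            }
        }
        return suspicious > length * BINARY_RATIO;
    }
}
```

Under these assumed thresholds, UTF-16 text made mostly of ASCII-range characters has a zero byte in every other position and would be counted as binary; whether that is the desired outcome for files such as UTF16_with_signature.xml is a separate decision.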