[ 
https://issues.apache.org/jira/browse/RAT-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebb reopened RAT-81:
---------------------
      Assignee:     (was: Stefan Bodewig)

It does not seem right to me to mark XML files with invalid contents as binary.
Binary implies that the file does not need a license, but that is not the case.

Such files should still have a valid license (unless excluded), so RAT should 
report the file as unreadable or similar.
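
One way to picture the distinction is a check that tries a strict decode and labels failures as unreadable rather than binary. This is only a sketch: the class name ReadabilityCheck and the enum Kind are hypothetical and are not part of RAT. It assumes the full file content has been read into a byte array.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not RAT code: separates "cannot be decoded" from "binary".
public final class ReadabilityCheck {
    public enum Kind { TEXT, UNREADABLE }

    // Decodes the bytes strictly; any malformed or unmappable input is
    // reported as an exception instead of being silently replaced.
    public static Kind classify(byte[] content, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(content));
            return Kind.TEXT;
        } catch (CharacterCodingException e) {
            // The file may still need a license header; it just cannot be
            // read with this charset, so it is flagged rather than skipped.
            return Kind.UNREADABLE;
        }
    }

    public static void main(String[] args) {
        byte[] utf16 = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16);
        System.out.println(classify(utf16, StandardCharsets.UTF_8));
    }
}
```

A report could then list UNREADABLE files separately, so they are neither treated as licensed nor silently exempted.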

> MalformedInputException thrown when RAT tries reading file
> ----------------------------------------------------------
>
>                 Key: RAT-81
>                 URL: https://issues.apache.org/jira/browse/RAT-81
>             Project: Apache Rat
>          Issue Type: Bug
>          Components: engine
>    Affects Versions: 0.6, 0.7
>         Environment: Linux (Ubuntu) on x86, running with "default" file 
> encoding set to UTF-8
>            Reporter: Marshall Schor
>            Priority: Minor
>             Fix For: 0.8
>
>
> To reproduce, set the platform default locale to one that implies UTF-8 
> file encoding.
> Code such as org.apache.rat.document.impl.FileDocument, which returns a 
> FileReader, then gives RAT a reader that uses the platform default 
> character encoding (in this case UTF-8).
> If the file being processed is not in that encoding, the reader may hit 
> byte sequences that are invalid UTF-8, which causes it to throw a 
> MalformedInputException.
> One case we found:
> The file being examined had invalid UTF-8 encodings.  First, Rat ran the 
> BinaryGuesser - but that returned false because it attempted to read the 
> first 100 or so chars, and got a "MalformedInputException" instead, so the 
> try/catch block just ended up returning "false" (not binary).  Then the 
> HeaderChecker tried to read the file to check the header, and got this same 
> exception - but this time, it made RAT fail.
> Here's the last part of the stack trace:
> Caused by: org.apache.rat.report.RatReportFailedException: Analysis failed
>     at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:66)
>     at org.apache.rat.mp.FilesReportable.run(FilesReportable.java:69)
>     at org.apache.rat.Report.report(Report.java:292)
>     at org.apache.rat.Report.report(Report.java:272)
>     at org.apache.rat.mp.AbstractRatMojo.createReport(AbstractRatMojo.java:341)
>     ... 23 more
> Caused by: org.apache.rat.document.RatDocumentAnalysisException: Cannot analyse header
>     at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:54)
>     at org.apache.rat.document.impl.util.DocumentAnalyserMultiplexer.analyse(DocumentAnalyserMultiplexer.java:37)
>     at org.apache.rat.document.impl.util.ConditionalAnalyser.matches(ConditionalAnalyser.java:44)
>     at org.apache.rat.document.impl.util.ConditionalAnalyser.analyse(ConditionalAnalyser.java:50)
>     at org.apache.rat.report.xml.XmlReport.report(XmlReport.java:64)
>     ... 27 more
> Caused by: org.apache.rat.analysis.RatHeaderAnalysisException: Cannot read header for /home/tgoetz/tmp/uimaj-2.3.1/uimaj-core/src/test/resources/pearTests/encodingTests/UTF16_with_signature.xml
>     at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:96)
>     at org.apache.rat.report.analyser.DocumentHeaderAnalyser.analyse(DocumentHeaderAnalyser.java:50)
>     ... 31 more
> Caused by: sun.io.MalformedInputException
>     at sun.io.ByteToCharUTF8.convert(ByteToCharUTF8.java:294)
>     at sun.nio.cs.StreamDecoder$ConverterSD.convertInto(StreamDecoder.java:316)
>     at sun.nio.cs.StreamDecoder$ConverterSD.implRead(StreamDecoder.java:366)
>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:252)
>     at java.io.InputStreamReader.read(InputStreamReader.java:212)
>     at java.io.BufferedReader.fill(BufferedReader.java:157)
>     at java.io.BufferedReader.readLine(BufferedReader.java:320)
>     at java.io.BufferedReader.readLine(BufferedReader.java:383)
>     at org.apache.rat.report.analyser.HeaderCheckWorker.readLine(HeaderCheckWorker.java:111)
>     at org.apache.rat.report.analyser.HeaderCheckWorker.read(HeaderCheckWorker.java:89)
>     ... 32 more
> Work-around: mark these files for explicit exclusion.
> Fix: change the BinaryGuesser to read the files as raw bytes (not assuming 
> any character encoding) and operate on that data.
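
The proposed fix could look roughly like the sketch below, which samples raw bytes through an InputStream so that no charset decoder is involved and no MalformedInputException can be thrown. The class name ByteBinaryGuesser, the 25% threshold, and the control-byte heuristic are assumptions made for illustration, not RAT's actual BinaryGuesser logic; the sample size follows the "first 100 or so" mentioned above. It requires Java 9 or later for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not RAT's BinaryGuesser: guesses from raw bytes only.
public final class ByteBinaryGuesser {
    private static final int SAMPLE_SIZE = 100;    // "first 100 or so" from the report
    private static final double THRESHOLD = 0.25;  // assumed cut-off, not from RAT

    // Counts control bytes other than common whitespace in the first
    // `length` bytes and compares their share against the threshold.
    public static boolean looksBinary(byte[] sample, int length) {
        if (length == 0) {
            return false; // an empty file is not treated as binary
        }
        int suspicious = 0;
        for (int i = 0; i < length; i++) {
            int b = sample[i] & 0xFF;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r' || b == '\f';
            if (b < 0x20 && !whitespace) {
                suspicious++;
            }
        }
        return suspicious > length * THRESHOLD;
    }

    // Reads up to SAMPLE_SIZE bytes without any character decoding.
    public static boolean isBinary(Path file) throws IOException {
        byte[] buf = new byte[SAMPLE_SIZE];
        try (InputStream in = Files.newInputStream(file)) {
            int n = in.readNBytes(buf, 0, buf.length);
            return looksBinary(buf, n);
        }
    }
}
```

Note that with this particular heuristic a UTF-16 file of mostly ASCII text has a NUL in every other byte and would be guessed as binary, which is the outcome the reopening comment objects to for XML files.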



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
