[ https://issues.apache.org/jira/browse/TIKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095726#comment-14095726 ]
Hudson commented on TIKA-1387:
------------------------------

SUCCESS: Integrated in tika-trunk-jdk1.6 #136 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/136/])

For places formatting numbers in fixed formats, or case-insensitive comparing Ascii strings, use Locale.ROOT not Locale.getDefault() to ensure predictable behaviour, and avoid issues in locales like Turkish.
TIKA-1387 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1617765)
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iwork/AutoPageNumberUtils.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/embedder/ExternalEmbedderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/image/ImageMetadataExtractorTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/TikaResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/UnpackerResource.java

For places formatting numbers in fixed formats, or case-insensitive comparing Ascii strings, use Locale.ROOT not Locale.getDefault() to ensure predictable behaviour, and avoid issues in locales like Turkish.
TIKA-1387 (nick: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1617758)
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/io/FilenameUtils.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/utils/DateUtils.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/TypeDetectionBenchmark.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/html/BoilerpipeContentHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/MboxParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/NSNormalizerContentHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/RTFObjDataParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/rtf/TextExtractor.java

> Add forbidden-apis checker to TIKA build
> ----------------------------------------
>
> Key: TIKA-1387
> URL: https://issues.apache.org/jira/browse/TIKA-1387
> Project: Tika
> Issue Type: Improvement
> Components: general
> Reporter: Uwe Schindler
> Assignee: Tyler Palsulich
> Fix For: 1.7
>
> Attachments: TIKA-1387.palsulich.080614.patch, TIKA-1387.patch, TIKA-1387.patch, TIKA-1387.patch
>
>
> Lucene and many other projects already use the forbidden-apis checker to prevent use of some broken classes/signatures from the JDK. These are especially things using default character sets or default locales. The forbidden-apis checker can also be used to explicitly disallow specific methods, if they have security issues (e.g., creating XML parsers without disabling external entity support).
> The attached patch adds the forbidden-apis checker to the tika-parent pom file with default configuration.
> Running it fails with many errors in TIKA core already:
> {noformat}
> [INFO] --- forbiddenapis:1.6.1:check (default) @ tika-core ---
> [INFO] Scanning for classes to check...
> [INFO] Reading bundled API signatures: jdk-unsafe
> [INFO] Reading bundled API signatures: jdk-deprecated
> [INFO] Loading classes to check...
> [INFO] Scanning for API signatures and dependencies...
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.language.LanguageProfilerBuilder (LanguageProfilerBuilder.java:407)
> [ERROR] Forbidden method invocation: java.lang.String#toUpperCase() [Uses default locale]
> [ERROR] in org.apache.tika.io.FilenameUtils (FilenameUtils.java:68)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:257)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:395)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:416)
> [ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:438)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:532)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:550)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:588)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:656)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:782)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:851)
> [ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:957)
> [ERROR] Forbidden method invocation: java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR] in org.apache.tika.io.IOUtils (IOUtils.java:1064)
> [ERROR] Forbidden method invocation: java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR] in org.apache.tika.sax.WriteOutContentHandler (WriteOutContentHandler.java:93)
> [ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR] in org.apache.tika.parser.external.ExternalParser (ExternalParser.java:234)
> [ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
> [ERROR] in org.apache.tika.parser.external.ExternalParser$3 (ExternalParser.java:294)
> [ERROR] Forbidden method invocation: java.util.Calendar#getInstance(java.util.Locale) [Uses default locale or time zone]
> [ERROR] in org.apache.tika.utils.DateUtils (DateUtils.java:83)
> [ERROR] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
> [ERROR] in org.apache.tika.utils.DateUtils (DateUtils.java:91)
> [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
> [ERROR] in org.apache.tika.detect.MagicDetector (MagicDetector.java:98)
> [ERROR] Forbidden method invocation: java.lang.String#getBytes() [Uses default charset]
> [ERROR] in org.apache.tika.detect.MagicDetector (MagicDetector.java:100)
> [ERROR] Forbidden method invocation: java.lang.String#<init>(byte[]) [Uses default charset]
> [ERROR] in org.apache.tika.detect.MagicDetector (MagicDetector.java:396)
> [ERROR] Forbidden method invocation: java.io.OutputStreamWriter#<init>(java.io.OutputStream) [Uses default charset]
> [ERROR] in org.apache.tika.sax.ToTextContentHandler (ToTextContentHandler.java:60)
> [ERROR] Scanned 225 (and 356 related) class file(s) for forbidden API invocations (in 0.42s), 23 error(s).
> {noformat}
> We should fix those problems.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
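
For readers unfamiliar with the problem, the locale and charset pitfalls named in the commit messages and in the forbidden-apis output above can be illustrated with a minimal Java sketch. This is not code from the Tika sources: the class and method names (LocaleRootDemo, lowerAscii, fixedFormat, utf8) are hypothetical, and StandardCharsets assumes Java 7 or later, whereas the build above targets JDK 1.6, where Charset.forName("UTF-8") would be the likely substitute.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class, not part of Tika.
class LocaleRootDemo {

    // Lower-cases an ASCII identifier without consulting the JVM default locale.
    static String lowerAscii(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Formats a number in a fixed, machine-readable form ('.' as decimal separator).
    static String fixedFormat(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Encodes text with an explicit charset instead of the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish casing rules 'I' lower-cases to dotless 'ı' (U+0131),
        // so a comparison against the ASCII string "title" fails.
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // false

        // With Locale.ROOT the result does not depend on where the JVM runs.
        System.out.println(lowerAscii("TITLE").equals("title")); // true

        // Fixed-format number and explicit-charset encoding.
        System.out.println(fixedFormat(1234.5));
        System.out.println(utf8("\u00e9").length);
    }
}
```

The no-argument String#toLowerCase(), String#getBytes() and String#format(String, Object...) calls flagged in the log behave like the Turkish case here whenever the JVM default locale or charset happens to differ from what the code implicitly expects.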