[ https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816780#comment-17816780 ]
Hudson commented on TIKA-4195:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1504 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1504/])
TIKA-4195 -- jsoup parser shouldn't conceal backoff to default encoding (#1591) (github: [https://github.com/apache/tika/commit/455409bf80801152e7c855ddc994fedc32c4cfcf])
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-text-module/src/test/java/org/apache/tika/parser/txt/TXTParserTest.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/test/java/org/apache/tika/parser/html/HtmlParserTest.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/AutoDetectReader.java
* (edit) tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* (edit) tika-core/src/main/java/org/apache/tika/metadata/TikaCoreProperties.java
* (edit) tika-core/src/main/java/org/apache/tika/detect/CompositeEncodingDetector.java


> JSoupParser conceals null from the EncodingDetector
> ---------------------------------------------------
>
>                 Key: TIKA-4195
>                 URL: https://issues.apache.org/jira/browse/TIKA-4195
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> The JSoupParser runs encoding detection on the InputStream. If the result is
> null, the parser applies the default charset -- US-ASCII. This behavior is
> ok.
> The problem is that there is no way to distinguish when a faulty encoding
> detector alleges 'US-ASCII' and the default behavior of the JSoupParser. I
> don't think the JSoupParser should report the fallback encoding as if it were
> detected.
> I'm not sure how best to report this in the metadata, but we need to be able
> to differentiate detection and fallback encoding.



--
This message was sent by Atlassian Jira (v8.20.10#820010)