[jira] [Commented] (TIKA-4195) JSoupParser conceals null from the EncodingDetector

ASF GitHub Bot (Jira) Mon, 12 Feb 2024 10:12:03 -0800


    [ 
https://issues.apache.org/jira/browse/TIKA-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17816694#comment-17816694
 ]


ASF GitHub Bot commented on TIKA-4195:
--------------------------------------

tballison merged PR #1591:
URL: https://github.com/apache/tika/pull/1591




> JSoupParser conceals null from the EncodingDetector
> ---------------------------------------------------
>
>                 Key: TIKA-4195
>                 URL: https://issues.apache.org/jira/browse/TIKA-4195
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>
> The JSoupParser runs encoding detection on the input stream. If the result 
> is null, the parser applies the default charset -- US-ASCII. This behavior is 
> ok. 
> The problem is that there is no way to distinguish between a faulty encoding 
> detector alleging 'US-ASCII' and the default behavior of the JSoupParser. I 
> don't think the JSoupParser should report the fallback encoding as if it were 
> detected.
> I'm not sure how best to report this in the metadata, but we need to be able 
> to differentiate detection and fallback encoding.
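
One way to picture the distinction the issue asks for is sketched below. This is
only an illustration under stated assumptions, not the change merged in the PR:
the class EncodingFallbackSketch, the type CharsetChoice, and the method resolve
are hypothetical names and are not part of the Tika API. The sketch carries a
flag alongside the charset so that a detector result of US-ASCII and a US-ASCII
fallback after a null result remain distinguishable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names exist in Tika.
class EncodingFallbackSketch {

    // Fallback charset named in the issue description.
    static final Charset FALLBACK = StandardCharsets.US_ASCII;

    // Pairs the charset to use with a record of how it was chosen.
    static final class CharsetChoice {
        final Charset charset;
        final boolean detected; // false when the fallback was applied

        CharsetChoice(Charset charset, boolean detected) {
            this.charset = charset;
            this.detected = detected;
        }
    }

    // A null detector result means "nothing detected": apply the fallback
    // and say so, instead of presenting it as a detection result.
    static CharsetChoice resolve(Charset detectorResult) {
        if (detectorResult == null) {
            return new CharsetChoice(FALLBACK, false);
        }
        return new CharsetChoice(detectorResult, true);
    }

    public static void main(String[] args) {
        CharsetChoice fromDetector = resolve(StandardCharsets.US_ASCII);
        CharsetChoice fromFallback = resolve(null);
        // Same charset in both cases, but the flag tells them apart.
        System.out.println(fromDetector.charset + " detected=" + fromDetector.detected);
        System.out.println(fromFallback.charset + " detected=" + fromFallback.detected);
    }
}
```

How such a flag would be surfaced in the metadata is left open here, as it is in
the issue description.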



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
