https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8286

            Bug ID: 8286
           Summary: TextCat: Ignore invisible text
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: PC
                OS: Windows 10
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Plugins
          Assignee: dev@spamassassin.apache.org
          Reporter: k...@mxguardian.net
  Target Milestone: Undefined

Created attachment 5977
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5977&action=edit
textcat.diff

I'm running into an issue where messages from Microsoft Teams written in
English are sometimes detected as Slovak, Czech, or German
(sk.us-ascii,cs.iso-8859-2,de), or not detected at all ("can't determine
language uniquely enough"). The cause is a large chunk of base64 data embedded
inside a hidden div:

<section itemscope itemtype="http://schema.org/SignedAdaptiveCard">
    <meta itemprop="@context" content="http://schema.org/extensions" />
    <meta itemprop="@type" content="SignedAdaptiveCard" />
    <div itemprop="signedAdaptiveCard"
style="mso-hide:all;display:none;max-height:0px;overflow:hidden;">
       ...Base64Data...
    </div>
</section>

Although this mostly affects Microsoft Teams, I've seen something similar from
at least one other sender. The attached patch fixes the issue by ignoring
invisible text.
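
For readers without the attachment, here is a rough sketch of the general idea:
drop text nested inside hidden elements before it reaches language detection.
This is not the contents of textcat.diff and not SpamAssassin code (which is
Perl). It is a standalone Python illustration, and the names
VisibleTextExtractor, visible_text, and HIDDEN_STYLE, as well as the rule for
what counts as "hidden", are assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Assumed heuristic: an element is hidden if its inline style contains
# display:none or visibility:hidden. Real mail may hide text in other ways.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not nested inside a hidden element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        # Once inside a hidden element, every nested element is hidden too.
        if self.hidden_depth or HIDDEN_STYLE.search(style):
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the visible text of an HTML fragment, whitespace-normalized."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


sample = (
    '<p>Hello team</p>'
    '<div style="mso-hide:all;display:none;max-height:0px;overflow:hidden;">'
    'QUJDREVGRw==</div>'
    '<p>See you soon</p>'
)
print(visible_text(sample))  # prints "Hello team See you soon"
```

The sketch assumes reasonably well-formed markup: an unclosed non-void tag
inside a hidden element would leave the depth counter too high and hide more
text than intended.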

-- 
You are receiving this mail because:
You are the assignee for the bug.
