I'm running,

        dovecot --version
                2.3.11.3 (502c39af9)

        solr -version
                8.6.3

        uname -rm
                5.8.13-200.fc32.x86_64 x86_64

        grep _NAME /etc/os-release
                PRETTY_NAME="Fedora 32 (Server Edition)"
                CPE_NAME="cpe:/o:fedoraproject:fedora:32"

Solr FTS plugin is enabled/configured,

        mail_plugins = virtual acl fts fts_solr
        plugin {
                fts = solr
                fts_autoindex = yes
                fts_solr = url=https://solr.example.com:8984/solr/dovecot/
                fts_enforced = body
                fts_filters = normalizer-icu stopwords snowball
                fts_language_config = /usr/share/libexttextcat/fpdb.conf
                fts_languages = en es de fr it pt
                soft_commit = yes
        }

IMAP capability returns,

        a OK [CAPABILITY IMAP4rev1 SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT 
SORT=DISPLAY THREAD=REFERENCES THREAD=REFS THREAD=ORDEREDSUBJECT MULTIAPPEND 
URL-PARTIAL CATENATE UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED 
I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH 
LIST-STATUS BINARY MOVE SNIPPET=FUZZY PREVIEW=FUZZY STATUS=SIZE SAVEDATE 
SPECIAL-USE LITERAL+ NOTIFY SPECIAL-USE QUOTA ACL RIGHTS=texk] Logged in

I've got two messages in my IMAP store,

        cd /data/vmail/example.com/myuser/Maildir/cur/
        ls -altr | grep S= | /bin/tail -n2
                -rw-------  1 vmail vmail  1.3K Oct 11 14:05 1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S
                -rw-------  1 vmail vmail  1.3K Oct 11 14:05 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S


that differ in BODY CONTENT --
-- one message has ASCII text with NO accented characters
-- the other has the same text, but with ONE accented character

                cat "1602450306.M393628P65260.mx.example.com,S=1278,W=1304:2,S"
                        ...
                        From: M User <[email protected]>
                        Subject: test
                        Reply-To: [email protected]
                        To: "User, My" <[email protected]>
                        Message-ID: <[email protected]>
                        Date: Sun, 11 Oct 2020 14:05:06 -0700
                        User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.2
                        Content-Type: text/plain; charset=utf-8; format=flowed
                        Content-Language: en-US
                        Content-Transfer-Encoding: 8bit

!!!!            también


                cat 1602450353.M756184P65260.mx.example.com,S=1277,W=1303:2,S
                        ...
                        From: M User <[email protected]>
                        Subject: test
                        Reply-To: [email protected]
                        To: "User, My" <[email protected]>
                        Message-ID: <[email protected]>
                        Date: Sun, 11 Oct 2020 14:05:53 -0700
                        User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.3.2
                        Content-Type: text/plain; charset=utf-8; format=flowed
                        Content-Language: en-US
                        Content-Transfer-Encoding: 7bit

!!!!            tambien
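
As a sanity check that the two bodies differ only in that one character, the body bytes can be hex-dumped. This is a minimal sketch, assuming the files are stored as UTF-8 (as the Content-Type headers state); `body_hex` is a hypothetical helper name, not a Dovecot tool.

```shell
# body_hex: print the bytes read from stdin as one run of lowercase hex.
# (hypothetical helper, for illustration only)
body_hex() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# In UTF-8, "é" is the two bytes c3 a9; plain "e" is the single byte 65.
printf 'tambi\303\251n' | body_hex    # -> 74616d6269c3a96e
printf 'tambien' | body_hex           # -> 74616d6269656e
```

Piping the body of each message file through the same helper should show `c3a9` in one and `65` in the other if the stored text is what the client sent.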


I manually rescan and index,

        doveadm fts rescan -u [email protected]
        doveadm index -u [email protected] -q '*'

                ...
                ==> /var/log/dovecot/dovecot-info.log <==
                2020-10-11 15:06:34 indexer-worker([email protected])<OyUmLeqBg18fDAEA+IOfAw>: Info: Indexed 21 messages in accts (UIDs 14399..130699)
                2020-10-11 15:06:34 indexer-worker([email protected])<6NnOMuqBg18fDAEA+IOfAw>: Info: Indexed 16 messages in accts/v007132 (UIDs 13414..14778)
                ...

with no errors.

Then I search in the mail client, here Thunderbird 78, with

        [X] Run Search on Server

For _un_accented "tambien", a match is correctly -- and quickly -- returned.

in logs,

        ==> /var/log/dovecot/dovecot-info.log <==
        2020-10-11 14:57:05 imap-login: Info: Login: user=<[email protected]>, method=PLAIN, rip=10.0.1.7, lip=10.0.1.50, mpid=67743, TLS
        2020-10-11 14:57:16 indexer-worker([email protected])<3ZUzQ2yx2JKsHgsH:9gu0MbF/g1+hCAEA+IOfAw>: Info: Indexed 4788 messages in INBOX (UIDs 135476..140263)

BUT repeating the search for ACCENTED "también" returns *no* match/result.

No errors in log, simply no match.
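
To see whether the miss happens in Solr itself rather than in Dovecot, the same terms could be sent straight to the Solr core. This is only a sketch: `solr_body_query` is a hypothetical helper, and the field names `body`, `uid`, `box` and `user` are assumed to match the Solr schema shipped with Dovecot; adjust them if the core uses a different schema.

```shell
# solr_body_query BASE_URL TERM   (hypothetical helper)
# Sends q=body:TERM to the core's /select handler.
# curl -G turns the --data-urlencode pairs into a URL-encoded query string,
# so an accented TERM does not need to be percent-encoded by hand.
solr_body_query() {
    curl -sG "${1%/}/select" \
        --data-urlencode "q=body:$2" \
        --data-urlencode "fl=uid,box,user"
}

# Usage (URL taken from the fts_solr setting above):
#   solr_body_query 'https://solr.example.com:8984/solr/dovecot/' 'tambien'
#   solr_body_query 'https://solr.example.com:8984/solr/dovecot/' 'también'
```

If the unaccented term returns a document and the accented one does not, that would point at how the Solr field is analyzed rather than at the IMAP side.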

Attempting to test/debug from the command line,

        doveadm fts lookup -u [email protected] body "tambien"

causes a PANIC

        doveadm([email protected]): Panic: file mail-storage.c: line 2112 (mailbox_get_open_status): assertion failed: (box->opened)
        doveadm([email protected]): Error: Raw backtrace: 
/usr/lib64/dovecot/libdovecot.so.0(backtrace_append+0x46) [0x7f3ee94accc6] -> 
/usr/lib64/dovecot/libdovecot.so.0(backtrace_get+0x22) [0x7f3ee94acde2] -> 
/usr/lib64/dovecot/libdovecot.so.0(+0x10025b) [0x7f3ee94b625b] -> 
/usr/lib64/dovecot/libdovecot.so.0(+0x100297) [0x7f3ee94b6297] -> 
/usr/lib64/dovecot/libdovecot.so.0(+0x59bc6) [0x7f3ee940fbc6] -> 
/usr/lib64/dovecot/libdovecot-storage.so.0(+0x4779e) [0x7f3ee95c379e] -> 
/usr/lib64/dovecot/lib21_fts_solr_plugin.so(+0x5849) [0x7f3ee9015849] -> 
/usr/lib64/dovecot/lib20_fts_plugin.so(fts_backend_lookup+0x51) 
[0x7f3ee8c37491] -> 
/usr/lib64/dovecot/doveadm/lib20_doveadm_fts_plugin.so(+0x3280) 
[0x7f3ee8ba9280] -> doveadm(+0x343cd) [0x5637e99443cd] -> doveadm(+0x34fe0) 
[0x5637e9944fe0] -> doveadm(doveadm_cmd_ver2_to_mail_cmd_wrapper+0x22d) 
[0x5637e9945e2d] -> doveadm(doveadm_cmd_run_ver2+0x4e8) [0x5637e99568d8] -> 
doveadm(doveadm_cmd_try_run_ver2+0x3e) [0x5637e995692e] -> doveadm(main+0x1d4) 
[0x5637e9934cf4] -> /lib64/libc.so.6(__libc_start_main+0xf2) [0x7f3ee9071042] 
-> doveadm(_start+0x2e) [0x5637e99351ce]
        Aborted


(1) What config -- Dovecot and/or Solr -- is needed to match on accented characters?
(2) What additional detail, if any, is needed to troubleshoot the panic?

