Excellent. I'll see how I can integrate the encode() call.
Jochen Schroer wrote:
> Hi Paul,
>
> it works great in postgresql if you convert the datatype bytea to text
> before compare, than it is possible to use ILIKE
>
> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
> ON p.id = m.physmessage_id WHERE mailbox_idnr = '1' AND status IN
> ('0','1' ) AND k.is_header = '0' group by m.message_idnr,k.messageblk
> HAVING encode(k.messageblk::bytea,'escape') ILIKE '%Kündigung%'
>
> The performance for this extra operation is drastic but okay (better
> than the "old" solution), here some statistics from my test-system
> (nearly 2500 real Mails for testing, every statement running 10 times
> and I calculate average):
>
> *case sensitive:*
> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
> ON p.id = m.physmessage_id WHERE mailbox_idnr = '1' AND status IN
> ('0','1' ) AND k.is_header = '0' group by m.message_idnr,k.messageblk
> HAVING k.messageblk like '%Kündigung%'
>
> Result: 73 Records, 488.979ms
>
> *case sensitive (LIKE, but encoding to see encoding costs):*
> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
> ON p.id = m.physmessage_id WHERE mailbox_idnr = '1' AND status IN
> ('0','1' ) AND k.is_header = '0' group by m.message_idnr,k.messageblk
> HAVING encode(k.messageblk::bytea,'escape') like '%Kündigung%'
>
> Result: 73 record, 770.314ms
>
> *case insensitive (ILIKE + encoding):*
> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
> ON p.id = m.physmessage_id WHERE mailbox_idnr = '1' AND status IN
> ('0','1' ) AND k.is_header = '0' group by m.message_idnr,k.messageblk
> HAVING encode(k.messageblk::bytea,'escape') ilike '%Kündigung%'
>
> Result: 81 records, 883.023ms
>
> (You dont need the ::bytea, it's okay if you simple use
> encode(k.messageblk,'escape') )
>
> I testing to use my changes in the source and I get the case-insensitive
> results back in my Thunderbird in less than 1 second, yesterday (with
> the old method) it needs 32 seconds !!!
>
>
> Kind regards,
>
> Jochen
>
>> Hi Paul,
>>
>> the idea looks good, but postgresql is a little bit more fussy then
>> mysql, so your SQL failed.
>> 1. for your select postgresql needs a grouping
>> 2. at the moment there is no case insensitive operator for bytea
>> datatypes (but I'm searching for a workaround)
>>
>> this statement work (case-sensitive):
>> SELECT m.message_idnr,k.messageblk FROM dbmail_messageblks k JOIN
>> dbmail_physmessage p ON k.physmessage_id = p.id JOIN dbmail_messages m
>> ON p.id = m.physmessage_id WHERE mailbox_idnr = '1' AND status IN
>> ('0','1' ) AND k.is_header = '0' group by m.message_idnr,k.messageblk
>> HAVING k.messageblk like '%Kündigung%'
>>
>> The speedup is very good!
>>
>> Kind regards,
>>
>> Jochen
>>
>>> Jochen, Aaron, and all sql gurus:
>>>
>>> I've done this (tiny, trivial, minor) change and have pushed it onto
>>> svn-trunk. Dbmail now does the body search directly in the database
>>> using HAVING SQL_INSENSITIVE_LIKE '%somestring%' against the messageblks.
>>>
>>> A massive improvement of body searches both in terms of performance and
>>> compliance, I believe.
>>>
>>>
>>>
>>> shoot me. I need a break.
>>>
>>> enjoy.
>>>
>>>
>>>
>>> Paul J Stevens wrote:
>>>
>>>
>>>> Jochen,
>>>>
>>>> Just to keep you up to date though. I've been playing a little with the
>>>> non-indexed HAVING queries. Looks like a massive improvement over the
>>>> current situation (surpise! :-\ not!)
>>>>
>>>> some thing like
>>>>
>>>> select m.message_idnr,k.messageblk
>>>> from dbmail_messageblks k
>>>> join dbmail_physmessage p on k.physmessage_id = p.id
>>>> join dbmail_messages m on p.id = m.physmessage_id
>>>> join dbmail_mailboxes b on m.mailbox_idnr = b.mailbox_idnr
>>>> join dbmail_users u on b.owner_idnr = u.user_idnr
>>>> where u.userid='pablo' and b.name = 'list/dbmail'
>>>> having k.messageblk like '%Kündigung%'
>>>>
>>>> where of course the query will be tweaked to match the engine. This
>>>> example is for mysql
>>>>
>>>> Major speedup, very simple solution. Now on to code.
>>>>
>>>>
>>>> Paul J Stevens wrote:
>>>>
>>>>
>>>>> Jochen Schroer wrote:
>>>>>
>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> this multiple cenversations decrease the oerformance of body search
>>>>>> again, because at the meoment the search process has to load every
>>>>>> messagepart of every message the performance is not really good.
>>>>>> I'm thinking a little bit about a universal solution, but I have now
>>>>>> idea up to now.
>>>>>> (for me with postgresql I can add a new messagetext table, fill them
>>>>>> with a trigger an d search with simple selects on this table, but that
>>>>>> is no universal solution for postgresql, mysql and sqlite)
>>>>>>
>>>>>> do you have think about a solution where you save the message twice if a
>>>>>> mail is inserted into the system?
>>>>>> One time in the existing format and additional in another table with
>>>>>> converted (UTF8?) content that you can use for search?
>>>>>>
>>>>>>
>>>>> I've designed a solution to address this problem. Check the wiki:
>>>>>
>>>>> http://www.dbmail.org/dokuwiki/doku.php?id=bodysearch
>>>>>
>>>>> Basically, you can't use indexed searches on blobs. So the idea is to
>>>>> store only the /words/ from each attachment in separate records. Once
>>>>> that is done, those records /can/ be searched. Even non-indexed searches
>>>>> directly on the database (i.e. using HAVING) would probably be an
>>>>> improvement over the current situation.
>>>>>
>>>>> If you have a better idea, please share.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
> --
> ECS-Solution GmbH
> Raiffeisenstr. 26
> 67817 Imsbach
> Germany
>
> Web: http://www.ecs-solution.de
> Tel: +49 (6302) 609 71-0
> Fax: +49 (6302) 609 71-1
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Dbmail mailing list
> [email protected]
> https://mailman.fastxs.nl/mailman/listinfo/dbmail
--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl