Thomas,

Is the expense concern the computational complexity (escaping for SQL), the primary storage use (malloc'ing 5*READ_BLOCK_SIZE), or secondary storage use (not exploiting PostgreSQL's text compression)? Perhaps the first two of these could be addressed by using PostgreSQL's parameterized query execution functions. As for secondary storage use, my understanding is that both "text" and "bytea" can be stored compressed and uncompressed (at the operator's discretion).

The patch I sent before was indeed sub-optimal w.r.t. performance, but fortunately was optimal w.r.t. my level of effort, portable among dbmail revisions, and adequate w.r.t. my performance requirements (I am my DB's sole user ;).

Regarding message body searching: it seems to me that searching a "text" column has the same complexities as searching "bytea". In both cases, it would be necessary to transcode the search string (or the message block) to matching encodings. (As you mentioned, switching entirely UTF-8 would optimize for search -- at the expense of most other operations.)

Regards,
Robert

P.S. My selfish desire is for dbmail to not compel a specific DB encoding, because I'm combining dbmail and other data in the same DB -- I've patched dbmail to segregate its tables into its own schema.

Thomas Mueller wrote:

Hi Robert,

Here's the patch..


I don't think that's a solution but a (expensive) workaround. The problem is: Postgres compresses text but stores bytea as it is.
In future it would be interesting to search on message bodies as well.

What about converting every string to UTF-8 and use a UTF-8 database?
Paul is it hard to convert a incoming message to UTF-8 and use UTF-8 everywhere?


Thomas


Reply via email to