Dear all,

First, I would like to congratulate you for working several years on
dbmail. After fetching source code from GIThub, I could count 46682
lines code in src and 152170 in total. Not bad!

For years, I have been looking for a reliable mail server relying on SQL
(i.e. PostgreSQL in my case) and I am happy I probably found it.

I would like to participate in coding and I have some questions.

Here are some questions:

1) SQL Table structure

Are there reasons for splitting the message table into several
sub-tables. 

For example: dbmail_messages and dbmail_physmessage and
dbmail_messageblks

CREATE TABLE dbmail_messages
(
  message_idnr bigint NOT NULL DEFAULT
nextval('dbmail_message_idnr_seq'::regclass),
  mailbox_idnr bigint,
  physmessage_id bigint,
  seen_flag smallint NOT NULL DEFAULT 0::smallint,
  answered_flag smallint NOT NULL DEFAULT 0::smallint,
  deleted_flag smallint NOT NULL DEFAULT 0::smallint,
  flagged_flag smallint NOT NULL DEFAULT 0::smallint,
  recent_flag smallint NOT NULL DEFAULT 0::smallint,
  draft_flag smallint NOT NULL DEFAULT 0::smallint,
  unique_id character varying(70) NOT NULL,
  status smallint NOT NULL DEFAULT 0::smallint,
  CONSTRAINT dbmail_messages_pkey PRIMARY KEY (message_idnr),
  CONSTRAINT dbmail_messages_mailbox_idnr_fkey FOREIGN KEY
(mailbox_idnr)
      REFERENCES dbmail_mailboxes (mailbox_idnr) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT dbmail_messages_physmessage_id_fkey FOREIGN KEY
(physmessage_id)
      REFERENCES dbmail_physmessage (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);

and

CREATE TABLE dbmail_physmessage
(
  id bigint NOT NULL DEFAULT
nextval('dbmail_physmessage_id_seq'::regclass),
  messagesize bigint NOT NULL DEFAULT 0::bigint,
  rfcsize bigint NOT NULL DEFAULT 0::bigint,
  internal_date timestamp without time zone,
  CONSTRAINT dbmail_physmessage_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE dbmail_physmessage
  OWNER TO dbmail;

and 

CREATE TABLE dbmail_messageblks
(
  messageblk_idnr bigint NOT NULL DEFAULT
nextval('dbmail_messageblk_idnr_seq'::regclass),
  physmessage_id bigint,
  messageblk bytea NOT NULL,
  blocksize bigint NOT NULL DEFAULT 0::bigint,
  is_header smallint NOT NULL DEFAULT 0::smallint,
  CONSTRAINT dbmail_messageblks_pkey PRIMARY KEY (messageblk_idnr),
  CONSTRAINT dbmail_messageblks_physmessage_id_fkey FOREIGN KEY
(physmessage_id)
      REFERENCES dbmail_physmessage (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);
ALTER TABLE dbmail_messageblks
  OWNER TO dbmail;

The only reasons I see for separating those tables would be storing them
in different tablespaces. But even this is questionable.

I am worried that this kind of separation might result on sequential
scans instead of memory and index scans. A JOIN without a reason is
always an expansive CPU task. Each row needs several CPU cycles.

Under modern systems like PostgreSQL, it seems more interesting to have
a single table with advanced indexing: partial indexes and/or full-text
seach (FTS) and/or materialized views. These indexes can be placed on a
different tablespace, i.e. an SSD drive. Rebuilding FTS happens in
background, so that inserts are fast. This is a much modern approach
that splitting a table in three.

Also, I am worried that running a lot of LEFT JOINs in the database
might open security breaches. This is the second issue also.

2) Privilege separation

I read the post on the WIKI with interest and would like to add this
information:

PostgreSQL supports CREATE SCHEMA:
http://www.postgresql.org/docs/9.4/static/sql-createschema.html

CREATE SCHEMA is compatible with the prefix of tables. But it is more
powerful as we can restrict a connection to a schema.

When connecting, you may restrict to a schema:
set search_path to 'schema'

You may also "glue" a user to a specific schema:
ALTER USER user_name SET search_path to 'schema'

Using schemas, it may be possible to create as many database structures
using an imap identifier (j...@doe.fr).

This is to prevent the scope of any SQL injection and limit it to the
sole account of the user.

3) Database performance / SQL injection

Could you point me to any filter against database injection in the
source code. Where is it located in source code?

4) Performance / memory usage

I found this kind of code in dm_mailbox.c

c = db_con_get();
TRY
stmt = db_stmt_prepare(c,
  "SELECT id,message_idnr FROM %sphysmessage p "
  "LEFT JOIN %smessages m ON p.id=m.physmessage_id "
  "LEFT JOIN %smailboxes b ON b.mailbox_idnr=m.mailbox_idnr "
  "WHERE b.mailbox_idnr=? ORDER BY message_idnr",
  DBPFX,DBPFX,DBPFX);
db_stmt_set_u64(stmt, 1, self->id);
r = db_stmt_query(stmt);

OK, this is only a prepare statement ... but

as I receive logs using mail, some of my mailboxes usually have more
than 100.000 message. I

This query results in a huge sequential scan, followed by an ORDER BY
statement and then is transferred to memory. I did not test with EXPLAIN
ANALYSE but it must be very CPU and memory expansive to run.

ids = p_list_new(self->pool);
 while (db_result_next(r)) {
  physid = db_result_get_u64(r,0);
  msgid = db_result_get_u64(r,1);
  if (g_tree_lookup(uids,&msgid)) {
   id = mempool_pop(self->pool, sizeof(uint64_t));
   *id = physid;
   ids = p_list_append(ids,id);
  }
 } 

For performance, there should be more LIMIT and OFFSET statements to
trim recordsets and avoid filling memory. 

***

IMHO , DBmail is not usable in production:

I am worried that it contains source code that allows SQL injection and
gives access to all messages at once. Without SCHEMA support I will
probably not install DBmail.

Would you welcome a discussion followed by a patch/review for PostgreSQL
to make it work using SCHEMAs? It should not alter table structure.

Would you welcome a SCHEMA patch? What are your recommendations and
guidelines?

Kind regards,
Kellogs

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev

Reply via email to