Dear all, First, I would like to congratulate you for working several years on dbmail. After fetching source code from GIThub, I could count 46682 lines code in src and 152170 in total. Not bad!
For years, I have been looking for a reliable mail server relying on SQL (i.e. PostgreSQL in my case) and I am happy I probably found it. I would like to participate in coding and I have some questions. Here are some questions: 1) SQL Table structure Are there reasons for splitting the message table into several sub-tables. For example: dbmail_messages and dbmail_physmessage and dbmail_messageblks CREATE TABLE dbmail_messages ( message_idnr bigint NOT NULL DEFAULT nextval('dbmail_message_idnr_seq'::regclass), mailbox_idnr bigint, physmessage_id bigint, seen_flag smallint NOT NULL DEFAULT 0::smallint, answered_flag smallint NOT NULL DEFAULT 0::smallint, deleted_flag smallint NOT NULL DEFAULT 0::smallint, flagged_flag smallint NOT NULL DEFAULT 0::smallint, recent_flag smallint NOT NULL DEFAULT 0::smallint, draft_flag smallint NOT NULL DEFAULT 0::smallint, unique_id character varying(70) NOT NULL, status smallint NOT NULL DEFAULT 0::smallint, CONSTRAINT dbmail_messages_pkey PRIMARY KEY (message_idnr), CONSTRAINT dbmail_messages_mailbox_idnr_fkey FOREIGN KEY (mailbox_idnr) REFERENCES dbmail_mailboxes (mailbox_idnr) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE, CONSTRAINT dbmail_messages_physmessage_id_fkey FOREIGN KEY (physmessage_id) REFERENCES dbmail_physmessage (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE ) WITH ( OIDS=FALSE ); and CREATE TABLE dbmail_physmessage ( id bigint NOT NULL DEFAULT nextval('dbmail_physmessage_id_seq'::regclass), messagesize bigint NOT NULL DEFAULT 0::bigint, rfcsize bigint NOT NULL DEFAULT 0::bigint, internal_date timestamp without time zone, CONSTRAINT dbmail_physmessage_pkey PRIMARY KEY (id) ) WITH ( OIDS=FALSE ); ALTER TABLE dbmail_physmessage OWNER TO dbmail; and CREATE TABLE dbmail_messageblks ( messageblk_idnr bigint NOT NULL DEFAULT nextval('dbmail_messageblk_idnr_seq'::regclass), physmessage_id bigint, messageblk bytea NOT NULL, blocksize bigint NOT NULL DEFAULT 0::bigint, is_header smallint NOT NULL DEFAULT 0::smallint, CONSTRAINT dbmail_messageblks_pkey PRIMARY KEY (messageblk_idnr), CONSTRAINT dbmail_messageblks_physmessage_id_fkey FOREIGN KEY (physmessage_id) REFERENCES dbmail_physmessage (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE ) WITH ( OIDS=FALSE ); ALTER TABLE dbmail_messageblks OWNER TO dbmail; The only reasons I see for separating those tables would be storing them in different tablespaces. But even this is questionable. I am worried that this kind of separation might result on sequential scans instead of memory and index scans. A JOIN without a reason is always an expansive CPU task. Each row needs several CPU cycles. Under modern systems like PostgreSQL, it seems more interesting to have a single table with advanced indexing: partial indexes and/or full-text seach (FTS) and/or materialized views. These indexes can be placed on a different tablespace, i.e. an SSD drive. Rebuilding FTS happens in background, so that inserts are fast. This is a much modern approach that splitting a table in three. Also, I am worried that running a lot of LEFT JOINs in the database might open security breaches. This is the second issue also. 2) Privilege separation I read the post on the WIKI with interest and would like to add this information: PostgreSQL supports CREATE SCHEMA: http://www.postgresql.org/docs/9.4/static/sql-createschema.html CREATE SCHEMA is compatible with the prefix of tables. But it is more powerful as we can restrict a connection to a schema. When connecting, you may restrict to a schema: set search_path to 'schema' You may also "glue" a user to a specific schema: ALTER USER user_name SET search_path to 'schema' Using schemas, it may be possible to create as many database structures using an imap identifier (j...@doe.fr). This is to prevent the scope of any SQL injection and limit it to the sole account of the user. 3) Database performance / SQL injection Could you point me to any filter against database injection in the source code. Where is it located in source code? 4) Performance / memory usage I found this kind of code in dm_mailbox.c c = db_con_get(); TRY stmt = db_stmt_prepare(c, "SELECT id,message_idnr FROM %sphysmessage p " "LEFT JOIN %smessages m ON p.id=m.physmessage_id " "LEFT JOIN %smailboxes b ON b.mailbox_idnr=m.mailbox_idnr " "WHERE b.mailbox_idnr=? ORDER BY message_idnr", DBPFX,DBPFX,DBPFX); db_stmt_set_u64(stmt, 1, self->id); r = db_stmt_query(stmt); OK, this is only a prepare statement ... but as I receive logs using mail, some of my mailboxes usually have more than 100.000 message. I This query results in a huge sequential scan, followed by an ORDER BY statement and then is transferred to memory. I did not test with EXPLAIN ANALYSE but it must be very CPU and memory expansive to run. ids = p_list_new(self->pool); while (db_result_next(r)) { physid = db_result_get_u64(r,0); msgid = db_result_get_u64(r,1); if (g_tree_lookup(uids,&msgid)) { id = mempool_pop(self->pool, sizeof(uint64_t)); *id = physid; ids = p_list_append(ids,id); } } For performance, there should be more LIMIT and OFFSET statements to trim recordsets and avoid filling memory. *** IMHO , DBmail is not usable in production: I am worried that it contains source code that allows SQL injection and gives access to all messages at once. Without SCHEMA support I will probably not install DBmail. Would you welcome a discussion followed by a patch/review for PostgreSQL to make it work using SCHEMAs? It should not alter table structure. Would you welcome a SCHEMA patch? What are your recommendations and guidelines? Kind regards, Kellogs _______________________________________________ Dbmail-dev mailing list Dbmail-dev@dbmail.org http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev