Paul J Stevens wrote:
> > Note: the code is hard to read and I could find no documentation on the
> > database, so I may be wrong here. If so please correct me...
>
> hard to read... understatement at large. That code is a total bitch. A
> messy heap of shit which makes overcooked spaghetti look like soldiers
> on parade. I actually read that code. Not once, but many times. No
> kidding, and not funny so stop laughing.

My grand idea would be to rewrite it - in Python. Python has a g-r-e-a-t MIME parser sitting right in the standard libraries. And as for CPU usage, just shift everything we can to the SQL engine.

> > (2) Create an index for the 'messageblk' field in dbmail_messageblks. At
> > least MySQL allows this (can anyone tell me if Postgres does?). Then, on
> > IMAP header field searches, do not load/parse/check; instead, create a
> > regular expression and do a SELECT with it, selecting only header
> > blocks. (MySQL specific again, Postgres comments welcome).
>
> well, the messageblks can become largish, and mysql also has some
> limit here. Too bad innodb doesn't support full-text searches.

As far as I understand, it does support searching with regular expressions, just without using indexes. Slow, but not as slow as the full read-and-parse loop. Or am I wrong here?

> I wonder what kind of index we get when we add one to the messageblk
> field... I should try that some time. But not on my main development
> machine :-)

It will faithfully index the first however-many characters you state. On MyISAM it can give you a full text index.

To reduce the index size one could move the header messageblks to a separate table. And make that table (and it alone) MyISAM, and go for full text indexes, eh?

It's a hack, true. I also have an idea that would not be a hack, in full accordance with database theory: a Header_Fields table for header field names, Header_Values for header field values (referencing Header_Fields), and Message_Headers referencing Header_Values for every message (with a sort field too). From these, the headers for a message can be recovered exactly as they were - and at the same time every header search and fetch is a quick, fully indexed query.

The only problem is - I'm afraid this idea won't fit into the existing code. The coding here is somewhat complicated. Well, perhaps someone who knows the code well can do it.
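
To make the three-table idea a bit more concrete, here is a rough sketch in Python. It uses the standard sqlite3 module purely so that it runs standalone; dbmail itself targets MySQL and Postgres, and the real DDL would differ. The lowercase table names, all column names, and the functions open_store, store_headers, fetch_headers and search_header are my own inventions for illustration, not anything that exists in dbmail.

```python
import sqlite3
from email import message_from_string

# Hypothetical schema for the Header_Fields / Header_Values / Message_Headers idea.
SCHEMA = """
CREATE TABLE header_fields (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE            -- field name as written, e.g. 'Subject'
);
CREATE TABLE header_values (
    id       INTEGER PRIMARY KEY,
    field_id INTEGER NOT NULL REFERENCES header_fields(id),
    value    TEXT NOT NULL,
    UNIQUE (field_id, value)             -- identical values are stored once
);
CREATE TABLE message_headers (
    message_id INTEGER NOT NULL,
    seq        INTEGER NOT NULL,         -- the "sort field": original header order
    value_id   INTEGER NOT NULL REFERENCES header_values(id),
    PRIMARY KEY (message_id, seq)
);
CREATE INDEX message_headers_value ON message_headers(value_id);
"""


def open_store():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def _get_or_create(conn, select_sql, insert_sql, params):
    """Return the id of an existing row, inserting it first if needed."""
    row = conn.execute(select_sql, params).fetchone()
    if row:
        return row[0]
    return conn.execute(insert_sql, params).lastrowid


def store_headers(conn, message_id, raw_message):
    """Parse the headers with the stdlib parser and store them in order."""
    msg = message_from_string(raw_message)
    for seq, (name, value) in enumerate(msg.items()):
        field_id = _get_or_create(
            conn,
            "SELECT id FROM header_fields WHERE name = ?",
            "INSERT INTO header_fields (name) VALUES (?)",
            (name,),
        )
        value_id = _get_or_create(
            conn,
            "SELECT id FROM header_values WHERE field_id = ? AND value = ?",
            "INSERT INTO header_values (field_id, value) VALUES (?, ?)",
            (field_id, value),
        )
        conn.execute(
            "INSERT INTO message_headers (message_id, seq, value_id) VALUES (?, ?, ?)",
            (message_id, seq, value_id),
        )
    conn.commit()


def fetch_headers(conn, message_id):
    """Return (name, value) pairs for one message in their original order."""
    rows = conn.execute(
        """SELECT f.name, v.value
             FROM message_headers m
             JOIN header_values v ON v.id = m.value_id
             JOIN header_fields f ON f.id = v.field_id
            WHERE m.message_id = ?
            ORDER BY m.seq""",
        (message_id,),
    )
    return [(name, value) for name, value in rows]


def search_header(conn, name, needle):
    """Return ids of messages whose header `name` contains `needle`.

    LIKE wildcards (% and _) inside `needle` are not escaped in this sketch.
    """
    rows = conn.execute(
        """SELECT DISTINCT m.message_id
             FROM message_headers m
             JOIN header_values v ON v.id = m.value_id
             JOIN header_fields f ON f.id = v.field_id
            WHERE f.name = ? COLLATE NOCASE AND v.value LIKE ?
            ORDER BY m.message_id""",
        (name, f"%{needle}%"),
    )
    return [row[0] for row in rows]
```

Something like store_headers(conn, 1, raw_text) followed by search_header(conn, "subject", "hello") would then replace the load/parse/check loop for header searches. Two caveats: a substring LIKE still scans the matching header values rather than using an index (only exact matches would be fully indexed), and the stdlib parser does not promise byte-for-byte whitespace, so "recovered exactly" holds for names, values and order, not necessarily for the original spacing.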
If you want I can design the queries, but I won't touch the fetch and search code in dbmail with a flag pole.

> > This will probably result in a *dramatic* speedup, at the cost of some
> > coding complexity.
>
> This would actually simplify a lot of code.

Well, I just tend to think that regular expressions = complexity. And the "hacky" variant relies on them heavily. But on the other hand, the parse-and-check loop may represent even more complexity.

> > BTW, why is the "is_header" field unused in dbmail_messageblks? Or at
> > least I found no place in the code that would use it. Is it redundant?
>
> Forward compatibility. I did commit a patch to cvs-head to start using
> this field for message insertions, and have posted a bash/awk script
> that will fill this field for existing messageblk rows.

Posted where?

> > P.S. To be very honest, I did not like the coding style in dbmail. The C
> > language for this task would not be my choice, but never mind that.
> > Functions that are hundreds of lines long, with only minimal comments,
> > are much more problematic.
>
> The original author was probably a very smart guy :-) but a not so
> very experienced programmer.

Agreed. I actually recognized the coding style - I wrote that way in high school. (Only in Pascal, so it was a little bit more readable).

> I actually volunteered to work on that code last summer :-/ and I've
> already begun splitting up some of those functions. The recent addition
> of struct ImapSession in dbmail-imapsession.c is phase one of that
> refactoring. The next phase (splitting up and cleaning _ic_fetch) is
> well underway and currently being tested. Still much remains to be
> fixed, simplified, cleaned-up, etc, etc.

Well, I'm not so sure this can work, but then I might simply be frightened by the current code. And influenced by Eric Raymond - "Plan to throw one away, you will, anyhow".
Besides, my Real Grand Idea would be to use dbmail for local storage, ultimately building a direct interface to it into at least one mail reading program. (It's not as crazy as it sounds, when one has nearly a gigabyte of mail lying around - like I do.) And that would require a *clean* separation of functions, just Not Present in existing code. For example, my idea would involve a search and a fetch defined as functions, and separate IMAP parsing functions calling them.

> > Perl or Python (I prefer Python) with their ready-made RFC822 parsing,
> > along with some DB-expert friends, would help me write an alternative
> > database store quickly. But I really am not up to implementing protocols
> > (SMTP, LMTP, POP3, IMAP). If any Python guru here would go for that, we
> > could try a rewrite :)
>
> I actually started out working on dbmail while studying the twisted
> framework which has finished imap and pop3 implementations. Should be
> quite easy to write sql-based storage engines for those interfaces.

Well, then there could be a compromise proposal. Dump and rewrite the storage while refactoring imap and pop3, and introduce a clean interface between them along the way. Even existing _ic_fetch (yuck!) code could be used for parsing arguments - and then passing them to a well defined interface.

> But there are some advantages to refactoring the current code. And c
> is just another language to write code in. With glib for datatypes, and
> gmime for message-parsing that code could actually become more fun not
> only to write, but better yet, to read again later.

Well, if you do use gmime for message parsing, then probably yes. C has deficiencies in string handling, but one can get around them.

> With the low level of output from Ilja recently, I am becoming
> somewhat concerned that maybe IC&S have decided to sink their
> investment in dbmail.

This may be THE critical question.
If Ilja goes on with this code, then it's worth refactoring (especially since a rewrite might mean losing that).

> Should I find myself alone in actually working on the current codebase
> I will re-evaluate dbmail's viability, and may indeed decide to either
> go for a complete rewrite in python or move on to other projects.

If you go for Python, you're not alone.

I'm actually not really a programmer, I'm a technical writer who can also program. But this has advantages. If a team for a rewrite shows up, then before any coding (except table definitions and example queries in SQL) I will start with clear documentation on both the database and the interfaces. Might work wonders for clean code.

And when that is complete, I can do some coding on database storage. But protocol implementation is a greater question. At least POP3 and IMAP. Scratch that - at least IMAP, we could live without POP3 for some time.

Yours,
Mikhail Ramendik