I've modified my first setup. I'm now storing each mime-chunk as follows:
CREATE TABLE dbmail_partlists (
    physmessage_id INTEGER NOT NULL,
    is_header BOOLEAN DEFAULT '0' NOT NULL,   -- each mime-header chunk is a header blob
    part_key INTEGER DEFAULT '0' NOT NULL,    -- simply sequence per message
    part_depth INTEGER DEFAULT '0' NOT NULL,  -- used for message/rfc822 attachments
    part_order INTEGER DEFAULT '0' NOT NULL,
    part_id TEXT NOT NULL                     -- foreign key to mimeparts
);
CREATE TABLE dbmail_mimeparts (
    id TEXT NOT NULL,                         -- primary key sha1
    data TEXT NOT NULL,
    size INTEGER NOT NULL
);
I've got insertion working beautifully. Attachments are stored encoded
but separate from the mime-headers that come with them. Those are
inserted separately. The jury is still out as to decoding the
attachments first. I'll probably add that as well. In fact that would
mean you could insert the same attachment under different filenames
using different encodings (base64/uuencode) and it would still be stored
only once. Me like much.
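A rough illustration of the decode-then-hash idea, in Python rather than dbmail's C code. The function name canonical_digest and the encoding labels are made up for this sketch, and the uuencode branch assumes the begin/end lines have already been stripped:

```python
import base64
import binascii
import hashlib


def canonical_digest(body: bytes, encoding: str) -> str:
    """Return the SHA-1 of the decoded payload (hypothetical helper).

    Hashing the decoded bytes means the transfer encoding used on the
    wire no longer influences the key.
    """
    if encoding == "base64":
        raw = base64.b64decode(body)
    elif encoding == "uuencode":
        # Assumption: body holds only the uuencoded data lines,
        # without the surrounding "begin"/"end" lines.
        raw = b"".join(binascii.a2b_uu(line) for line in body.splitlines() if line)
    else:
        # Unknown or identity encoding: hash the bytes as they are.
        raw = body
    return hashlib.sha1(raw).hexdigest()


payload = b"same attachment, two encodings"
as_base64 = base64.encodebytes(payload)  # base64 text with a trailing newline
as_uu = binascii.b2a_uu(payload)         # one uuencoded line (payload is under 45 bytes)
```

Hashing the encoded forms directly gives two different digests, while canonical_digest gives the same key for both, which is what would let a single stored copy serve both messages.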
So: inserting the same messages over and over doesn't add *anything* in
the mimeparts table.
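To make the effect concrete, here is a minimal sketch using Python and an in-memory SQLite database. It is not the dbmail implementation: store_part and make_db are hypothetical helpers, and the sketch assumes id is declared as a real PRIMARY KEY so that INSERT OR IGNORE can skip blobs that already exist.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory database with the two tables from this mail."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dbmail_partlists (
            physmessage_id INTEGER NOT NULL,
            is_header BOOLEAN DEFAULT '0' NOT NULL,
            part_key INTEGER DEFAULT '0' NOT NULL,
            part_depth INTEGER DEFAULT '0' NOT NULL,
            part_order INTEGER DEFAULT '0' NOT NULL,
            part_id TEXT NOT NULL
        );
        CREATE TABLE dbmail_mimeparts (
            id TEXT NOT NULL PRIMARY KEY,  -- assumption: enforced as a key
            data TEXT NOT NULL,
            size INTEGER NOT NULL
        );
        """
    )
    return conn


def store_part(conn, physmessage_id, part_key, data, is_header=False):
    """Store one chunk; the blob row is written only if its SHA-1 is new."""
    digest = hashlib.sha1(data.encode("utf-8")).hexdigest()
    # The blob is keyed by its digest, so a repeat insert is a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO dbmail_mimeparts (id, data, size) VALUES (?, ?, ?)",
        (digest, data, len(data)),
    )
    # The per-message part list always gets a row pointing at the blob.
    conn.execute(
        "INSERT INTO dbmail_partlists (physmessage_id, is_header, part_key, part_id)"
        " VALUES (?, ?, ?, ?)",
        (physmessage_id, int(is_header), part_key, digest),
    )
    return digest
```

Storing the same two chunks for three different physmessage_id values leaves two rows in dbmail_mimeparts and six rows in dbmail_partlists.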
Retrieval is almost done but not quite there yet. I'm still playing
around with the reconstruction and addition of the proper mime boundary
strings. Almost there though. Once that is done, I can update the rest
of the code (mostly some minor parts of imap) that talk to the
messageblks table directly.
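For orientation only, the lookup half of retrieval might look like the Python/SQLite sketch below. It fetches the chunks of one message in part_key order and deliberately leaves out the boundary reconstruction, which is the unfinished part. The helpers demo_db and fetch_chunks are hypothetical, and ordering by part_key alone is an assumption about how the columns are meant to be used.

```python
import sqlite3


def demo_db():
    """In-memory copy of the two tables, trimmed to the columns used here."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dbmail_partlists (
            physmessage_id INTEGER NOT NULL,
            is_header BOOLEAN DEFAULT '0' NOT NULL,
            part_key INTEGER DEFAULT '0' NOT NULL,
            part_depth INTEGER DEFAULT '0' NOT NULL,
            part_order INTEGER DEFAULT '0' NOT NULL,
            part_id TEXT NOT NULL
        );
        CREATE TABLE dbmail_mimeparts (
            id TEXT NOT NULL,
            data TEXT NOT NULL,
            size INTEGER NOT NULL
        );
        """
    )
    return conn


def fetch_chunks(conn, physmessage_id):
    """Return (is_header, part_depth, data) tuples in part_key order."""
    rows = conn.execute(
        """
        SELECT l.is_header, l.part_depth, p.data
        FROM dbmail_partlists AS l
        JOIN dbmail_mimeparts AS p ON p.id = l.part_id
        WHERE l.physmessage_id = ?
        ORDER BY l.part_key
        """,
        (physmessage_id,),
    ).fetchall()
    return [(bool(is_header), depth, data) for is_header, depth, data in rows]
```

The caller would still have to walk this list and emit the right boundary strings between parts, using is_header and part_depth to decide where each one goes.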
git-branch:
http://nfg3.nfgs.net/var/git/dbmail.git#mimechunk
Jake Anderson wrote:
> Paul J Stevens wrote:
>> Aaron Stone wrote:
>>
>>> I think we should keep things encoded, because that's what clients
>>> expect to receive. OTOH, encoded data cannot be searched.
>>>
>>
>> I don't see how we can do both decoding and sha1 digests reliably at the
>> same time. Seems like asking for a *lot* of trouble that is simply not
>> worth it.
>>
>>
> With all the talk of reducing file size I thought a basically free 30%
> reduction would be worth it lol.
> I don't see why it is so difficult if you are already separating the
> chunks? Does it not tell you if that chunk is encoded?
> If it is then decode it, then hash it then store it.
>
> It wouldn't be on the fly but since the whole message is assembled
> anyway before it gets sent to the database I don't see the big issue?
> (IE trying to halt a memory copy at the expense of a 30% reduction in
> write to disk doesn't sound worth it to me)
>> Doing search on binary attachments is not required by any RFC, and if
>> required should be done through a separate decode/stringify/index setup
>> (check the wiki).
>>
> I agree with you there, there isn't a valid reason I can come up with
> for an end user wanting to search a binary attachment. Although
> especially with the adoption of plain text file formats for documents I
> can see that becoming a potentially winning feature. I have often been
> in the situation of needing to find a particular document somebody
> emailed amidst four bazillion (hah bazillion passes spell check) other
> emails from them about the same time. If the mime parts are decoded then
> they could conceivably use the same fulltext indexing the message bodies
> use?
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Dbmail-dev mailing list
> [email protected]
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl