On Thu,  3 Dec 2009 03:15:26 +0600, Mikhail Gusarov <dotted...@dottedmag.net> wrote:
> In order to handle message renames the following changes were deemed
> necessary:

Hi Mikhail,

Thanks for contributing this patch (twice!). I think if I had gotten to
it sooner, I probably would have committed it. But now...

> * Mtime check on individual files was disabled. As files may be moved around
> without changing their mtime, it's necessary to parse them even if they appear
> old in case old message was moved. mtime check on directories was kept as
> moving files changes mtime of directory.

That sounds pretty harsh. I'm having to do a lot of stat() calls already
when new mail arrives. Having to also parse the message ID out of
(roughly, for me) 10000 files every time sounds pretty rough. Fortunately...

> Note that after applying this patch notmuch still does not handle copying
> files (which is harmless, database will point to the last copy of message
> found during 'notmuch new') and deleting files (which is more serious, as
> dangling entries will show up in searches).

Today, Keith and I designed an interface that will support addition,
copying, rename, and deletion of files. And it will be faster than the
existing code with its mtime heuristics.

The complete design is on Keith's laptop right now, and hopefully he'll
appear soon with an implementation. Basically, there are only two new
functions needed in the library (if we got the design right):

        notmuch_directory_t
        notmuch_database_read_directory (notmuch_database_t *database,
                                         const char *path);

        notmuch_status_t
        notmuch_message_remove_filename (notmuch_message_t *message,
                                         const char *filename);

The notmuch_directory_t object will be used in place of the current
notmuch_database_get_timestamp call in notmuch-new.c. In addition to the
mtime that we currently read from the database, it will provide a list
of all directories and files (along with message IDs) known to the
database for a particular path. So notmuch-new can then quickly compare
the results of scandir with this notmuch_directory_t object and then
call notmuch_database_add_message and notmuch_message_remove_filename as
appropriate.
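
The comparison described above can be sketched as a single merge pass
over two sorted name lists. This is only a minimal illustration under
stated assumptions, not the planned implementation: diff_sorted_names
and its parameters are hypothetical, and both lists are assumed to be
sorted in strcmp order. A real notmuch-new would presumably hand each
added name to notmuch_database_add_message and each removed name to
notmuch_message_remove_filename.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare the names found on disk with the names
 * the database knows for one directory. Both input arrays must be
 * sorted in strcmp order. Names only on disk are written to added[],
 * names only in the database are written to removed[]. The caller
 * provides added[] with room for n_disk entries and removed[] with
 * room for n_db entries. */
void
diff_sorted_names (const char **on_disk, size_t n_disk,
                   const char **in_db, size_t n_db,
                   const char **added, size_t *n_added,
                   const char **removed, size_t *n_removed)
{
    size_t i = 0, j = 0;

    assert (n_added != NULL && n_removed != NULL);
    *n_added = 0;
    *n_removed = 0;

    /* Walk both lists in step, always advancing the smaller name. */
    while (i < n_disk && j < n_db) {
        int cmp = strcmp (on_disk[i], in_db[j]);

        if (cmp < 0) {
            /* On disk but unknown to the database: a new file. */
            added[(*n_added)++] = on_disk[i++];
        } else if (cmp > 0) {
            /* In the database but gone from disk: a removed file. */
            removed[(*n_removed)++] = in_db[j++];
        } else {
            /* Present in both: nothing to do. */
            i++;
            j++;
        }
    }

    /* Whatever is left on one side has no counterpart on the other. */
    while (i < n_disk)
        added[(*n_added)++] = on_disk[i++];
    while (j < n_db)
        removed[(*n_removed)++] = in_db[j++];
}
```

Because each list is walked once, the cost is linear in the number of
entries and no file has to be opened or parsed just to detect a change.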

I'm leaving out details about how to ensure we don't delete a message
too soon if it's actually a rename that will be seen as an added file
later in the scan. Obviously the implementation will need to deal with
that (either with an additional library call for "I'm done adding
files, go ahead and delete dangling messages", or by postponing all
calls to remove_filename until later).
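
The second option (postponing removals) might look roughly like the
toy model below. Everything in it is hypothetical and exists only for
illustration: toy_message_t, removal_queue_t, queue_removal,
add_filename and flush_removals are not notmuch APIs, and a message is
reduced to a count of filenames. The point is just that a rename,
which shows up as one removal plus one later addition, never drops the
count to zero if the removals are applied last.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Toy stand-in for a message: only tracks how many files carry it. */
typedef struct {
    const char *message_id;
    int n_filenames;
    int deleted;
} toy_message_t;

/* Removals seen during the scan, to be applied once the scan is done. */
typedef struct {
    toy_message_t *pending[MAX_PENDING];
    size_t n_pending;
} removal_queue_t;

/* Remember that one filename of `message` vanished.
 * Returns 0 on success, -1 if the queue is full. */
int
queue_removal (removal_queue_t *queue, toy_message_t *message)
{
    if (queue->n_pending == MAX_PENDING)
        return -1;
    queue->pending[queue->n_pending++] = message;
    return 0;
}

/* A file carrying this message was found during the scan. */
void
add_filename (toy_message_t *message)
{
    message->n_filenames++;
}

/* Apply all postponed removals. A message is only marked deleted if
 * no filename is left for it. Returns the number of deleted messages. */
size_t
flush_removals (removal_queue_t *queue)
{
    size_t i, n_deleted = 0;

    for (i = 0; i < queue->n_pending; i++) {
        toy_message_t *message = queue->pending[i];

        assert (message->n_filenames > 0);
        message->n_filenames--;
        if (message->n_filenames == 0 && ! message->deleted) {
            message->deleted = 1;
            n_deleted++;
        }
    }
    queue->n_pending = 0;
    return n_deleted;
}
```

The fixed-size queue keeps the sketch short; a real implementation
would need a growable list and would have to track actual filenames
rather than a bare count.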

Oh, and one idea is to do deletion by dropping all indexed terms, but
saving the message ID and any tags in the database. That's small and is
the only precious data, so might be worth holding onto "just in case".

Anyway, I think we'll see code for that soon, so I'm not planning to
commit the offered patch. But people really needing renames might want
to use it for now (and live with any performance implications it
causes).

-Carl
