On Thu, 26 Nov 2009 10:46:54 -0800, Carl Worth <cworth at cworth.org> wrote:
> So perhaps the new configuration option we want is a limit on message
> size? Rather than ignoring large files entirely, notmuch could just stop
> indexing messages past the configured limit?

Having just written that, I don't think it's actually an interesting
option. Instead of working around the bug, we should find out what the
bug actually is.

It could be that Xapian's TermGenerator is just going nuts here. Or it
could be that Xapian is trying to hold too much data in memory instead
of flushing it out to disk.

Currently, notmuch never makes an explicit Xapian flush call. Instead,
we rely on the default behavior, which is that Xapian flushes to disk
after every batch of 10000 documents added.

So it's possible that all that's actually needed here is for notmuch to
notice that it just indexed a huge file, and then explicitly flush to
keep Xapian from using too much memory. Or, perhaps better, Xapian could
be fixed to flush automatically if its memory usage gets "too big" (if
the missing flush is actually what's needed here).

Clearly, some experimenting is needed. Dominik, if you can share the
large file (either with me alone or with the whole list), a pointer to
where we could download it would be appreciated.

-Carl
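
P.S. To make the "flush after a huge file" idea concrete, here is a rough
sketch of the policy only. It is not notmuch or Xapian code: FakeIndex is a
hypothetical stand-in that merely counts pending documents and bytes, and
BYTE_LIMIT is an arbitrary threshold chosen for illustration. In a real
experiment the flush() call would be whatever explicit flush the Xapian
writable database offers.

```python
DOC_BATCH = 10000                # default batch size described above
BYTE_LIMIT = 64 * 1024 * 1024    # assumed threshold, picked arbitrarily


class FakeIndex:
    """Hypothetical stand-in for the index; it only tracks pending work."""

    def __init__(self):
        self.pending_docs = 0
        self.pending_bytes = 0
        self.flushes = 0

    def add_document(self, size_bytes):
        self.pending_docs += 1
        self.pending_bytes += size_bytes
        # Model the automatic flush after every batch of documents.
        if self.pending_docs >= DOC_BATCH:
            self.flush()

    def flush(self):
        # Pretend everything pending has been written out to disk.
        self.pending_docs = 0
        self.pending_bytes = 0
        self.flushes += 1


def index_message(index, size_bytes):
    """Add one message, flushing early if too much data is pending."""
    index.add_document(size_bytes)
    if index.pending_bytes >= BYTE_LIMIT:
        index.flush()
```

With this policy, a run of small messages still waits for the usual batch
boundary, while a single very large message triggers a flush right away.
Whether that actually bounds memory use in Xapian is exactly what the
experiment would need to show.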