On 5.2.2013, at 15.58, Valery V. Sedletski <[email protected]> wrote:
> Hi, Timo and all!
>
> I am trying to index mail in a test mailbox using fts_solr plugin for
> full-text search. On most mailboxes, it works fine, but on some big
> messages I get warnings like the following, and then I get an Out of
> memory error from Solr, then the indexer-worker process (or doveadm)
> crashes with "assertion failed" error and the backtrace:
>
> ==========================================================
> doveadm([email protected]): Warning:
> fts-solr([email protected]): Mailbox gmail.com UID=48 header
> size is huge

I'm not sure why Solr would run out of memory. If it handles huge message bodies then I don't really see why it couldn't handle huge headers.

> doveadm([email protected]): Panic: file
> ../../../../src/plugins/fts-solr/solr-connection.c: line 548
> (solr_connection_post_more): assertion failed: (maxfd >= 0)

This is hopefully fixed by v2.2, which uses its own lib-http instead of libcurl (which I'm apparently not using correctly).

> So, it seems that Dovecot tries to parse messages in the mailbox, and can't
> correctly determine where the message header ends. So, it thinks that the
> message header is big, and passes very big data to Solr. When trying to
> index it, Solr exhausts the available memory (though, I have 8 Gb of RAM on
> my machine, and java eats more than 2 Gb when indexing). Then connections
> to Solr get closed, and maxfd is invalid, hence the assertion is failed.
>
> Note also the following error
>
> ==========================================================
> SEVERE: org.apache.solr.common.SolrException: undefined field text
> ==========================================================
>
> before an out of memory error.

I don't know about that one.

> I also tried to tweak the decode2text.sh script to ignore all attachments
> bigger than 1 Mb (just test if the file is bigger than 1 Mb, and if so,
> return "1"). This won't help. As I understood, this is because of big
> header, so attachments doesn't matter.

Yes.

> I separated the set of messages which cause this error (by their UID's).
> So, I can give them as a testcase, the size of them all in archive is about
> 40 Mb. The error can be reproduced if put all these messages into an empty
> mailbox, and do reindexing, via IMAP search, or via "doveadm index -u ".

Is it really a message with a huge header? Note that MIME part headers are also counted as headers. Anyway, http://hg.dovecot.org/dovecot-2.1/rev/0a932ba1f01f hopefully helps?
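
If you want to check whether one of those messages really has a huge header once the MIME part headers are included, a rough sketch like the one below might help. It is only an approximation using Python's standard email module, not how Dovecot itself counts header size, and header_bytes is a hypothetical helper name made up for this example.

```python
import email
from email import policy


def header_bytes(raw: bytes) -> int:
    """Approximate total size of all headers in a raw message.

    Counts the top-level header block plus the headers of every MIME
    part. This is an illustration only; Dovecot's own accounting may
    differ.
    """
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    total = 0
    # walk() yields the message itself first, then each MIME sub-part.
    for part in msg.walk():
        for name, value in part.items():
            # Roughly "Name: value\r\n" for each header line.
            total += len(name) + 2 + len(str(value)) + 2
    return total


if __name__ == "__main__":
    import sys

    # Usage: python header_size.py message1.eml message2.eml ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, header_bytes(f.read()))
```

Running it over the messages saved one per file should show whether any of them stands out; a message with very many MIME parts can add up to a large header total even when its top-level header is small.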
