On Tue, 2007-12-11 at 00:14 +0800, Joe Wong wrote:
> I've found that the problem may not be related to the non english content. I
> just tried delete dovecot.index.* from the not-working folder. After that,
> full text search is also working on that folder, no more Corrupted squat
> uidlist file error. Why are the two related to each other?

I don't really know. They shouldn't have anything to do with each other. Although if you also deleted dovecot-uidlist, Dovecot assigns new UIDs to messages and that might have helped. But if dovecot-uidlist was there, then I have no idea. If you can reproduce this somehow, I'd like to know.

> Can the system auto-heal under such condition?

Dovecot should always auto-heal itself, so it's a bug if it doesn't.

> By the way, for chinese, each BIG5 character is two bytes long and it is 3
> bytes in UTF-8 encoding. For Chinese, a search word can contain 1
> "character" or more. I think the indexer should convert the text to UTF-8
> and cut the word to UTF-8 character but not bytes boundary.

That's how it works currently. With the byte count I meant that it would cut at the previous (or the next) character boundary after that many bytes. So for example "abcd" and "åäöå" would both be indexed as 4 characters, because the first takes 4 bytes and the second takes 8 bytes, but 4 Chinese characters taking 3 bytes each (12 bytes in total) would be cut after 2 or 3 characters.

None of this affects the actual search results, only how much disk space, memory and disk I/O is used when searching.
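
To make the cutting rule concrete, here is a minimal Python sketch. It is not Dovecot's actual code: the function name `cut_at_char_boundary`, the `round_up` flag and the 8-byte limit are assumptions chosen only to match the example above. It cuts a string near a byte limit without ever splitting a UTF-8 character.

```python
def cut_at_char_boundary(text: str, max_bytes: int, round_up: bool = False) -> str:
    """Cut text near max_bytes of UTF-8 without splitting a character.

    round_up=False cuts at the previous character boundary,
    round_up=True cuts at the next one. (Hypothetical helper, for illustration.)
    """
    used = 0   # UTF-8 bytes consumed so far
    out = []
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            # This character straddles the limit: keep it only when rounding up.
            if round_up and used < max_bytes:
                out.append(ch)
            break
        out.append(ch)
        used += size
    return "".join(out)


LIMIT = 8  # assumed byte limit, picked to match the example in the text

for word in ("abcd", "åäöå", "中文搜索"):
    prev_cut = cut_at_char_boundary(word, LIMIT)
    next_cut = cut_at_char_boundary(word, LIMIT, round_up=True)
    print(word, len(prev_cut), len(next_cut))
```

With this limit "abcd" (4 bytes) and "åäöå" (8 bytes) keep all 4 characters, while the 12-byte Chinese word keeps 2 or 3 characters depending on the rounding direction.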