I saw your ticket about that - I'll check it out soon. Sorry, been busy at the CalConnect conference in the UK this week.
I also did this:

https://github.com/cyrusimap/cyrus-imapd/commit/27513a9bc3f217f388bac163820f9879178071fb

Which I believe will mean that if you change the defaultsearchtier, it will immediately start indexing to the new location. You'll definitely want to restart the server when changing that config option though, and not have it differ between invocations of squatter, or you'll wind up creating a lot of xapianactive entries!

Bron.

On Tue, Jun 4, 2019, at 04:19, Dilyan Palauzov wrote:
> Hello Bron,
>
> imap/squatter.c:do_compact() does call `if (sleepmicroseconds)
> usleep(sleepmicroseconds);`, so -S number is honoured with `squatter
> -t… -z…`.
>
> Will `squatter -F -z… -t…` be fixed on the stable branch, or should
> calling `squatter -F -t… -z` be discouraged with 3.0?
>
> Given that currently, after `squatter -F -t… -z…`, calling `squatter
> -t… -z` reindexes all messages and therefore creates a new Xapian
> index, it must be possible to create a compacted database directly,
> without creating a bloated index first.
>
> My understanding of the rolling mode is that once a new message
> appears/arrives/is APPENDed or deliver(ed), it is added to the sync
> log and then indexed in rolling mode. Then a message arrives at a
> different place; it is added to the log and then indexed. Whether the
> first and second messages are in the same mailbox is completely
> random. Why does squatter not sleep if the two messages are in the
> same mailbox, and work non-stop otherwise? In other words, why does it
> sleep depending on random circumstances?
>
> https://wiki.dovecot.org/Plugins/FTS/Squat says for Dovecot that IMAP
> requires that SEARCH also matches substrings, that no IMAP server
> implements this requirement, and that Dovecot implements it only when
> Squat indices are used. Is the same valid for Cyrus IMAP (Squat index
> offers substring search, Xapian index does not offer substring search)?
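The substring question contrasts two matching semantics. RFC 3501 (section 6.4.4) says that for string search keys a message matches if the string is a substring of the field, compared case-insensitively; a word-based full-text index typically matches whole terms instead. The sketch below is plain C written for illustration, not Cyrus or Xapian code: `match_substring_ci()` and `match_term_ci()` are hypothetical names, and the toy tokenizer (split on non-alphanumeric characters) is an assumption. It says nothing about what Cyrus's Xapian backend actually does.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Case-insensitive substring match: the rule RFC 3501 (section 6.4.4)
 * gives for string search keys in IMAP SEARCH. */
static bool match_substring_ci(const char *field, const char *key)
{
    size_t klen = strlen(key);
    if (klen == 0)
        return true;
    for (const char *p = field; *p != '\0'; p++) {
        size_t i = 0;
        while (i < klen && p[i] != '\0' &&
               tolower((unsigned char)p[i]) == tolower((unsigned char)key[i]))
            i++;
        if (i == klen)
            return true;
    }
    return false;
}

/* Whole-term match: split the field into words on non-alphanumeric
 * characters and compare each word to the key. This toy tokenizer is an
 * assumption for illustration, not the tokenizer Cyrus or Xapian uses. */
static bool match_term_ci(const char *field, const char *key)
{
    size_t klen = strlen(key);
    const char *p = field;
    while (*p != '\0') {
        while (*p != '\0' && !isalnum((unsigned char)*p))
            p++;                              /* skip separators */
        const char *word = p;
        while (*p != '\0' && isalnum((unsigned char)*p))
            p++;                              /* consume one word */
        size_t wlen = (size_t)(p - word);
        if (wlen != 0 && wlen == klen) {
            size_t i = 0;
            while (i < klen &&
                   tolower((unsigned char)word[i]) == tolower((unsigned char)key[i]))
                i++;
            if (i == klen)
                return true;
        }
    }
    return false;
}
```

For the subject "Re: squatter -F increases the index size", `match_substring_ci` accepts the key "quat" (it occurs inside "squatter"), while `match_term_ci` rejects it because no whole word is "quat"; both accept "INDEX".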
>
> Running squatter once printed “compressing X:0,X,Y:0 to Y:3 for …
> (active Y:0,X:0,X,Y:0,Y:1,Y:2)”
> (https://github.com/cyrusimap/cyrus-imapd/issues/2764), so I suspect a
> tier name without a number was in the .xapianactive file.
>
> If I do any compact (-o, -F, -X, just -t -z) where the first tier is
> not referenced, does squatter ensure that the default tier according
> to imapd.conf is inserted in the xapianactive file? Or, asking another
> way: if I change imapd.conf, create a new tier T6 and declare T5 to be
> the default tier, which of the following will insert a reference to
> T5:0 in .xapianactive and which will not:
>
> squatter -t T2 -o -z T2
> squatter -t T5,T2 -z T2
> squatter -t T5 -o T4
> squatter -t T2 -F T3
> squatter -t T2 -X T3
>
> or something else? (The name T5 is declared and its root directory
> exists, but there is no data in the directory, and T5 is not yet in any
> .xapianactive file.)
>
> Regards
> Дилян
> ----- Message from Bron Gondwana <br...@fastmailteam.com> ---------
> Date: Tue, 04 Jun 2019 01:53:23 +1000
> From: Bron Gondwana <br...@fastmailteam.com>
> Subject: Re: squatter -F increases the index size
> To: Cyrus Devel <cyrus-devel@lists.andrew.cmu.edu>
>
> > On Sat, Jun 1, 2019, at 04:34, Dilyan Palauzov wrote:
> >> Hello,
> >>
> >> I gave squatter -F a try.
> >>
> >> Before I ran it, for one user tier T1 was not compacted and allocated
> >> 3.4 MB (mega), and T2 was compacted and contained 3.7 GB (giga). After
> >> removing the records of the deleted messages, i.e. running squatter -F,
> >> T2 was 5.7 GB and squatter printed “filtering” instead of “compacting”.
> >> Then I ran “squatter -t T1,T2 -z T2” again, without -F and without -X,
> >> and squatter reindexed all messages to create a 3.0 GB index.
> >>
> >> I expected that with -F the resulting database would be compacted and
> >> that on the second call there would be no reindexing.
> >
> > I discovered some bad bugs in -F recently, so I suspect that's why.
> > They should be fixed on master now.
> >
> >> When does squatter decide on its own to reindex?
> >
> > When the DB version is too old (which is one of the -F bugs - it
> > wasn't setting the DB version, so it assumed the data was all
> > version zero!)
> >
> >> What do G records in conversations.db contain?
> >
> > G records contain a mapping from GUID to folder number (offset into
> > the $FOLDER_NAMES key) and UID and optionally IMAP part number as
> > the key - mapping to values which contain some keywords and modseq
> > from the original record as well.
> >
> >> My reading is that the way to create a Xapian index of an indexed
> >> mailbox is that squatter first has to be run in INDEX mode and then
> >> in COMPACT mode. In particular, it is not possible to create a
> >> compacted database in one step.
> >
> > No, it's not - due to the way the compact API works. At least, I
> > haven't figured out how.
> >
> >> Does squatter -R -S sleep after each mailbox or after each message indexed?
> >
> > It sleeps after each mailbox.
> >
> >> When compacting, squatter deals just with messages, and on search or
> >> reindex the conversations.db is used to map the messages to mailboxes.
> >> How does squatter -S sleep after each mailbox during compacting, if
> >> it knows nothing about mailboxes?
> >
> > -S is not used when compacting.
> >
> >> What does a tier name without a number in a xapianactive file mean?
> >
> > That shouldn't happen. It will be parsed as the same as tier:0 I believe.
> >
> >> What are XAPIAN_DBW_CONVINDEXED and _XAPINDEXED?
> >
> > Two different ways to know if a document is indexed. CONVINDEXED
> > uses the conversations DB to look up mailbox and uid and then the
> > cyrus.indexed.db databases to see if the message has already been
> > seen.
> >
> > XAPINDEXED uses the metadata inside the Xapian databases to know if
> > a particular message has been indexed based on the cyrusid.*G*
> > metadata values which are identical to the GUIDs themselves.
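The question about a tier name without a number can be made concrete with a small parser. This is a hypothetical sketch, not the code Cyrus uses: `struct tier_entry` and `parse_tier_entry()` are made-up names, the `NAME:NUM` entry shape is taken from the log line quoted in this thread, and it simply follows Bron's expectation that a bare tier name is read as `tier:0`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one .xapianactive entry such as "T2:3". */
struct tier_entry {
    char name[64];      /* tier name, e.g. "T2" */
    unsigned long num;  /* database number within that tier */
};

/* Parse "NAME:NUM" or a bare "NAME". Following Bron's reply, a bare name
 * is treated as NAME:0. Returns false for input this sketch rejects. */
static bool parse_tier_entry(const char *s, struct tier_entry *out)
{
    const char *colon = strchr(s, ':');
    size_t namelen = colon ? (size_t)(colon - s) : strlen(s);

    if (namelen == 0 || namelen >= sizeof out->name)
        return false;                     /* empty or over-long name */
    memcpy(out->name, s, namelen);
    out->name[namelen] = '\0';

    out->num = 0;                         /* default for a bare name */
    if (colon != NULL) {
        char *end;
        if (!isdigit((unsigned char)colon[1]))
            return false;                 /* "NAME:" or a non-digit number */
        out->num = strtoul(colon + 1, &end, 10);
        if (*end != '\0')
            return false;                 /* trailing garbage after the number */
    }
    return true;
}
```

Under this rule the entries `X:0`, `X` and `Y:0` from the log line in issue 2764 would parse as X:0, X:0 and Y:0, so the bare `X` and `X:0` would name the same database.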
> >
> >> What is the file sync/squatter for?
> >
> > It's a sync/$channel directory which squatter watches. This is a
> > method for providing a queue of mailboxes to look at based on the
> > APPEND sync_log statements.
> >
> >> squatter can print “Xapian: truncating text from message mailbox
> >> user.... uid 7309”. When are messages truncated for the purposes of
> >> indexing?
> >
> > When they are too long! The comment in the source code says this:
> >
> > /* Maximum size of a query, determined empirically, is a little bit
> >  * under 8MB. That seems like more than enough, so let's limit the
> >  * total amount of parts text to 4 MB. */
> > #define MAX_PARTS_SIZE (4*1024*1024)
> >
> > This is a holdover from when Greg was working on it. We could switch
> > this to be a configurable option.
> >
> >> Do I understand correctly that, for a xapianactive file with "A B C D
> >> E", to remove C one has to call "squatter -t C,D -z D"? But A cannot
> >> be removed if it is the defaultsearchtier. Is the defaultsearchtier
> >> always included in the xapianactive file if the tier is missing,
> >> whenever the file is modified (and the only way to modify it is to
> >> call squatter in COMPACT mode)?
> >
> > When you do any compact, if it includes the first item (the writable
> > database) then a new writable database will be created on the
> > default tier. So if you try to compact the default tier away, a new
> > default tier item will be created.
> >
> > Bron.
> >
> > --
> > Bron Gondwana, CEO, FastMail Pty Ltd
> > br...@fastmailteam.com
>
> ----- End message from Bron Gondwana <br...@fastmailteam.com> -----

--
Bron Gondwana, CEO, FastMail Pty Ltd
br...@fastmailteam.com
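Bron's answer about compacting away the first item can be modelled as a small check over the active list. This is a toy model under stated assumptions, not squatter's code: `struct active_entry`, `tier_selected()`, `compact_creates_writable()` and `next_number()` are hypothetical names. Both the tier-selection rule (that `-t` selects every entry on a listed tier) and the numbering rule (inferred from the "compressing X:0,X,Y:0 to Y:3" log line quoted earlier) are guesses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one .xapianactive entry; entry 0 is the
 * writable database. */
struct active_entry {
    const char *tier;
    unsigned long num;
};

/* Assumption: -t selects every entry whose tier is listed. */
static bool tier_selected(const char *tier, const char *const *src, size_t nsrc)
{
    for (size_t i = 0; i < nsrc; i++)
        if (strcmp(tier, src[i]) == 0)
            return true;
    return false;
}

/* Bron's rule: if a compact includes the first (writable) entry, a new
 * writable database is created on the default tier. */
static bool compact_creates_writable(const struct active_entry *active, size_t n,
                                     const char *const *src, size_t nsrc)
{
    return n > 0 && tier_selected(active[0].tier, src, nsrc);
}

/* Assumption based on the "compressing ... to Y:3" log line: a new entry on
 * a tier takes one more than the highest number already used on that tier,
 * or 0 if the tier is not in the list yet. */
static unsigned long next_number(const struct active_entry *active, size_t n,
                                 const char *tier)
{
    bool seen = false;
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(active[i].tier, tier) == 0) {
            if (!seen || active[i].num > max)
                max = active[i].num;
            seen = true;
        }
    }
    return seen ? max + 1 : 0;
}
```

Under this model, with an active list of T1:0 T2:0 T2:1 and T5 as the default tier, compacting T1 and T2 would need a new writable entry, which would be T5:0; compacting only T2 would leave T1:0 as the writable database.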