Thanks, Keith, for this excellent explanation. What do you think about adding statistical information about the compactor (current wasted-space ratio, number of failed/completed compactions, and so on), either in a separate command (like stat-binlogs) or by extending the current stats command?
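To make the proposal concrete, the new compactor fields could sit alongside the counters `stats` already reports. A hypothetical sketch of the YAML output (the `binlog-*` index/record keys are existing beanstalkd stats fields; the waste-ratio and compaction counters below are my invented proposal, not actual output):

```yaml
# Existing binlog fields already reported by `stats`:
binlog-oldest-index: 3
binlog-current-index: 7
binlog-records-written: 12843
binlog-records-migrated: 592
# Proposed (hypothetical) compactor fields:
binlog-waste-ratio: 1.4            # wasted space / used space
binlog-compactions-completed: 41
binlog-compactions-failed: 0
```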
Cheers

2014-09-12 2:48 GMT+02:00 Keith Rarick <[email protected]>:

> On Thu, Sep 11, 2014 at 2:21 PM, michele zuppala <[email protected]> wrote:
> > I have little experience with Beanstalk, so correct me if I'm wrong, but
> > I've noticed that compaction tries to move usable jobs from the oldest
> > binlog to the current binlog.
>
> It moves existing jobs. Doesn't matter if the jobs are ready,
> delayed, or buried.
>
> > When a binlog has no more usable jobs, beanstalkd removes it from the
> > filesystem; otherwise the binlog stays in place and new jobs are added
> > to an eventually new binlog.
>
> This is correct.
>
> > This works very well for jobs that are produced and consumed quickly.
> >
> > But with long-delayed jobs, a small binlog size, and high load, it's
> > possible that compaction spreads usable jobs among several files (which
> > seems to be your case), because the current binlog is too small and
> > compaction fails to move jobs between files.
>
> Compaction tries to keep the ratio of wasted space to used space
> below 2:1. When the ratio is higher than that, it'll move one or more
> jobs after every user write operation. The higher the ratio, the more
> jobs it moves each time. In other words, it works harder to try to
> converge faster when necessary.
>
> This ratio compares used space (including all delayed, buried, and
> ready jobs) to wasted space (deleted jobs and migrated records).
>
> If the usable jobs are spread sparsely among several files, that
> would imply there is a lot of wasted space, and the existing
> compaction code should be working harder to migrate more jobs.
>
> If that's not happening, it's simply a bug.
>
> > In my opinion, if you really use long-delayed jobs, you can try to:
>
> There should be no need for tricks like these to manage disk
> space when you have many delayed jobs, or for any other access
> pattern.
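The heuristic Keith describes can be sketched as a toy model: no migration while wasted space stays below twice the used space, and an amount of migration that grows with the ratio once it crosses that threshold. The function name and the exact scaling rule are illustrative assumptions; beanstalkd's real implementation is in C and differs in detail.

```python
def jobs_to_migrate(used_bytes: int, wasted_bytes: int) -> int:
    """Toy model: how many jobs to move after one user write operation.

    Compaction only kicks in once the wasted:used ratio exceeds 2:1.
    Above that threshold, the number of migrated jobs grows with the
    ratio, so the binlog converges faster the further out of balance
    it is. The "one job per whole ratio unit past 2:1" rule here is
    an invented placeholder, not beanstalkd's actual formula.
    """
    if used_bytes <= 0 or wasted_bytes <= 2 * used_bytes:
        return 0  # ratio at or below 2:1: nothing to do this write
    ratio = wasted_bytes / used_bytes
    return max(1, int(ratio) - 2)
```

Under this model, a binlog at a 5:1 waste ratio migrates three jobs per write, while one just over 2:1 migrates a single job, matching the "works harder to converge faster" behavior described above.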
> You should simply expect your disk usage to scale linearly (both
> upwards and downwards) with the number of jobs in the system at
> any given time. I think this is a reasonable and intuitive rule of
> thumb: your disk usage is however much space you need to hold
> all the jobs in the system, with a constant factor of overhead.
>
> Again, this is the intended behavior, and it's looking like there is
> a bug where it doesn't work properly in this case.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "beanstalk-talk" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/beanstalk-talk/k2r2ZJKoRFM/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/beanstalk-talk.
> For more options, visit https://groups.google.com/d/optout.
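The "linear with a constant factor of overhead" rule of thumb follows directly from the 2:1 bound: if wasted space never exceeds twice the used space, total binlog size stays at most used + 2 × used, i.e. roughly three times the live data. A back-of-the-envelope sketch (the record size and factor here are illustrative numbers, not beanstalkd measurements):

```python
def expected_disk_usage(num_jobs: int, avg_record_bytes: int,
                        overhead_factor: float = 3.0) -> int:
    """Estimate worst-case binlog disk usage for a given job count.

    With compaction holding wasted:used below 2:1, total size is
    bounded by live data plus twice that again: a factor of ~3.
    Doubling the number of jobs should roughly double disk usage.
    """
    return int(num_jobs * avg_record_bytes * overhead_factor)
```

For example, 1,000 jobs averaging 512 bytes per record would stay under about 1.5 MB of binlog space under these assumptions, and the estimate scales linearly as jobs are added or drained.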
