Thanks, Keith, for this excellent explanation.

What do you think about exposing compactor statistics (current wasted-space
ratio, number of failed/completed compactions, and so on), either in a
separate command (like stat-binlogs) or by extending the current stats
command?

Cheers

2014-09-12 2:48 GMT+02:00 Keith Rarick <[email protected]>:

> On Thu, Sep 11, 2014 at 2:21 PM, michele zuppala <[email protected]> wrote:
> > I have little experience with Beanstalkd, so correct me if I'm wrong, but
> > I've noticed that compaction tries to move usable jobs from the oldest
> > binlog to the current binlog.
>
> It moves existing jobs. Doesn't matter if the jobs are ready,
> delayed, or buried.
>
> > When a binlog has no more usable jobs, beanstalkd removes it from the
> > filesystem; otherwise the binlog stays in place, and new jobs are written
> > to the current (eventually a new) binlog.
>
> This is correct.
>
> > This works very well with jobs that are produced and consumed quickly.
> >
> > But with long-delayed jobs, a small binlog size, and high load, it's
> > possible that compaction scatters usable jobs across several files (which
> > seems to be your case), because the current binlog is too small and
> > compaction fails to move jobs between files.
>
> Compaction tries to keep the ratio of wasted space to used space
> below 2:1. When the ratio is higher than that, it'll move one or more
> jobs after every user write operation. The higher the ratio, the more
> jobs it moves each time. In other words, it works harder to try to
> converge faster when necessary.
>
> This ratio is simply wasted space (deleted jobs and migrated records)
> to used space (including all delayed, buried, and ready jobs).
>
> If the usable jobs are spread sparsely among several files, that
> would imply there is a lot of wasted space, and the existing
> compaction code should be working harder to migrate more jobs.
>
> If that's not happening, it's simply a bug.
>
> > In my opinion, if you really use long-delayed jobs, you can try to:
>
> There should be no need for tricks like these to manage disk
> space when you have many delayed jobs, or for any other access
> pattern.
>
> You should simply expect your disk usage to scale linearly (both
> upwards and downwards) with the number of jobs in the system at
> any given time. I think this is a reasonable and intuitive rule of
> thumb: your disk usage is however much space you need to hold
> all the jobs in the system, with a constant factor of overhead.
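That constant factor follows directly from the 2:1 bound described earlier: if wasted space stays at most twice used space, then total disk = used + wasted ≤ 3 × used. A quick sanity check of that worst case (the 3× factor is the derived bound, not a measured beanstalkd number):

```python
# Worst-case disk usage implied by the 2:1 wasted:used bound above.
# total = used + wasted, and wasted <= 2 * used, so total <= 3 * used.

def worst_case_disk_bytes(job_count, avg_job_bytes):
    used = job_count * avg_job_bytes  # space for live jobs
    max_wasted = 2 * used             # compactor keeps wasted:used <= 2:1
    return used + max_wasted          # linear in job_count

# e.g. 10,000 jobs of ~1 KiB each => at most ~31 MB on disk
print(worst_case_disk_bytes(10_000, 1024))
```

The point being: disk usage stays a constant multiple of live job data, both as the queue grows and as it drains.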
>
> Again, this is the intended behavior, and it's looking like there is
> a bug where it doesn't work properly in this case.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "beanstalk-talk" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/beanstalk-talk/k2r2ZJKoRFM/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/beanstalk-talk.
> For more options, visit https://groups.google.com/d/optout.
>
