On Sat, Nov 16, 2019 at 4:08 PM T J C (JIRA) <trac...@firebirdsql.org>
wrote:

gbak does not do sweep unless sweep threshold exceeded
>
>
> I had understood that a backup would reset the Oldest Interesting
> Transaction (OIT)  as well as
> the Oldest Snapshot (OST) and had set the sweep interval to zero, and was
> doing nightly backups
> (see bottom for documentation issue)
>

I'm afraid you were confused.  Gbak does remove all the unneeded record
versions, deleted records, and remnants of failed transactions, which is
the major benefit of a sweep.  It never changes the values of the OIT or
OST on the header page.  The reason for that is historical.  Gbak is a
user level application - reasonably smart, but user level.  Sweep is
actually a setting on an attachment and as such it can do magic things
like resetting values on the header page.

What is the OIT and why should anybody care?  Firebird maintains a bit for
every transaction that's ever been started.  If the bit is on, the
transaction committed.  If it's off, the transaction is active or dead.
The first bit that's off is the Oldest Interesting Transaction.  What makes
it interesting?  Every transaction older than that is committed - meaning
that when a record is read and its transaction id is found to be older
than the OIT, it's good data.  Records with newer transaction ids have to
be checked against a bit vector of transaction states to verify that their
data was committed.
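
As a rough illustration of that description, here is a toy model in Python.
It is a sketch only, not Firebird's actual code: the function names are
made up, transaction ids are simply list indexes starting at 0, and the
transaction states are reduced to one on/off bit each, as described above.

```python
def oldest_interesting(committed):
    """Return the id of the first transaction whose bit is off.

    `committed` is a list of booleans indexed by transaction id
    (hypothetical layout for this sketch): True means committed,
    False means active or dead.
    """
    for txn_id, bit in enumerate(committed):
        if not bit:
            return txn_id
    # Every known transaction committed: the next id would be the OIT.
    return len(committed)


def is_committed(record_txn, committed, oit):
    """Decide whether a record version written by record_txn is good data."""
    if record_txn < oit:
        # Older than the OIT: known to be committed, no lookup needed.
        return True
    # Otherwise the bit vector has to be consulted.
    return committed[record_txn]


states = [True, True, False, True]   # transaction 2 is active or dead
oit = oldest_interesting(states)
```

With these states, the OIT is 2; a record written by transaction 1 is
accepted without a lookup, while records from transactions 2 and 3 need
the bit vector to be checked.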

InterBase was created in the mid-1980s on computers that wouldn't power a
modern parking
meter.  Conserving memory was essential then, much less so now.
Maintaining a long bit vector of
transaction states cost a lot of memory, especially since the original
Firebird mode was Classic (can
you imagine?) and every connection had a copy of the bit vector.

Maintaining an up-to-date OIT is much less important now that computers
have thousands of times more memory and, in server mode, Firebird
maintains one bit vector of transaction states per database, not one per
connection.

Another change is the way failed transactions are handled.  Until about 20
years ago, the memory
cost of maintaining enough state to undo a transaction on rollback -
whether deliberate or through
a failed connection - was unsupportable.  When Firebird added savepoints,
it suddenly had everything
necessary to back out a failed transaction.  As far as I know, the only
time that a transaction is left
in failed/active state is after a crash - server, O/S, computer - which
keeps the cleanup from happening.
So there just aren't as many problematic transactions as there were back in
the day.

So you're probably OK largely ignoring the OIT.  Run a sweep after a backup
once in a while and don't
worry about it much.   The Oldest Snapshot Transaction is important and
should be kept up to date, but
that's a question of transaction maintenance that neither sweep nor gbak
will affect.

>
> I have since found that when the sweep interval is set to zero, no sweep
> is done during a gbak backup
> regardless of the -g parameter. I would expect the sweep functionality and
> setting the OIT would be done
> when garbage collection is done.
>

The major sweep functionality is removing old record versions, deleted
records, and records created by
failed transactions. Gbak does that.  It just doesn't change the OIT.

>
> Of interest however, if the sweep interval is set below the (OST - OIT)
> immediately before the
> backup is done, then a gbak backup *DOES* do the sweep and set the OIT value,
> so I assume
> the sweep is being done.
>

That's because an actual sweep was triggered.
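
The behaviour reported above can be summarised in a few lines of Python.
This is a sketch of the condition as described in this thread, not the
engine's code; the function name and the exact comparison are assumptions.

```python
def automatic_sweep_due(oit, ost, sweep_interval):
    """Return True if the OST - OIT gap would trigger an automatic sweep.

    Assumptions for this sketch: a sweep interval of 0 turns automatic
    sweeps off, and a sweep is due when the gap exceeds the interval.
    """
    if sweep_interval == 0:
        return False
    return (ost - oit) > sweep_interval
```

So with the interval at zero nothing ever triggers a sweep, whatever the
gap, and with the interval set below the current gap the next opportunity
triggers a real sweep, which is what moves the OIT.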

>
> In my opinion, a gbak backup should be doing a sweep, regardless of the
> sweep interval, unless the -g option is specified.
>

You are entitled to your opinion.  It just doesn't align with the design of
gbak.

>
>
> It seems to me that this is at the least a documentation issue
> See https://firebirdsql.org/manual/gfix-housekeeping.html


Err.  That article has a couple of problems - some related to changes
since 1985 and some to an apparent confusion.  The major source of garbage
in the database is old record versions that are not revisited, including
deleted records.  More on that some other time.

Good luck,

Ann
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel