On Mon, Jan 16, 2012 at 8:24 AM, u.maso...@libero.it
<u.maso...@libero.it> wrote:
>
> I remember that in the 8086 era, when addressable memory became cheaper and
> bigger than the size reserved in opcode space (24 bits IIRC), hardware
> architecture changed a bit, implementing relocation tables, where the low
> bits were the displacement in a page and the higher bits were used as a
> pointer into a page table containing the base address of the real memory page.


A relocation table for transactions is an interesting idea.  Let me
explain how transaction identifiers are used.  It's very simple, and,
given that transaction numbers generally go up over time, some major
simplification is possible.

Record versions are identified by the number of the transaction that
created them.

Transactions that don't insert, update, or delete records are of no
interest to anyone once they end.  The long term value in transaction
identifiers is only to identify record versions.

Transactions have four states: committed, rolled back, limbo, and
active.  The first two are pretty obvious.

Limbo is the state that a two-phase commit transaction enters when it
has agreed that it can roll back or commit as necessary, with no
possibility of error in either path.  Normally, transactions are in
limbo very briefly until their partner transactions all agree on a
direction and commit or roll back.  The problem that two-phase commit
solves is a system crash during a commit on several sites.  If that
happens, some of the sites may have committed, and the other sites
must commit the transactions in limbo when they recover.

Active is more interesting than it looks.  When allocating a new
transaction inventory page, Firebird sets every transaction to active.
When Firebird restarts after a crash or shutdown it searches the
transaction inventory pages for transactions between the oldest active
and the next transaction.  If it finds any transactions in the active
state, it changes them to rolled back.
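
A minimal sketch of that restart rule, in Python.  The names
(TxState, new_inventory_page, recover_after_restart) and the list
used as an inventory are illustrative assumptions, not Firebird's
actual structures or on-disk encoding.

```python
from enum import Enum


class TxState(Enum):
    # Illustrative values only; not Firebird's on-disk encoding.
    ACTIVE = "active"
    LIMBO = "limbo"
    ROLLED_BACK = "rolled back"
    COMMITTED = "committed"


def new_inventory_page(slots):
    """A freshly allocated page marks every transaction slot as active."""
    return [TxState.ACTIVE] * slots


def recover_after_restart(inventory, oldest_active, next_transaction):
    """Change transactions left active by a crash or shutdown to rolled back.

    Only the range from the oldest active transaction up to (but not
    including) the next transaction number is scanned.
    """
    for tid in range(oldest_active, next_transaction):
        if inventory[tid] is TxState.ACTIVE:
            inventory[tid] = TxState.ROLLED_BACK
    return inventory
```

Slots at or above the next transaction number are left alone, so
they still read as active, and limbo transactions keep their state.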

This leads to a couple of simplifying assumptions.

When a transaction starts, it can be certain that all transactions with higher
transaction numbers are active.

Transactions keep track of the state of older transactions down to the
"oldest interesting", which is the first transaction left in the
database that did not commit.  Sweeping the database moves the "oldest
interesting" by removing changes made by transactions that rolled back.
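
A small sketch of how the "oldest interesting" marker could be found,
in Python.  The list of state strings and the function name are
hypothetical.  The usage note below also assumes that a sweep leaves
a rolled-back transaction looking committed once its changes are
gone, which is an assumption about bookkeeping, not something stated
above.

```python
def oldest_interesting(states, next_transaction):
    """Return the lowest transaction number whose state is not 'committed'.

    `states[tid]` is one of 'committed', 'rolled back', 'limbo', 'active'.
    If every transaction below `next_transaction` committed, the marker
    is `next_transaction` itself.
    """
    for tid in range(next_transaction):
        if states[tid] != "committed":
            return tid
    return next_transaction
```

With states of committed, committed, rolled back, committed, active,
the marker is 2.  If a sweep removes the changes of transaction 2 and
(by the assumption above) it is then treated as committed, the marker
moves up to 4.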

Older record versions have lower transaction numbers than newer
versions.  Figuring out which record version to use is pretty easy.
Starting from the newest, find the first version that was committed
when the reading transaction started.
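
That lookup can be sketched in a few lines of Python.  RecordVersion,
its fields, and the committed_at_start set (standing in for whatever
the reader knows about transaction states at its start) are all
hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RecordVersion:
    created_by: int                           # transaction that wrote this version
    data: str
    older: Optional["RecordVersion"] = None   # next older version, if any


def visible_version(newest: Optional[RecordVersion],
                    committed_at_start: Set[int]) -> Optional[RecordVersion]:
    """Walk from the newest version back to older ones and return the
    first one whose creating transaction was committed when the reader
    started.  Returns None if no version qualifies."""
    version = newest
    while version is not None:
        if version.created_by in committed_at_start:
            return version
        version = version.older
    return None
```

For a chain written by transactions 30, 20, and 10 (newest first), a
reader that saw only 10 and 20 as committed gets the version written
by 20.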

>
> As a parallel in our case, not many transaction IDs are important, interesting
> or whatever you want to call them, between 0 and the next transaction number;
> a lot of them could be ... "forgotten".

OK.  So how would you do a remap?

Reusing the identifiers of transactions that created record versions
that are still in the database is a bad idea.  Resetting all the
transaction identifiers of mature records (records without back
versions) is possible but would require a mega-sweep (vacuum?) that
would be slow, and, unfortunately, cannot reset the most recent
transaction identifiers.


However, the final state of transactions that don't make changes is
kept on transaction inventory pages.  Those numbers could be reused if
those transactions could be identified and if the code could be
changed so a new transaction 128 understood that transactions
129 - 4,000,000,000 are committed and that changes by transaction 100
are newer than changes by transaction 3,999,999,970, but that changes
by transaction 3 are older.
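
One reading of that ordering rule, sketched in Python.  It assumes
the set of reused numbers is known (the `reused` set below is
hypothetical, as are the function names), and that a reused number
always belongs to the second pass through the identifier space.  It
is a sketch of the comparison only, not a proposal for how the set
would be stored.

```python
def age_key(tid, reused):
    """Order transaction numbers by real age rather than numeric value.

    A number in `reused` was handed out a second time, so it sorts
    after every number from the first pass, whatever its value.
    """
    generation = 1 if tid in reused else 0
    return (generation, tid)


def is_newer(a, b, reused):
    """True if changes by transaction `a` are newer than changes by `b`."""
    return age_key(a, reused) > age_key(b, reused)
```

With reused = {100, 128}, transaction 100 is newer than
3,999,999,970, while transaction 3, which was never reused, stays
older than both.  The hard part is everything this leaves out:
knowing reliably which numbers are in the set.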

>
> I know that transaction management is not implemented in hardware, but having
> this in memory could help and shouldn't be very big in size.

I think a relocation table between unused transaction numbers in the
32-bit range to virtual transaction numbers in the 64-bit range could
be pretty big.  And it would have to be permanent, durable, on disk,
and current.
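
As a rough upper bound on "pretty big", here is a back-of-the-envelope
calculation in Python.  It assumes a dense table with one 8-byte slot
(a 64-bit virtual number) for every possible 32-bit identifier; that
layout and the function name are assumptions for illustration only.
A sparse table would be smaller but would need its own index.

```python
def dense_table_bytes(id_bits=32, entry_bytes=8):
    """Size of a table with one fixed-size entry per possible identifier."""
    return (1 << id_bits) * entry_bytes


size = dense_table_bytes()
print(size, "bytes =", size // 2**30, "GiB")
```

Under those assumptions the table comes to 34,359,738,368 bytes, or
32 GiB.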

> AFAIU many classes use transaction information, and probably the transaction
> ID relocation algebra should be overloaded in the transaction class (I don't
> know if there is any such class; it's completely hypothetical reasoning).

The relative age of record versions is critically important to any
MVCC system.  The "relocation algebra" has to be permanent.

> With management like this, is there any chance to re-compact them? If a page
> of "relocatable IDs" has no more "pending" transactions, it should be
> discarded and the next can move down to occupy the freed "address space":
> this maybe needs a use count, with the good and the bad it has.

Err, those transaction identifiers are written into record versions on
disk.  They can't be released without rewriting the pages.

> Anyway, as they are implemented now, it's not a simple task, because, and
> maybe here I'm very wrong, they are used for different purposes: to mark
> different versions of a record as they are created AND to time-order any
> correlated or unrelated change in the DDL or DML of the database (AND maybe
> other things I don't remember now or am not aware of).

No, there's no correlation between transaction IDs and DDL operations.
Changes to table definitions are managed with record format versions.

Having thought about it for a minute, relocation of transaction
identifiers sounds hard, error prone, and likely to lead to slower and
slower operations as the number of available identifiers shrinks.

Cheers,

Ann

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
