All,

Consider a large database (200GB, large tables with 450 Million rows) which is 
running on a kick-a** server with pool of enterprise SSDs for storage (more 
IOPS then Firebird could ever use), which I need to extract data from on a 
regular basis throughout the day for use by an external BI system.  During the 
business day, the database is Live with over 300 connections active with 1 
million+ DB transactions.

I need to extract the BI data as a true "snapshot" of data (ensuring FKs are 
valid), in as short a timeframe as possible.

Because runtime is critical, I want to break the extract process into logical 
pieces and run each piece is a separate process/thread (with its own 
connection) (aka run in parallel).

Here is the problem:

   - I need to be able to open the connections/start transactions in such a way 
that no other transaction can be committed until all my transactions have been 
started.
   - I cannot allow for an existing transaction to be committed that could 
"invalidate" my snapshot, by making changes visible to a "middle" transaction 
which is just about to start.

As far as I can tell, there is no way for this to be done.

In my "ideal world":
   - I would have each extract process/thread open its own database connection 
and pass the connection handle/ID to a central coordinator process
   - This coordinator would then call a new "StartTransactionCoordinated" 
method and pass the list of Connection handles/IDs to which the transactions 
would be created for
   - The engine would then create a transaction for each connection, but 
ensuring that no new transaction can be committed during
       - The assignment of transaction to connection would be at the engines 
discretion
   - The transactions would then follow the usual lifetime, committing 
independently from each other.

I think that my requirements and proposed approach/solution could be a 
significant benefit for use cases where operations can occur in parallel - eg. 
database replication (or data pump) where multiple connections could be used to 
extract (or apply) changes/data in parallel.


What do folks think?

Am I dreaming?


Sean


------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to