On Tue, 1 Apr 2014 22:59:36 +0000, "Leyne, Sean" <s...@broadviewsoftware.com> wrote:
> All,
>
> Consider a large database (200GB, large tables with 450 million rows)
> running on a kick-a** server with a pool of enterprise SSDs for storage
> (more IOPS than Firebird could ever use), from which I need to extract
> data on a regular basis throughout the day for use by an external BI
> system. During the business day the database is live, with over 300
> active connections and 1 million+ DB transactions.
>
> I need to extract the BI data as a true "snapshot" of the data (ensuring
> FKs are valid), in as short a timeframe as possible.
>
> Because runtime is critical, I want to break the extract process into
> logical pieces and run each piece in a separate process/thread (with its
> own connection), aka run in parallel.
>
> Here is the problem:
>
> - I need to be able to open the connections/start the transactions in
>   such a way that no other transaction can be committed until all of my
>   transactions have been started.
> - I cannot allow an existing transaction to be committed in a way that
>   would "invalidate" my snapshot by making changes visible to a "middle"
>   transaction which is just about to start.
>
> As far as I can tell, there is no way for this to be done.
>
> In my "ideal world":
> - Each extract process/thread would open its own database connection and
>   pass the connection handle/ID to a central coordinator process.
> - The coordinator would then call a new "StartTransactionCoordinated"
>   method, passing the list of connection handles/IDs for which the
>   transactions would be created.
> - The engine would then create a transaction for each connection, while
>   ensuring that no new transaction can be committed during the process.
>   - The assignment of transactions to connections would be at the
>     engine's discretion.
> - The transactions would then follow the usual lifetime, committing
>   independently of each other.
>
> I think that my requirements and proposed approach/solution could be a
> significant benefit for use cases where operations can occur in parallel,
> e.g. database replication (or data pump), where multiple connections could
> be used to extract (or apply) changes/data in parallel.
>
> What do folks think?
>
> Am I dreaming?
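To make the race in the quoted message concrete, here is a minimal sketch
(plain ISQL, two extract connections A and B using SNAPSHOT isolation; the
table and column names are placeholders, not anything from Sean's schema)
of how a commit can slip in between two transaction starts:

    -- Connection A (extract worker 1):
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;   -- snapshot taken now

    -- Connection C (a normal business connection), a moment later:
    UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1;
    COMMIT;

    -- Connection B (extract worker 2), a moment after that:
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;   -- already sees C's commit

A still sees the pre-commit row while B sees the post-commit row, so a
parent row read by A and a child row read by B (or vice versa) need not be
consistent with each other, which is exactly the FK-validity problem
described above.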
Sounds promising, but it might be a bit hard. A better and simpler way might
be if you could 'join' an existing transaction (as in: use the same set of
visible transactions/record versions).

For now you might want to think about:

1) Use NBACKUP to create a backup (using levels this might not take too much
   time), although a restore might take more time (I don't have enough
   experience with nbackup to say).
2) Use ALTER DATABASE BEGIN BACKUP, copy the database file, ALTER DATABASE
   END BACKUP, and do the BI work on the copy (sketched below).

Note that Firebird does allow starting transactions that span multiple
databases (using isc_start_transaction and isc_start_multiple), but that
facility is intended for (distributed) transactions across multiple
databases. I am not sure if or how it would work on a single database, nor
whether it would guarantee a contiguous set of transactions (probably not).

Mark
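For option 2, a minimal sketch of the sequence (ISQL against the live
database; the file paths are made up for illustration, and the copy itself
happens at OS level, not in SQL):

    -- 1. Freeze the main database file; subsequent page changes are
    --    written to a separate delta file instead of the main file.
    ALTER DATABASE BEGIN BACKUP;

    -- 2. At the OS level, copy the now-stable main file, e.g.:
    --      copy D:\data\bigdb.fdb D:\extract\bigdb_bi.fdb
    --    The copy is a consistent, openable database as of BEGIN BACKUP.

    -- 3. Unfreeze: the delta file is merged back into the main file.
    ALTER DATABASE END BACKUP;

The parallel extract connections can then all run against the copy with no
coordination at all, since nothing else is writing to it; the main cost is
the file copy and the delta merge at END BACKUP. Option 1 uses the same
freeze/delta mechanism, just driven by the nbackup tool, which can also take
incremental (level > 0) backups to reduce the amount of data copied.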