Hi Greg, On Fri, Nov 6, 2015 at 8:43 AM, Greg Sabino Mullane <[email protected]> wrote:
> Recent emails to the list have pointed out a shortcoming in the > way we track replicated rows. Currently, when a row is changed, > a trigger adds a rows to the table's "delta" table, containing > the primary key that was changed, and the current_timestamp of > when the txn was started (txntime). Similarly, the "track" table uses > a timestamptz field named "txntime" to match up with entries in > the delta table - when a delta row has the same number of matching > entries in the track table as number of syncs replicating to > the source table, we know we can remove the delta row. > > The problem is that, even with microsecond resolution, there is > still a chance that two backends could modify the table at > the same time - or at least, get the exact same time recorded > as their transaction start time (which is really time of first > query). This is not good, as the txntime is no longer unique and > no loger suitable for use as a (sort of) primary key. > > We could store more precision in the timestamp, but not only are > we already limited in what can be stored via current_timestamp, we > start having to worry about compile-time settings. > > My idea is to use the PID of the backend as well. In theory, this > should be unique when combined with the timestamp, because even > if two backends manage to grab the exact same timestamp, they will always > be different PIDs. Another process may come along and recycle one > of those PIDs, but that will not matter as the timestamp will have > changed. > > The goals of the txntime column are: > > * unique > * human-readable timestamp > * sortable (esp. for conflict handling) > > One solution is to change it to a text field, then dump the formatted > timestamp plus the PID: > > greg=# select to_char(current_timestamp,'YYYY-MM-dd.HH24:MI:US ') || > pg_backend_pid(); > ?column? > ------------------------------- > 2015-11-06.08:35:460738 10638 > > This seems a pretty good solution, as it meets all the criteria. Having > the PID > at the end means conflict resolution continues to work as well. Although it > means two servers with the exact same timestamp update will have a > non-deterministic conflict winner, the chance of the same timestamp is > extremely low, and in such a case there truly is no way to pick a > winner. > > One concern is that the indexes on such a field will not be very optimal - > we could store the PID and/or the seconds first to create a wider tree, > but then we lose the sorting. I'm not convinced it will be much of a > problem, really, but wanted to throw it out there. > > Feedback welcome. > > In our database, we have 2 big tables and one of them that has about 300 million records has an index on a text field that has 16-20 characters. When I have to restore a backup of the database that index takes 10 times longer than any other index. I know the delta and track table won't get that big so it might not be a problem for this app. How about using 2 fields in the primary key. One the current txntime field and one a serial id field. Thanks for looking into this. Ioana -- > Greg Sabino Mullane [email protected] > End Point Corporation > PGP Key: 0x14964AC8 > > _______________________________________________ > Bucardo-general mailing list > [email protected] > https://mail.endcrypt.com/mailman/listinfo/bucardo-general > >
_______________________________________________ Bucardo-general mailing list [email protected] https://mail.endcrypt.com/mailman/listinfo/bucardo-general
