Re: [sqlite] Synchronising multiple copies of a database
Jean-Denis Muys wrote:
> On 10/7/09 21:35, "Adam DeVita" wrote:
>> One can also get a mess if Mr. Red and Mr. Black both get new customers and
>> enter them, and they both get the same ID because the auto-generated int
>> happens to be the same. Both copies get updated with the other guy's data,
>> they then get annoyed and enter the stuff again, and it happens over again;
>> now there are N entries of the other guy's customer in the database,
>> depending on how many times they do it.
>
> I solve this problem the following way:
>
> A new record's ID is not set by the local client that creates the record,
> but by the central database.

Another solution is to use a GUID instead of an integer.

John

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
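[Editor's note: John's GUID suggestion can be sketched in a few lines using Python's sqlite3 and uuid modules. The table and column names are invented for illustration; this is a minimal sketch, not anyone's actual schema.]

```python
# Sketch: avoid ID collisions between offline clients by using a GUID
# (UUID) as the primary key instead of an auto-generated integer.
# Table/column names are illustrative only.
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")

def add_customer(conn, name):
    new_id = str(uuid.uuid4())  # globally unique; no central allocator needed
    conn.execute("INSERT INTO customers VALUES (?, ?)", (new_id, name))
    return new_id

# Two sites creating customers independently will not collide on id.
id_red = add_customer(db, "customer entered by Mr. Red")
id_black = add_customer(db, "customer entered by Mr. Black")
```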
Re: [sqlite] Synchronising multiple copies of a database
On 10/7/09 21:35, "Adam DeVita" wrote:
> One can also get a mess if Mr. Red and Mr. Black both get new customers and
> enter them, and they both get the same ID because the auto-generated int
> happens to be the same. Both copies get updated with the other guy's data,
> they then get annoyed and enter the stuff again, and it happens over again;
> now there are N entries of the other guy's customer in the database,
> depending on how many times they do it.

I solve this problem the following way:

A new record's ID is not set by the local client that creates the record, but
by the central database. Upon new record creation, a trigger is run that
sends the new record to the central database. The central database then
allocates the new ID (and revision stamp) to the new record and returns them
to the trigger, which inserts them back into the local repository. Clearly,
this only works in an "online" scenario.

An alternative is to guarantee uniqueness across clients by making sure the
newly allocated ID is a composite that includes the creator client's identity.

Jean-Denis
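[Editor's note: the composite-ID alternative Jean-Denis mentions can be sketched as a (client_id, local_seq) primary key. The schema and names below are invented for illustration; each client allocates its own sequence offline, and the composite key stays unique across clients.]

```python
# Sketch: offline-safe uniqueness via a composite key that includes the
# creating client's identity, so two sites can never allocate the same key.
import sqlite3

def make_client_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE records (
        client_id TEXT NOT NULL,
        local_seq INTEGER NOT NULL,
        payload   TEXT,
        PRIMARY KEY (client_id, local_seq))""")
    return conn

def insert_record(conn, client_id, payload):
    # Next sequence number local to this client; no central server needed.
    seq = conn.execute(
        "SELECT COALESCE(MAX(local_seq), 0) + 1 FROM records WHERE client_id = ?",
        (client_id,)).fetchone()[0]
    conn.execute("INSERT INTO records VALUES (?, ?, ?)",
                 (client_id, seq, payload))
    return (client_id, seq)

site_a, site_b = make_client_db(), make_client_db()
key_a = insert_record(site_a, "site-A", "new customer at A")
key_b = insert_record(site_b, "site-B", "new customer at B")
# Both rows get local_seq 1, but the composite keys differ.
```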
Re: [sqlite] Synchronising multiple copies of a database
Simon Slavin wrote:
> On 7 Oct 2009, at 8:33pm, John Elrick wrote:
>> Isn't this a variation of the DVCS problem? In other words, would it
>> be correct to say that the underlying issue is treating this as a
>> database problem, rather than as a versioning problem which happens
>> to involve a database?
>
> Yes yes yes. And look how complicated the programming is to do DVCS
> correctly.

I would think the programming is relatively easy... the hard part is getting
the rules right and being confident you've covered all the edge cases.

> And now the problem is ... What is the resolution of your versioning?
> Do you put a version number on each row, or do you consider each column
> in each row to need its own version number? What if one site changes one
> column and another changes another column of the same row? How many
> version numbers do you want to store for just one table?

As many as necessary, but no more.

John
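[Editor's note: the coarsest resolution Simon asks about (one version number per row) can be sketched with a trigger that bumps a revision column on every update. This is a hypothetical illustration, not a design proposed in the thread; it relies on SQLite's recursive triggers being off by default, so the trigger's own UPDATE does not re-fire it.]

```python
# Sketch: per-row versioning via an AFTER UPDATE trigger that bumps a
# "version" column. Schema and names are invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, contact TEXT,
                      version INTEGER DEFAULT 1);
CREATE TRIGGER bump_version AFTER UPDATE ON clients
BEGIN
    -- Recursive triggers are off by default, so this does not re-fire.
    UPDATE clients SET version = OLD.version + 1 WHERE id = NEW.id;
END;
""")
db.execute("INSERT INTO clients (contact) VALUES ('green')")
db.execute("UPDATE clients SET contact = 'red' WHERE contact = 'green'")
version = db.execute("SELECT version FROM clients WHERE id = 1").fetchone()[0]
```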
Re: [sqlite] Synchronising multiple copies of a database
On 7 Oct 2009, at 8:33pm, John Elrick wrote:
> Isn't this a variation of the DVCS problem? In other words, would it
> be correct to say that the underlying issue is treating this as a
> database problem, rather than as a versioning problem which happens
> to involve a database?

Yes yes yes. And look how complicated the programming is to do DVCS
correctly.

And now the problem is ... What is the resolution of your versioning? Do you
put a version number on each row, or do you consider each column in each row
to need its own version number? What if one site changes one column and
another changes another column of the same row? How many version numbers do
you want to store for just one table?

Simon.
Re: [sqlite] Synchronising multiple copies of a database
"You have to be really careful" Absolutely. Even if you know the order of updates (which I do). If site A updates an off line record in a cached copy after site B deletes it other sites can receive the change records in order and have the record re-appear (via insert or replace). One can also get a mess if Mr. Red and Mr Black both get new customers, and enter them and they both get the same ID because the auto-generated int happens to be the same. Both copies get updated with the other guy's data, they then get annoyed and enter the stuff again and it happens over again, but now there are N entries of the other guy's customer in the database depending on how many times they do it. On Wed, Oct 7, 2009 at 3:18 PM, Simon Slavinwrote: > > On 7 Oct 2009, at 7:20pm, Adam DeVita wrote: > > > regarding this > > " The fault is that > > almost nobody does it right: they neglect to keep an 'unaltered > > central copy' and think they can cross-apply journals each time two > > databases talk to one-another. That does not work for various > > reasons." > > > > Would a central repository of journals that can be applied to local > > repositories be sufficient? I suppose I assume that running the same > > program on N workstations with the same set of journals should > > produce N > > identical results. > > You need a copy of the database which is not changed by any site. All > the sites send in their journals. The journals are merged into a > superjournal in time order. The superjournal is then applied to the > central copy of the database. Then the updated database is sent back > out to all sites. > > The problem comes when you apply multiple journals in a different > order. Start with each site with identical copies of a TABLE with > three clients: one managed by Mr. Green, one by Mr. Red, and one by > Mr. Black. 'G R B'. Then, in this order ... > > Mr. Green goes on holiday ... > Site A says that all Mr. Green's customers will be handled by Mr. Red. 
> UPDATE clients SET contact = 'red' WHERE contact = 'green' > > Mr. Red goes on holiday ... > Site B says that all Mr. Red's customers will be handled by Mr. Black. > > Then Mr. Green comes back from holiday, and Mr. Black goes on holiday > so ... > Site C says that all Mr. Black's customers will be handled by Mr. Green. > > Then they all synchronise databases. See if you can make them all end > up with the same data if they synch against each-other rather than a > central unaltered copy of the databases. Doesn't work: one site might > have 'B B B', another 'R B R'. You can do it only by luck ... by > happening to know in which order people went on holiday. However, if > you always synch against a central unaltered copy of the database you > can synch in any order. Once everyone has synchronised you distribute > a copy of the central database to everyone and they all have identical > data once more. That's the simplest setup. You can get more > complicated by having each site remember which journals they've played > back. > > The problem does not occur if any record can only ever be modified by > one site. But if you have the normal 'anyone can do anything' setup, > you have to be really really careful. > > Simon. > ___ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > -- VerifEye Technologies Inc. 905-948-0015x245 7100 Warden Ave, Unit 3 Markham ON, L3R 8B5 Canada ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
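[Editor's note: Simon's holiday scenario can be reproduced mechanically. The sketch below applies the same three journal entries in every possible order and shows that different orders converge to different final tables, which is exactly why cross-applying journals without an unaltered central copy fails. The schema is invented for illustration.]

```python
# Sketch: the same three UPDATEs, applied in different orders, give
# different final states. Starting table is 'G R B' as in Simon's example.
import sqlite3
from itertools import permutations

updates = [
    ("red",   "green"),   # Site A: Mr. Green's customers -> Mr. Red
    ("black", "red"),     # Site B: Mr. Red's customers   -> Mr. Black
    ("green", "black"),   # Site C: Mr. Black's customers -> Mr. Green
]

def apply_in_order(order):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, contact TEXT)")
    db.executemany("INSERT INTO clients (contact) VALUES (?)",
                   [("green",), ("red",), ("black",)])
    for new, old in order:
        db.execute("UPDATE clients SET contact = ? WHERE contact = ?",
                   (new, old))
    return tuple(r[0] for r in
                 db.execute("SELECT contact FROM clients ORDER BY id"))

# Collect the distinct outcomes over all application orders.
results = {apply_in_order(p) for p in permutations(updates)}
```

Applying A then B then C ends with every customer back on Mr. Green, while applying them in the reverse order leaves 'R B R': the outcome depends on luck.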
Re: [sqlite] Synchronising multiple copies of a database
Simon Slavin wrote:
> On 7 Oct 2009, at 7:20pm, Adam DeVita wrote:
>> Regarding this:
>> "The fault is that almost nobody does it right: they neglect to keep an
>> 'unaltered central copy' and think they can cross-apply journals each
>> time two databases talk to one another. That does not work for various
>> reasons."
>>
>> Would a central repository of journals that can be applied to local
>> repositories be sufficient? I suppose I assume that running the same
>> program on N workstations with the same set of journals should produce
>> N identical results.
>
> [snip]
> The problem comes when you apply multiple journals in a different order.
> Start with each site having identical copies of a TABLE with three
> clients: one managed by Mr. Green, one by Mr. Red, and one by Mr. Black.
> 'G R B'. Then, in this order ...
>
> Mr. Green goes on holiday ...
> Site A says that all Mr. Green's customers will be handled by Mr. Red.
> UPDATE clients SET contact = 'red' WHERE contact = 'green'
>
> Mr. Red goes on holiday ...
> Site B says that all Mr. Red's customers will be handled by Mr. Black.
>
> Then Mr. Green comes back from holiday, and Mr. Black goes on holiday,
> so ...
> Site C says that all Mr. Black's customers will be handled by Mr. Green.
>
> Then they all synchronise databases. See if you can make them all end up
> with the same data if they synch against each other rather than against a
> central unaltered copy of the database. It doesn't work: one site might
> have 'B B B', another 'R B R'. You can do it only by luck ... by
> happening to know in which order people went on holiday. However, if you
> always synch against a central unaltered copy of the database, you can
> synch in any order. Once everyone has synchronised, you distribute a copy
> of the central database to everyone and they all have identical data once
> more. That's the simplest setup. You can get more complicated by having
> each site remember which journals they've played back.

Simon,

Isn't this a variation of the DVCS problem? In other words, would it be
correct to say that the underlying issue is treating this as a database
problem, rather than as a versioning problem which happens to involve a
database?

I ask because I have two separate projects which involve this sort of issue,
which I have simply deferred for the time being.

John
Re: [sqlite] Synchronising multiple copies of a database
On 7 Oct 2009, at 7:20pm, Adam DeVita wrote:
> Regarding this:
> "The fault is that almost nobody does it right: they neglect to keep an
> 'unaltered central copy' and think they can cross-apply journals each
> time two databases talk to one another. That does not work for various
> reasons."
>
> Would a central repository of journals that can be applied to local
> repositories be sufficient? I suppose I assume that running the same
> program on N workstations with the same set of journals should produce
> N identical results.

You need a copy of the database which is not changed by any site. All the
sites send in their journals. The journals are merged into a superjournal in
time order. The superjournal is then applied to the central copy of the
database. Then the updated database is sent back out to all sites.

The problem comes when you apply multiple journals in a different order.
Start with each site having identical copies of a TABLE with three clients:
one managed by Mr. Green, one by Mr. Red, and one by Mr. Black. 'G R B'.
Then, in this order ...

Mr. Green goes on holiday ...
Site A says that all Mr. Green's customers will be handled by Mr. Red.
UPDATE clients SET contact = 'red' WHERE contact = 'green'

Mr. Red goes on holiday ...
Site B says that all Mr. Red's customers will be handled by Mr. Black.

Then Mr. Green comes back from holiday, and Mr. Black goes on holiday, so ...
Site C says that all Mr. Black's customers will be handled by Mr. Green.

Then they all synchronise databases. See if you can make them all end up with
the same data if they synch against each other rather than against a central
unaltered copy of the database. It doesn't work: one site might have 'B B B',
another 'R B R'. You can do it only by luck ... by happening to know in which
order people went on holiday. However, if you always synch against a central
unaltered copy of the database, you can synch in any order.

Once everyone has synchronised, you distribute a copy of the central database
to everyone and they all have identical data once more. That's the simplest
setup. You can get more complicated by having each site remember which
journals they've played back.

The problem does not occur if any record can only ever be modified by one
site. But if you have the normal 'anyone can do anything' setup, you have to
be really really careful.

Simon.
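[Editor's note: the superjournal idea above can be sketched as follows. Representing journal entries as (timestamp, sql, params) tuples is an assumption made for illustration, not any real journal format; the point is that one time-ordered replay against the unaltered central copy gives a single canonical result.]

```python
# Sketch: merge per-site journals into a time-ordered superjournal and
# apply it to the unaltered central copy. Schema/format are illustrative.
import sqlite3

def build_central():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, contact TEXT)")
    db.executemany("INSERT INTO clients (contact) VALUES (?)",
                   [("green",), ("red",), ("black",)])
    return db

# (timestamp, sql, params) entries collected from each site's journal.
journal_a = [(1, "UPDATE clients SET contact = ? WHERE contact = ?", ("red", "green"))]
journal_b = [(2, "UPDATE clients SET contact = ? WHERE contact = ?", ("black", "red"))]
journal_c = [(3, "UPDATE clients SET contact = ? WHERE contact = ?", ("green", "black"))]

# Merge in time order, regardless of which journal arrived first.
superjournal = sorted(journal_b + journal_c + journal_a)

central = build_central()
for _, sql, params in superjournal:
    central.execute(sql, params)

final = [r[0] for r in central.execute("SELECT contact FROM clients ORDER BY id")]
# Every site that receives this post-journal copy ends up identical.
```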
Re: [sqlite] Synchronising multiple copies of a database
Regarding this:
"The fault is that almost nobody does it right: they neglect to keep an
'unaltered central copy' and think they can cross-apply journals each time
two databases talk to one another. That does not work for various reasons."

Would a central repository of journals that can be applied to local
repositories be sufficient? I suppose I assume that running the same program
on N workstations with the same set of journals should produce N identical
results.

On Wed, Oct 7, 2009 at 12:16 PM, Simon Slavin wrote:
> On 7 Oct 2009, at 1:47pm, Jean-Denis Muys wrote:
>> On 10/7/09 11:50, "Simon Slavin" wrote:
>>> Try really really hard just to have all sites access your MySQL
>>> database remotely.
>>
>> Unfortunately this approach is not possible in the short term. The
>> client applications are legacy applications; porting them to that
>> scheme is a major undertaking. [snip]
>
> I completely understand. The recommendation is valuable in the general
> case, but useless in yours. Still, that's why they pay you the big
> bucks: to write the complicated program.
>
>>> Keep a journal. Keep an unaltered central copy of the data. As each
>>> site contacts the central site, play that site's journal back against
>>> the unaltered central copy. The post-journal central copy of the
>>> database becomes the new copy for distribution.
>>
>> Interesting idea, that makes a lot of sense in the "offline" scenario.
>
> This is the standard solution to the synchronisation problem. The fault
> is that almost nobody does it right: they neglect to keep an 'unaltered
> central copy' and think they can cross-apply journals each time two
> databases talk to one another. That does not work for various reasons.
>
> The synchronisation service built into Mac OS X (e.g. synchronising
> with online services or an iPhone/iPod) implements it in the correct
> manner. It takes extra data space and fussy programming, but it does at
> least work right!
>
>> [snip] In any case, any book reference on this topic?
>
> Since I joined this list and noticed repeated questions on the subject,
> I have been trying hard to find any book with anything significant to
> say on the issue. I failed: everything I found was lacking in some way.
> Some were flat-out wrong. I work at a university and I think I'm going
> to ask the Computing people to find me someone who knows this stuff.
> I'm just paid to do it in real life, not read or write books about it.
> If I find something good, I'll read it and post here about it.
>
> Simon.

--
VerifEye Technologies Inc.
905-948-0015 x245
7100 Warden Ave, Unit 3
Markham ON, L3R 8B5
Canada
Re: [sqlite] Synchronising multiple copies of a database
On 7 Oct 2009, at 1:47pm, Jean-Denis Muys wrote:
> On 10/7/09 11:50, "Simon Slavin" wrote:
>> Try really really hard just to have all sites access your MySQL
>> database remotely.
>
> Unfortunately this approach is not possible in the short term. The
> client applications are legacy applications; porting them to that
> scheme is a major undertaking. [snip]

I completely understand. The recommendation is valuable in the general case,
but useless in yours. Still, that's why they pay you the big bucks: to write
the complicated program.

>> Keep a journal. Keep an unaltered central copy of the data. As each
>> site contacts the central site, play that site's journal back against
>> the unaltered central copy. The post-journal central copy of the
>> database becomes the new copy for distribution.
>
> Interesting idea, that makes a lot of sense in the "offline" scenario.

This is the standard solution to the synchronisation problem. The fault is
that almost nobody does it right: they neglect to keep an 'unaltered central
copy' and think they can cross-apply journals each time two databases talk to
one another. That does not work for various reasons.

The synchronisation service built into Mac OS X (e.g. synchronising with
online services or an iPhone/iPod) implements it in the correct manner. It
takes extra data space and fussy programming, but it does at least work
right!

> [snip] In any case, any book reference on this topic?

Since I joined this list and noticed repeated questions on the subject, I
have been trying hard to find any book with anything significant to say on
the issue. I failed: everything I found was lacking in some way. Some were
flat-out wrong. I work at a university and I think I'm going to ask the
Computing people to find me someone who knows this stuff. I'm just paid to do
it in real life, not read or write books about it. If I find something good,
I'll read it and post here about it.

Simon.
Re: [sqlite] Synchronising multiple copies of a database
On 10/7/09 11:50, "Simon Slavin" wrote:
> On 7 Oct 2009, at 10:13am, Jean-Denis Muys wrote:
>> [setup description]
>>
>> I'd appreciate some feedback here or pointers to literature.
>
> Try really really hard just to have all sites access your MySQL
> database remotely.

Unfortunately this approach is not possible in the short term. The client
applications are legacy applications; porting them to that scheme is a major
undertaking. I need to insert my code within the legacy applications in a way
that is as minimally invasive as possible. This trigger/polling approach was
deemed a fair compromise. Over time, new apps will probably be developed that
work directly against the central database, and the legacy apps will be
phased out progressively.

Also, those legacy apps work when offline, which is a strong incentive to
keep them (and I know this opens a new kind of can of worms). Finally, the
central server is one or two stellar systems away, with bad latency and
throughput. The local data repository in a way acts as a cache to keep the
apps responsive.

> Keep a journal. Keep an unaltered central copy of the data. As each
> site contacts the central site, play that site's journal back against
> the unaltered central copy. The post-journal central copy of the
> database becomes the new copy for distribution.

Interesting idea, that makes a lot of sense in the "offline" scenario.

> [snip scary scenarios]
>
> If your system deals with those, it's most of the way there.

I'll have to handle those cases very carefully. Clearly, when two clients
modify the global state in incompatible ways, some kind of conflict
resolution must happen (similar to what version control systems do). The
proviso here is to make sure these conflicts are at least detected. I need to
think about all this some more.

Thanks a lot for your very valuable feedback. In any case, any book reference
on this topic?

Jean-Denis
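[Editor's note: one common way to make such conflicts at least detectable is a per-row revision stamp checked at update time (optimistic concurrency), in the spirit of the "revision stamp" Jean-Denis mentions. The schema and names below are invented for illustration; this is not his actual scheme.]

```python
# Sketch: conflict detection with a revision stamp. An update is accepted
# only if the revision the client last saw is still current; otherwise it
# is flagged as a conflict instead of silently overwriting.
import sqlite3

central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT, rev INTEGER)")
central.execute("INSERT INTO items VALUES (1, 'original', 1)")

def try_update(conn, item_id, new_value, seen_rev):
    # The WHERE clause only matches if nobody else bumped the revision.
    cur = conn.execute(
        "UPDATE items SET value = ?, rev = rev + 1 WHERE id = ? AND rev = ?",
        (new_value, item_id, seen_rev))
    return cur.rowcount == 1   # False => concurrent edit detected

# Both offline clients read the row at rev 1, then sync their edits:
ok_first = try_update(central, 1, "edit from client A", seen_rev=1)
ok_second = try_update(central, 1, "edit from client B", seen_rev=1)
# The second edit is rejected and can be routed to conflict resolution.
```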
[sqlite] Synchronising multiple copies of a database
On 7 Oct 2009, at 10:13am, Jean-Denis Muys wrote:
> [setup description]
>
> I'd appreciate some feedback here or pointers to literature.

You see how complicated you had to get before you had an acceptable
solution? Two extra columns? Timestamps? Retention of records which have to
be ignored? I see nothing obviously wrong, so here are some general comments.

Try really really hard just to have all sites access your MySQL database
remotely. MySQL is an excellent client/server version of SQL. You could kill
all the complexity you had to invent by just having everyone access your
database live. Not only is the programming simpler, but you can back
everything up in one go, and everyone has completely up-to-date data.

If this solution cannot be implemented for you, try this: keep a journal.
Keep an unaltered central copy of the data. As each site contacts the central
site, play that site's journal back against the unaltered central copy. The
post-journal central copy of the database becomes the new copy for
distribution.

If these just cannot be done, you have to implement your own solution. Check
that it handles these nasty synchronisation scenarios:

One site deletes a record, then another site modifies it. E.g. Branch 1 sez
"Our supplier withdrew the product 'Flinns' and created a new one"; Branch 2
sez "Our supplier renamed their 'Flinns' product to be 'Flanns' because
'Flinns' is a swearword in Hijinks." Each site then continues selling the new
product. Headquarters considers these the same product, and wants sales
figures added up, not considered separate.

Superseding UPDATEs. Starting with three cars (black, red and green), one
site resprays all black cars red and another site resprays all red cars
green. After synchronising, depending on which update you get first, you may
or may not end up with a red car.

Competing UPDATE and INSERT. One site creates a new product for category L.
Then Headquarters sends out a memo that category L is now category Q, but it
doesn't know about the new product yet. The site's copy of the database gets
accurate data, but since Headquarters got the UPDATE first, then the INSERT,
it somehow still has a category L product.

If your system deals with those, it's most of the way there.

Simon.
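[Editor's note: for the delete-then-modify scenario, one mitigation is to keep a tombstone instead of physically deleting the row, so a late-arriving offline edit is flagged rather than silently resurrecting the record (e.g. via INSERT OR REPLACE, as mentioned earlier in the thread). The schema and names below are invented for illustration; this is a sketch of one possible policy, not a technique from the thread.]

```python
# Sketch: tombstones make delete-vs-modify conflicts detectable. A row is
# marked deleted rather than removed, and synced edits skip tombstoned rows.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)""")
db.execute("INSERT INTO products (id, name) VALUES (1, 'Flinns')")

# Branch 1 withdraws the product: tombstone, not DELETE.
db.execute("UPDATE products SET deleted = 1 WHERE id = 1")

# Branch 2's offline rename arrives later; it must not touch tombstoned rows.
cur = db.execute(
    "UPDATE products SET name = 'Flanns' WHERE id = 1 AND deleted = 0")
conflict = (cur.rowcount == 0)   # True => flag for manual resolution
```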