No recommendation here, but I just came across this tool for including/diffing sqlite binaries in git repos <https://github.com/cannadayr/git-sqlite>, which is clearly relevant.
(It came up in a HN discussion for another related application: <https://news.ycombinator.com/item?id=17865687>.)

On Fri, Aug 10, 2018 at 1:28 PM Carl Boettiger via discuss <discuss@lists.carpentries.org> wrote:

> Hi Tiffany, list,
>
> I think dumps are reasonable for regular backups, but not a good choice
> for creating longer-term archives, so I guess it depends a bit on the
> goal. Changes in versions, options, encoding, and database engines can
> make it difficult to import SQL dumps accurately. I think text-file
> formats (CSV) are still the best long-term archive option: they are easy
> to version and compress, and ubiquitous, but far from perfect. In
> particular, a round trip db -> csv -> db may well not preserve data types
> (boolean/int/char, etc.) accurately. Storing these as 'metadata' can help,
> but is somewhat manual. I'm not convinced that we have a good performant,
> compressible, cross-platform, widely established file-based exchange format
> available at this time (cue comments about JSON, HDF5, or Parquet).
>
> A somewhat separate issue is whether such files need a git-like tool to
> manage versions. IMHO the goal is really to preserve each dump in a way
> that doesn't risk accidentally overwriting a previous version and that
> captures some basic metadata (a timestamp). A file-naming convention can
> provide that, so git may not be necessary (given both the potentially large
> size of data dumps and the often compelling case for compressing these
> files in a binary format).
>
> I'm really no expert in any of this, though, so I'm sharing this as much to
> learn where it goes wrong as to offer solid advice!
>
> Cheers,
>
> Carl
>
> On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber <ben...@umich.edu> wrote:
>
>> Tiffany,
>>
>> You might experiment with some smallish databases.
>> The order of records may well change significantly from dump to dump,
>> making the apparent differences between any two dumps much larger than
>> the actual differences.
>>
>> Good luck!
>>
>>
>> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
>> <discuss@lists.carpentries.org> wrote:
>> >
>> > Thanks all for your input, very helpful! Dav, happy for you to
>> > question the general strategy. As I said, I know very little about this.
>> > In my case it's a smallish, simple SQLite database with ~8 tables, so
>> > dumping/transaction logs/etc. might work well and easily. But if there's a
>> > better and different strategy for checkpointing SQLite databases, I'd love
>> > to learn.
>> >
>> > Thanks!
>> > Tiffany
>>
>> ------------------------------------------
>> The Carpentries: discuss
>> Permalink:
>> https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Mb0ae3c22005b6cfbf5866889
>> Delivery options:
>> https://carpentries.topicbox.com/groups/discuss/subscription
>
> --
>
> http://carlboettiger.info
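Bennet's point about record order, together with Carl's timestamped file-naming idea, can be sketched in Python with the standard-library sqlite3 module. This is a minimal sketch under stated assumptions, not how git-sqlite or any other tool mentioned here works: the function names (`ordered_dump`, `write_snapshot`) and the `db-<UTC timestamp>.sql` naming scheme are hypothetical, only ordinary tables are dumped (indexes, triggers, and views are skipped), and non-finite floats are not handled. Sorting rows on their content rather than on storage order keeps unchanged rows on the same lines in every dump, so a line diff between two snapshots reflects real changes.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value) -> str:
    """Render one value returned by sqlite3 as an SQL literal (finite floats only)."""
    if value is None:
        return "NULL"
    if isinstance(value, bytes):
        return "X'" + value.hex() + "'"
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + str(value).replace("'", "''") + "'"


def ordered_dump(conn: sqlite3.Connection) -> str:
    """Dump ordinary tables as SQL, tables sorted by name, rows by every column.

    Ordering on row content (not storage order or rowid) keeps unchanged rows
    on the same lines from dump to dump. Indexes, triggers and views are not
    included in this sketch.
    """
    out = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, create_sql in tables:
        table = quote_ident(name)
        out.append(create_sql + ";")
        ncols = len(conn.execute(f"PRAGMA table_info({table})").fetchall())
        order_by = ", ".join(str(i) for i in range(1, ncols + 1))  # "1, 2, ..., n"
        for row in conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}"):
            values = ", ".join(sql_literal(v) for v in row)
            out.append(f"INSERT INTO {table} VALUES({values});")
    return "\n".join(out) + "\n"


def write_snapshot(conn: sqlite3.Connection, directory: Path, stem: str = "db") -> Path:
    """Write ordered_dump() to <stem>-<UTC timestamp>.sql without overwriting."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(directory) / f"{stem}-{stamp}.sql"
    # Mode "x" raises FileExistsError rather than replacing an existing file.
    with path.open("x", encoding="utf-8") as fh:
        fh.write(ordered_dump(conn))
    return path
```

Opening the file with mode `"x"` means a second snapshot taken within the same second fails with `FileExistsError` instead of silently replacing the first. If a byte-exact checkpoint is wanted rather than a diffable text dump, the standard library's `sqlite3.Connection.backup()` (Python 3.7+) can copy a live database into another connection.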
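Carl's observation that a db -> csv -> db round trip can lose column types, and that storing them as metadata helps, can be illustrated with a small sidecar of declared types. In this hypothetical Python sketch (`export_table` and `load_table` are illustrative names, not an existing tool), the declared types reported by `PRAGMA table_info` are saved as JSON next to the CSV; on reload, recreating the table with those types lets SQLite's type affinity turn the CSV text back into integers and reals. It does not solve everything: NULL and the empty string both become an empty CSV field, and SQLite has no native boolean type.

```python
import csv
import io
import json
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def export_table(conn: sqlite3.Connection, table: str, csv_file) -> dict:
    """Write a table to csv_file and return {column name: declared type}."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    col_types = {row[1]: row[2] for row in info}
    writer = csv.writer(csv_file)
    writer.writerow(list(col_types))  # header row: column names
    writer.writerows(conn.execute(f"SELECT * FROM {quote_ident(table)}"))
    return col_types


def load_table(conn: sqlite3.Connection, table: str, csv_file, col_types: dict) -> None:
    """Recreate a table from CSV, using the saved declared types for affinity."""
    cols = ", ".join(f"{quote_ident(n)} {t}" for n, t in col_types.items())
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({cols})")
    reader = csv.reader(csv_file)
    header = next(reader)
    marks = ", ".join("?" for _ in header)
    # Every CSV field arrives as text; INTEGER/REAL affinity converts it back.
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", reader)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE m (n INTEGER, x REAL, s TEXT)")
    src.execute("INSERT INTO m VALUES (42, 1.5, 'hi')")
    data = io.StringIO()
    sidecar = json.dumps(export_table(src, "m", data))  # e.g. saved as m.types.json
    data.seek(0)
    dst = sqlite3.connect(":memory:")
    load_table(dst, "m", data, json.loads(sidecar))
    print(dst.execute("SELECT typeof(n), typeof(x), typeof(s) FROM m").fetchone())
    # ('integer', 'real', 'text')
```

Reloading the same CSV into columns declared without a type leaves every value as text, which is the type loss Carl describes; the sidecar is what makes the difference.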