Hi Tiffany, hi list,

I think dumps are reasonable for regular backups, but not a good choice for creating longer-term archives, so I guess it depends a bit on the goal. Changes in versions, options, encodings, and database engines can make it difficult to import SQL dumps accurately. I think text-file formats (CSV) are still the best long-term archive option -- they are easy to version and compress, and ubiquitous -- but far from a perfect option: in particular, a round trip db -> csv -> db is likely not to preserve data types (boolean / int / char, etc.) accurately. Storing this as 'metadata' can help but is somewhat manual. I'm not convinced that we have a good performant, compressible, cross-platform, widely established file-based exchange format available at this time (cue comments about JSON, HDF5, or Parquet).
A somewhat separate issue is whether such files need a git-like tool to manage versions. IMHO the goal is really to preserve each dump in a way that doesn't risk accidentally overwriting a previous version and that captures some basic metadata (a timestamp); a file-naming convention can provide that, and git may not be necessary (given both the potentially large size of data dumps and the often compelling case for compressing these files in a binary format). I'm really no expert in any of this, though, so I'm sharing this as much to learn where it goes wrong as to offer solid advice!

Cheers,
Carl

On Fri, Aug 10, 2018 at 12:08 PM Bennet Fauber <[email protected]> wrote:
> Tiffany,
>
> You might experiment with some smallish databases. The order of
> records may well change significantly from dump to dump, making the
> apparent differences and the actual differences between any two dumps
> appear much larger than they really are.
>
> Good luck!
>
>
> On Fri, Aug 10, 2018 at 12:49 PM Tiffany A. Timbers via discuss
> <[email protected]> wrote:
> >
> > Thanks all for your input - very helpful! Dav - happy for you to
> > question the general strategy. As I said, I know very little about this.
> > In my case it's a smallish, simple SQLite database with ~8 tables. So
> > dumping/transaction logs/etc. might work well and easily. But if there's a
> > better and different strategy for checkpointing SQLite databases, I'd love
> > to learn.
> >
> > Thanks!
> > Tiffany

--
http://carlboettiger.info

------------------------------------------
The Carpentries: discuss
Permalink:
https://carpentries.topicbox.com/groups/discuss/Ta7250f4266e508c5-Md590217388b52aadab77edf2
Delivery options:
https://carpentries.topicbox.com/groups/discuss/subscription
