On Jan 27, 2018, at 12:20 AM, Martin Vahi <[email protected]> wrote:
> https://temporary.softf1.com/2018/bugs/2018_01_26_Fossil_out_of_RAM_bug_test_case_t1.tar.xz
So after about 8 hours and 3 restarts, your tarball finally downloaded… and inside, I found another 17 TB tarball! That gives us an easy workaround: unpack the tarball into a subdirectory of your repository checkout and check that subdirectory in. The hundreds or thousands of files in that tarball will then each be inserted into the DB separately, so you won't run out of memory.

I didn't actually try your test case because it's far from minimal. As I recall, it was something like a 200-line shell script, and I wasn't going to take the time to audit all that code just to try your test.

Using clues from that tarball's contents, I found your Fossil repository, whose name I now forget. While poking around in its Files section, I saw a lot of this sort of thing:

1. I saw not just other tarballs already checked in, but *compressed* tarballs (.tar.xz). If even a single byte of such a tarball's contents is modified and the tarball is checked back in, almost the entire compressed stream following the change point changes, a terrible waste of space. Fossil not only already has compression, it has *delta* compression: had you left that tarball uncompressed, it wouldn't be much bigger inside Fossil, and new versions of it would be stored with minimal size inflation.

   As a rule, you should not check any compressed artifact into Fossil if there is any chance it will ever be updated later, because doing so defeats the delta compression algorithm. (And if it's checked in just once, ever, you might want to use Fossil's unversioned files feature instead.)

   This rule affects many file types besides the ones you immediately think of. For example, I recall seeing at least one PDF. I didn't check, but chances are excellent that it was internally compressed, so that checking in an updated version will create an extra-large delta.
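The claim that a one-byte change ripples through the rest of a compressed stream is easy to demonstrate with any LZMA-family compressor. Here is a minimal sketch using Python's standard-library `lzma` module; the data is synthetic, and the point is only the contrast between the raw diff and the compressed diff, not the exact byte counts:

```python
import lzma

# Synthetic stand-in for a tarball: highly repetitive data, as real
# file archives often are.
original = (b"some repeated file content\n" * 50) * 40
changed = bytearray(original)
changed[100] ^= 0xFF          # flip one byte early in the stream
changed = bytes(changed)

# Before compression, the two versions differ in exactly one byte, so
# a delta-compressing store like Fossil's keeps the second version
# almost for free.
raw_diff = sum(a != b for a, b in zip(original, changed))

# After xz-style compression, everything following the change point
# is re-encoded differently, so the two compressed streams share
# little beyond their headers.
comp_a = lzma.compress(original)
comp_b = lzma.compress(changed)
comp_diff = sum(a != b for a, b in zip(comp_a, comp_b))

print(raw_diff)    # 1
print(comp_diff)   # far more than 1
```

The same effect is why delta compression works well on the uncompressed files and poorly on the .tar.xz wrapping them.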
   Decompressing the PDF before checking it in will result in a net smaller Fossil repository if you ever check in a change to that PDF.

2. I also saw a Git checkout inside your Fossil repository. That means you checked in two copies of every file at the tip of whatever branch you happened to have checked out of the Git repo at the time. If you wanted the complete history of the remote Git repo, checking it in in Git fast-export format would have been more efficient.

   Personally, whenever I feel the need to re-host someone else's Git repository inside my Fossil repository, I write a script that merges the tip of the remote Git repo into the Fossil subdirectory that hosts it. My repo therefore initially hosts only the tip of the remote Git branch I'm tracking, and on each update, I check in only the diffs since the last update. The vast majority of the remote project's history I delegate to the remote Git repo. If I felt the need to maintain a duplicate copy of the entire remote repository, I'd do it outside my Fossil repository.

_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

