Hello!

Ludovic Courtès <[email protected]> writes:
> Hi,
>
> Maxim Cournoyer <[email protected]> skribis:
>
>> $ sudo ps -eF | grep guix-daemon
>> root 25193 216 0 3074 1524 3 Jun28 ? 00:00:00
>> /gnu/store/vphx2839xv0qj9xwcwrb95592lzrrnx7-guix-1.3.0-3.50dfbbf/bin/guix-daemon
>> 25178 guixbuild --max-silent-time 0 --timeout 0 --log-compression
>> none --discover=no --substitute-urls http://127.0.0.1:8080
>> https://ci.guix.gnu.org --max-jobs=4
>> --8<---------------cut here---------------end--------------->8---
>>
>> I can rather easily (and annoyingly!) trigger the problem (and a few
>> variations of it, it seems) with something like:
>>
>> $ packages=$(guix refresh -l protobuf | sed 's/^.*: //')
>> $ guix build -v3 --keep-going $packages
>>
>> For example, running the above, I just got:
>>
>> guix build: error: corrupt input while restoring archive from #<closed: file 7fc95acfc2a0>
>> --8<---------------cut here---------------end--------------->8---
>>
>> Do the above commands succeed the first time on your end? If you
>> already have lots of things cached, you can try an architecture you
>> don't often build for by adding the '--system=i686-linux' option; that
>> should cause a massive amount of downloads, likely to trigger the
>> problem. Perhaps also try --max-jobs=4.
>
> I’ve tried that, with --max-jobs=4, and it fills my disk just fine. :-/
>
>> If you have ideas on how to debug this when I hit the issue, I'm all
>> ears :-).
>
> The attached patch substitutes a number of store items in a row; run:
>
>   guix repl -- substitute-stress.scm
>
> and it’ll fill /tmp/substitute-test with 200 substitutes, which should
> be equivalent to the kind of stress test you had above.
>
> It doesn’t crash for me. There are a few “error: no valid substitute
> for /gnu/store/…” errors, but these are expected: we ask for
> substitutes for 200 packages without first checking whether substitutes
> are available.
>
> Could you run it and report back?
>
> You can try with more packages, different substitute URLs, etc.
>
> TIA!
>
> Ludo’.

[...]

I've tried the following modified version, which runs multiple threads in parallel (to mimic --max-jobs=4 on the daemon), and I have yet to trigger the problem, although the hard drive is grinding heavily:

--8<---------------cut here---------------start------------->8---
(use-modules (guix)
             (gnu packages)
             (guix scripts substitute)
             (guix grafts)
             (guix build utils)
             (srfi srfi-1)
             (ice-9 match)
             (ice-9 threads))

(define test-directory "/tmp/substitute-test")
(define max-jobs 4)

(define packages
  ;; Subset of packages for which we request substitutes.
  (append (map specification->package
               '("libreoffice" "ungoogled-chromium" "openjdk" "texmacs"))
          (take (fold-packages cons '()) 1000)))

(define items
  ;; Store file names of PACKAGES, computed once in the main thread and
  ;; shared by all the jobs.
  (parameterize ((%graft? #false))
    (with-store store
      (map (lambda (package)
             (run-with-store store (package-file package)))
           packages))))

(define (spawn-substitution-thread input urls)
  "Spawn a 'guix substitute' thread that reads commands from INPUT and uses
URLS as the substitute servers."
  (call-with-new-thread
   (lambda ()
     (parameterize ((%reply-file-descriptor #f)
                    (current-input-port input))
       (setenv "_NIX_OPTIONS"
               (string-append "substitute-urls=" (string-join urls)))
       (let loop ()
         (format (current-error-port) "starting substituter~%")
         ;; Catch "no valid substitute" errors, which make
         ;; 'guix-substitute' exit, and restart it.
         (catch 'quit
           (lambda ()
             (guix-substitute "--substitute"))
           (const #f))
         (unless (eof-object? (peek-char input))
           (loop)))))))

(define (start-job job)
  "Start substitution job JOB: spawn a substituter thread that writes to its
own test directory, feed it requests for ITEMS, and return that thread."
  (match (pipe)
    ((input . output)
     (let ((test-directory* (string-append test-directory "-"
                                           (number->string job)))
           (thread (spawn-substitution-thread input
                                              %default-substitute-urls)))
       ;; Remove this job's test directory.
       (when (file-exists? test-directory*)
         (for-each (lambda (f)
                     (false-if-exception (make-file-writable f)))
                   (find-files test-directory* #:directories? #t))
         (delete-file-recursively test-directory*))
       (mkdir-p test-directory*)

       ;; Write the requests from a separate thread: writing to a full pipe
       ;; blocks, which would otherwise delay the start of the other jobs.
       (call-with-new-thread
        (lambda ()
          (for-each (lambda (item n)
                      (format output "substitute ~a ~a/~a~%"
                              item test-directory* n))
                    items
                    (iota (length items)))
          (format #t "sent ~a substitution requests...~%" (length items))
          (close-port output)))
       thread))))

;; Start all the jobs before waiting on any of them, so that MAX-JOBS
;; substituters run concurrently, then wait for substitution to complete.
(for-each join-thread (map start-job (iota max-jobs)))
--8<---------------cut here---------------end--------------->8---

I wonder if something more is happening in the real scenario (validating signatures when putting things in the store, or something similar) that plays a role in the failure. That's a tough nut to crack! I'll keep looking for clues.

Thanks for your time!

Maxim
