Hi, fellow Schemers!

I've now added support for ZIP files to Dorodango[0], using the ZIP
library from Göran's industria collection[1]. To that end, I've prepared
the following patches for industria:

- Add a SRFI-43-compliant `vector-copy', as a temporary fix as long as
  (srfi :43 vectors) is still broken on Ypsilon. It would be cool if
  some of you syntax-case wizards out there could look at this issue[2].

From: Andreas Rottmann <[email protected]>
Subject: [PATCH] Fix `vector-copy' in zip compression library

---
 weinholt/compression/zip.sls |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/weinholt/compression/zip.sls b/weinholt/compression/zip.sls
index 7da16da..74ad12d 100644
--- a/weinholt/compression/zip.sls
+++ b/weinholt/compression/zip.sls
@@ -98,7 +98,19 @@
 
   (define (print . x) (for-each display x) (newline))
 
-  (define (vector-copy x) (vector-map (lambda (i) i) x))
+  (define vector-copy
+    (case-lambda
+      ((vec start end fill)
+       (let ((result (make-vector (- end start) fill)))
+         (do ((i (- (min (vector-length vec) end) 1) (- i 1)))
+             ((< i start) result)
+           (vector-set! result (- i start) (vector-ref vec i)))))
+      ((vec start end)
+       (vector-copy vec start end #f))
+      ((vec start)
+       (vector-copy vec start (vector-length vec) #f))
+      ((vec)
+       (vector-copy vec 0 (vector-length vec) #f))))
 
   (define compression-stored 0)
   (define compression-shrunk 1)
-- 
tg: (43909de..) t/fix-copy-vector (depends on: master)
- Add a procedure `extract-to-port' to the ZIP library, which I need in
  Dorodango:

From: Andreas Rottmann <[email protected]>
Subject: [PATCH] Add `extract-to-port' procedure to zip library

---
 weinholt/compression/zip.sls |   33 ++++++++++++++++++---------------
 1 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/weinholt/compression/zip.sls b/weinholt/compression/zip.sls
index 7da16da..fe1e17d 100644
--- a/weinholt/compression/zip.sls
+++ b/weinholt/compression/zip.sls
@@ -80,6 +80,7 @@
           get-central-directory
           central-directory->file-record
           extract-file
+          extract-to-port
           append-file
           append-port
           append-central-directory
@@ -582,7 +583,6 @@
   (define (extract-file port local central)
     (assert (file-record? local))
     (assert (central-directory? central))
-    (set-port-position! port (file-record-data-port-position local))
     (call-with-adorned-output-file
      (central-directory-filename central)
      (central-directory-date central)
@@ -592,20 +592,23 @@
      (central-directory-internal-attributes central)
      (central-directory-external-attributes central)
      (central-directory-uncompressed-size central)
-     (lambda (o)
-       (let ((m (central-directory-compression-method central)))
-         (cond ((= m compression-stored)
-                (extract-stored-data port o (central-directory-uncompressed-size
-                                             central)))
-               ((= m compression-deflated)
-                (extract-deflated-data port o (central-directory-uncompressed-size
-                                               central)))
-               (else
-                (raise (condition
-                        (make-who-condition 'get-file-record)
-                        (make-unsupported-error)
-                        (make-message-condition "unimplemented compression method")
-                        (make-irritants-condition m)))))))))
+      (lambda (o)
+        (extract-to-port port local central o))))
+
+  (define (extract-to-port zip-port local central dest-port)
+    (set-port-position! zip-port (file-record-data-port-position local))
+    (let ((m (central-directory-compression-method central))
+          (uncompressed-size (central-directory-uncompressed-size central)))
+      (cond ((= m compression-stored)
+             (extract-stored-data zip-port dest-port uncompressed-size))
+            ((= m compression-deflated)
+             (extract-deflated-data zip-port dest-port uncompressed-size))
+            (else
+             (raise (condition
+                     (make-who-condition 'get-file-record)
+                     (make-unsupported-error)
+                     (make-message-condition "unimplemented compression method")
+                     (make-irritants-condition m)))))))
 
   ;; This puts in a complete file record, including the file and
   ;; returns a central-directory record. The port is positioned to
-- 
tg: (43909de..) t/zip-extract-to-port (depends on: master)
- Get rid of the requirement for input/output ports and
  `set-port-position!' in the ZIP deflate code, which allows the ZIP
  library to work on Ikarus (which does not provide combined I/O
  ports). This also makes the code ~15% faster (measured on Ypsilon),
  and, IMHO, more useful in general (e.g. you can could now deflate to a
  pipe or socket, which don't allow seeks):

From: Andreas Rottmann <[email protected]>
Subject: [PATCH] Don't require an input/output port for extraction

---
 weinholt/compression/sliding-buffer.sls |  125 +++++++++++++++++
 weinholt/compression/zip.sls            |  225 ++++++++++++++-----------------
 2 files changed, 229 insertions(+), 121 deletions(-)

diff --git a/weinholt/compression/sliding-buffer.sls b/weinholt/compression/sliding-buffer.sls
new file mode 100644
index 0000000..77270a4
--- /dev/null
+++ b/weinholt/compression/sliding-buffer.sls
@@ -0,0 +1,125 @@
+;;; sliding-buffer.sls --- A circular buffer attached to a data sink
+
+;; Copyright (C) 2009 Andreas Rottmann <[email protected]>
+
+;; Author: Andreas Rottmann <[email protected]>
+
+;; This program is free software; you can redistribute it and/or
+;; modify it under the terms of the GNU General Public License
+;; as published by the Free Software Foundation; either version 3
+;; of the License, or (at your option) any later version.
+
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+#!r6rs
+
+(library (weinholt compression sliding-buffer)
+  (export make-sliding-buffer
+          sliding-buffer?
+          sliding-buffer-drain!
+          sliding-buffer-read!
+          sliding-buffer-put-u8!
+          sliding-buffer-dup!)
+  (import (rnrs))
+
+  (define-record-type sliding-buffer
+    (protocol (lambda (p)
+                (lambda (sink size)
+                  (p sink (make-bytevector size) 0 0))))
+    (fields sink
+            data
+            (mutable fill)
+            (mutable pos)))
+
+  (define (sliding-buffer-size buffer)
+    (bytevector-length (sliding-buffer-data buffer)))
+  
+  (define (%sliding-buffer-drain buffer pos fill)
+    (let ((sink (sliding-buffer-sink buffer))
+          (size (sliding-buffer-size buffer))
+          (data (sliding-buffer-data buffer)))
+      (let loop ((i (fxmod (fx- pos fill) size))
+                 (fill fill))
+        (when (fx>? fill 0)
+          (let ((count (fxmin fill (fx- size i))))
+            (sink data i count)
+            (loop (fxmod (fx+ i count) size)
+                  (fx- fill count)))))))
+  
+  (define (sliding-buffer-drain! buffer)
+    (%sliding-buffer-drain buffer
+                           (sliding-buffer-pos buffer)
+                           (sliding-buffer-fill buffer))
+    (sliding-buffer-fill-set! buffer 0))
+  
+  (define (sliding-buffer-read! buffer in-port len)
+    (let ((size (sliding-buffer-size buffer))
+          (data (sliding-buffer-data buffer)))
+      (let loop ((pos (sliding-buffer-pos buffer))
+                 (fill (sliding-buffer-fill buffer))
+                 (n-left len))
+        (cond ((fx=? 0 n-left)
+               (sliding-buffer-pos-set! buffer pos)
+               (sliding-buffer-fill-set! buffer fill)
+               len)
+              ((fx=? fill size)
+               (%sliding-buffer-drain buffer pos fill)
+               (loop pos 0 n-left))
+              (else
+               (let ((count (fxmin (fx- size fill) (fx- size pos) n-left)))
+                 (let ((n-read (get-bytevector-n! in-port data pos count)))
+                   (cond ((eof-object? n-read)
+                          (sliding-buffer-pos-set! buffer pos)
+                          (sliding-buffer-fill-set! buffer fill)
+                          (if (fx=? n-left len)
+                              (eof-object)
+                              (- len n-left)))
+                         (else
+                          (loop (fxmod (fx+ pos n-read) size)
+                                (fx+ fill n-read)
+                                (fx- n-left n-read)))))))))))
+
+  (define (sliding-buffer-put-u8! buffer u8)
+    (let ((size (sliding-buffer-size buffer)))
+      (when (fx=? (sliding-buffer-fill buffer) size)
+        (sliding-buffer-drain! buffer))
+      (let ((pos (sliding-buffer-pos buffer))
+            (data (sliding-buffer-data buffer)))
+        (bytevector-u8-set! (sliding-buffer-data buffer) pos u8)
+        (sliding-buffer-pos-set! buffer (fxmod (fx+ pos 1) size))
+        (sliding-buffer-fill-set! buffer (fx+ (sliding-buffer-fill buffer) 1)))))
+
+  (define (sliding-buffer-dup! buffer distance len)
+    (let ((size (sliding-buffer-size buffer))
+          (data (sliding-buffer-data buffer)))
+      (assert (< 0 distance (fx+ size 1)))
+      (cond ((< distance len)
+             (sliding-buffer-dup! buffer distance distance)
+             (sliding-buffer-dup! buffer distance (fx- len distance)))
+            (else
+             (let loop ((i (mod (fx- (sliding-buffer-pos buffer) distance) size))
+                        (pos (sliding-buffer-pos buffer))
+                        (fill (sliding-buffer-fill buffer))
+                        (n-left len))
+               (cond ((fx=? 0 n-left)
+                      (sliding-buffer-pos-set! buffer pos)
+                      (sliding-buffer-fill-set! buffer fill))
+                     ((fx=? fill size)
+                      (%sliding-buffer-drain buffer pos fill)
+                      (loop i pos 0 n-left))
+                     (else
+                      (let ((count (fxmin (fx- size i) (fx- size fill) n-left)))
+                        (bytevector-copy! data i data pos count)
+                        (loop (fxmod (fx+ i count) size)
+                              (fxmod (fx+ pos count) size)
+                              (fx+ fill count)
+                              (fx- n-left count)))))))))))
diff --git a/weinholt/compression/zip.sls b/weinholt/compression/zip.sls
index 74ad12d..de1c1a5 100644
--- a/weinholt/compression/zip.sls
+++ b/weinholt/compression/zip.sls
@@ -91,12 +91,20 @@
           (srfi :19 time)
           (weinholt struct pack (1 (>= 3)))
           (weinholt crypto crc (1 (>= 0)))
+          (weinholt compression sliding-buffer)
           (weinholt compression zip extra (0 (>= 0)))
           (weinholt compression huffman (0 (>= 0))))
 
   (define-crc crc-32)
 
-  (define (print . x) (for-each display x) (newline))
+  (define-syntax trace
+    (syntax-rules ()
+      #;
+      ((_ . args)
+       (begin
+         (for-each display (list . args))
+         (newline)))
+      ((_ . args) (begin 'dummy))))
 
   (define vector-copy
     (case-lambda
@@ -467,126 +475,101 @@
          2049 3073 4097 6145 8193 12289 16385 24577))
 
   (define (extract-deflated-data in out n)
-    (define (read-compressed-data table2 table3)
-      (let ((code (get-next-code get-bits table2)))
-        (cond ((< code 256)             ;literal byte
-               ;; (print "LITERAL: '" (integer->char code) "'")
-               (put-u8 out code)
-               (read-compressed-data table2 table3))
-              ((<= 257 code 285)
-               ;;(print "\nlen code: " code)
-               (let* ((len (+ (get-bits (vector-ref len-extra (- code 257)))
-                              (vector-ref len-base (- code 257))))
-                      (distcode (get-next-code get-bits table3))
-                      (dist (+ (get-bits (vector-ref dist-extra distcode))
-                               (vector-ref dist-base distcode))))
-                 ;; (print "len: " len "  dist: " dist "  @ position: " (port-position out))
-                 (let ((p (port-position out)))
-                   ;; (print "COPYING FROM POSITION: " (- p dist)  " THIS MUCH: " len)
-                   (cond ((< dist len)
-                          (let lp ((len len) (p p))
-                            ;; This is really stupid. Took me two
-                            ;; hours to figure out what was wrong and
-                            ;; put in this ugly fix.
-                            (unless (zero? len)
-                              (set-port-position! out (- p dist))
-                              (let ((b (get-u8 out)))
-                                (set-port-position! out p)
-                                ;; (print "LITERAL: '" (integer->char b) "'")
-                                (put-u8 out b)
-                                (lp (- len 1) (+ p 1))))))
-                         (else
-                          (set-port-position! out (- p dist))
-                          (let ((data (get-bytevector-n out len)))
-                            (set-port-position! out p)
-                            ;; (print "LITERAL: '" (utf8->string data) "'")
-                            (put-bytevector out data)))))
-                 (read-compressed-data table2 table3)))
-              ((= 256 code))            ;end of block
-              (else
-               (error 'inflate "error in compressed data (bad literal/length)")))))
-    (define (sad-crc-32-after-the-fact)
-      ;; It'd be better to do this during the unzipping, or in the
-      ;; sliding window code
-      (unless (= (port-position out) n)
-        (error 'extract-deflated-data "the file is not the right size..."))
-      (set-port-position! out 0)
-      (let* ((bufsize (min n (* 1024 1024)))
-             (buf (make-bytevector bufsize)))
-        (let lp ((crc (crc-32-init))
-                 (n n))
-          (if (zero? n)
-              (crc-32-finish crc)
-              (let ((read (get-bytevector-n! out buf 0 (min n bufsize))))
-                (lp (crc-32-update crc buf 0 read)
-                    (- n read)))))))
-    (define get-bits (make-bit-reader in))
-    (unless (and (port-has-port-position? out)
-                 (port-has-set-port-position!? out)
-                 (input-port? out) (output-port? out))
-      (error 'extract-deflated-data
-             "the output port should be an input/output and it needs port-position" out))
-    (let more-blocks ()
-      (let ((last? (= (get-bits 1) 1)))
-        (case (get-bits 2)              ;block-type
-          ((#b00)                       ;non-compressed block
-           (get-bits)                   ;seek to a byte boundary
-           (let ((len (get-bits 16))
-                 (nlen (get-bits 16)))
-             (unless (= len (fxand #xffff (fxnot nlen)))
-               (error 'inflate "error in non-compressed block length" len nlen))
-             (put-bytevector out (get-bytevector-n in len))))
-          ((#b01)                       ;static Huffman tree
-           (read-compressed-data static-table2 static-table3))
-          ((#b10)                       ;dynamic Huffman tree
-           (let* ((hlit (+ 257 (get-bits 5)))
-                  (hdist (+ 1 (get-bits 5)))
-                  (hclen (+ 4 (get-bits 4))))
-             (when (or (> hlit 286) (> hclen 19))
-               (error 'inflate "bad number of literal/length codes" hlit hclen))
-             ;; Up to 19 code lengths are now read...
-             (let ((table1
-                    (do ((order '#(16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15))
-                         (i 0 (+ i 1))
-                         (codes (make-vector 19 0)))
-                        ((= i hclen)
-                         ;; The 19 codes represent a canonical
-                         ;; Huffman table.
-                         (vector->huffman-lookup-table codes))
-                      (vector-set! codes (vector-ref order i)
-                                   (get-bits 3)))))
-               ;; Table 1 is now used to encode the `code-lengths'
-               ;; canonical Huffman table.
-               (let ((code-lengths (make-vector (+ hlit hdist) 0)))
-                 (let lp ((n 0))
-                   (unless (= n (+ hlit hdist))
-                     (let ((blc (get-next-code get-bits table1)))
-                       (cond
-                         ((< blc 16)    ;literal code
-                          (vector-set! code-lengths n blc)
-                          (lp (+ n 1)))
-                         ((= blc 16)    ;copy previous code
-                          (let ((rep (+ 3 (get-bits 2))))
-                            (do ((i 0 (+ i 1)))
-                                ((= i rep)
-                                 (lp (+ n rep)))
-                              (vector-set! code-lengths (+ n i)
-                                           (vector-ref code-lengths (- n 1))))))
-                         ((= blc 17)    ;fill with zeros
-                          (lp (+ n (+ 3 (get-bits 3)))))
-                         (else          ;fill with zeros (= blc 18)
-                          (lp (+ n (+ 11 (get-bits 7)))))))))
-                 ;; Table 2 is for lengths, literals and the
-                 ;; end-of-block. Table 3 is for distance codes.
-                 (read-compressed-data (vector->huffman-lookup-table
-                                        (vector-copy code-lengths 0 hlit #f))
-                                       (vector->huffman-lookup-table
-                                        (vector-copy code-lengths hlit)))))))
-          ((#b11)
-           (error 'inflate "error in compressed data (bad block type)")))
-        (if last?
-            (sad-crc-32-after-the-fact)
-            (more-blocks)))))
+    (let* ((crc (crc-32-init))
+           (output-len 0)
+           (buffer (make-sliding-buffer
+                    (lambda (bytevector start count)
+                      (put-bytevector out bytevector start count)
+                      (set! crc (crc-32-update crc bytevector start count))
+                      (set! output-len (+ output-len count)))
+                    (* 32 1024))))
+      (define (read-compressed-data table2 table3)
+        (let ((code (get-next-code get-bits table2)))
+          (cond ((< code 256)           ;literal byte
+                 (trace "LITERAL:" code)
+                 (sliding-buffer-put-u8! buffer code)
+                 (read-compressed-data table2 table3))
+                ((<= 257 code 285)
+                 (trace "\nlen code: " code)
+                 (let* ((len (+ (get-bits (vector-ref len-extra (- code 257)))
+                                (vector-ref len-base (- code 257))))
+                        (distcode (get-next-code get-bits table3))
+                        (dist (+ (get-bits (vector-ref dist-extra distcode))
+                                 (vector-ref dist-base distcode))))
+                   (trace "len: " len "  dist: " dist "  @ position: " (port-position out))
+                   (trace "COPYING FROM POSITION: " dist  " THIS MUCH: " len)
+                   (sliding-buffer-dup! buffer dist len)
+                   (read-compressed-data table2 table3)))
+                ((= 256 code))          ;end of block
+                (else
+                 (error 'inflate "error in compressed data (bad literal/length)")))))
+      (define get-bits (make-bit-reader in))
+      (let more-blocks ()
+        (let ((last? (= (get-bits 1) 1)))
+          (case (get-bits 2)            ;block-type
+            ((#b00)                     ;non-compressed block
+             (get-bits)                 ;seek to a byte boundary
+             (let ((len (get-bits 16))
+                   (nlen (get-bits 16)))
+               (unless (= len (fxand #xffff (fxnot nlen)))
+                 (error 'inflate "error in non-compressed block length" len nlen))
+               (unless (eqv? len (sliding-buffer-read! buffer in len))
+                 (error 'inflate "premature EOF encountered" ))))
+            ((#b01)                     ;static Huffman tree
+             (read-compressed-data static-table2 static-table3))
+            ((#b10)                     ;dynamic Huffman tree
+             (let* ((hlit (+ 257 (get-bits 5)))
+                    (hdist (+ 1 (get-bits 5)))
+                    (hclen (+ 4 (get-bits 4))))
+               (when (or (> hlit 286) (> hclen 19))
+                 (error 'inflate "bad number of literal/length codes" hlit hclen))
+               ;; Up to 19 code lengths are now read...
+               (let ((table1
+                      (do ((order '#(16 17 18 0 8 7 9 6 10 5 11 4 12 3 13 2 14 1 15))
+                           (i 0 (+ i 1))
+                           (codes (make-vector 19 0)))
+                          ((= i hclen)
+                           ;; The 19 codes represent a canonical
+                           ;; Huffman table.
+                           (vector->huffman-lookup-table codes))
+                        (vector-set! codes (vector-ref order i)
+                                     (get-bits 3)))))
+                 ;; Table 1 is now used to encode the `code-lengths'
+                 ;; canonical Huffman table.
+                 (let ((code-lengths (make-vector (+ hlit hdist) 0)))
+                   (let lp ((n 0))
+                     (unless (= n (+ hlit hdist))
+                       (let ((blc (get-next-code get-bits table1)))
+                         (cond
+                           ((< blc 16)  ;literal code
+                            (vector-set! code-lengths n blc)
+                            (lp (+ n 1)))
+                           ((= blc 16)  ;copy previous code
+                            (let ((rep (+ 3 (get-bits 2))))
+                              (do ((i 0 (+ i 1)))
+                                  ((= i rep)
+                                   (lp (+ n rep)))
+                                (vector-set! code-lengths (+ n i)
+                                             (vector-ref code-lengths (- n 1))))))
+                           ((= blc 17)  ;fill with zeros
+                            (lp (+ n (+ 3 (get-bits 3)))))
+                           (else        ;fill with zeros (= blc 18)
+                            (lp (+ n (+ 11 (get-bits 7)))))))))
+                   ;; Table 2 is for lengths, literals and the
+                   ;; end-of-block. Table 3 is for distance codes.
+                   (read-compressed-data (vector->huffman-lookup-table
+                                          (vector-copy code-lengths 0 hlit #f))
+                                         (vector->huffman-lookup-table
+                                          (vector-copy code-lengths hlit)))))))
+            ((#b11)
+             (error 'inflate "error in compressed data (bad block type)")))
+          (cond (last?
+                 (sliding-buffer-drain! buffer)
+                 (unless (= output-len n)
+                   (error 'extract-deflated-data "the file is not the right size..."))
+                 (crc-32-finish crc))
+                (else
+                 (more-blocks)))))))
 
 ;;;
 
-- 
tg: (a12a8e9..) t/zip-no-i+o-port (depends on: master t/fix-copy-vector)
Furthermore, I've added a fairly faithful port of aptitude's dependency
resolver algorithm to the Dorodango codebase. This was a decent amount
of effort -- ~9 KLOC of C++ resulting in ~3 KLOC of Scheme code (of
which, in both cases, a significant amount is devoted to debug
tracing). The algorithm should be semantically equivalent, except for
two features I've left out:

- Soft dependencies (these are used in Debian for the "Recommends"
  field, which tells the package manager that it /should/ satisfy the
  recommended dependencies, but it's not strictly required to do so). I
  think we can do without those.

- Aptitude has a fancy mechanism designed for user interaction with the
  resolver; for instance, a user might approve the deinstallation of a
  package, which is taken into account by the resolver when generating
  (future) solutions. I intend to eventually add that functionality to
  Dorodango as well, but it's pretty low priority ATM.

Note that the dependency resolver is not yet used by Dorodango, so that
leaves the following on my TODO list before I'll think of an initial
release:

- HTTP support
- Dependency handling, using the already-present solver library


[0] http://gitorious.org/dorodango/mainline
[1] https://code.launchpad.net/~weinholt/scheme-libraries/industria
[2] https://bugs.launchpad.net/scheme-libraries/+bug/429372

Cheers, Rotty
-- 
Andreas Rottmann -- <http://rotty.yi.org/>

Reply via email to