I'm a few months into learning Clojure, and thought I'd put this function out for comment.
I need to compute a message digest of files on disk, and I'm using java.security.MessageDigest to do it. Its update method accepts an array of bytes and updates the hash, which calls for the usual read/update loop, but in Clojure. So I decided to try my hand at a lazy sequence of byte arrays:

(defn stream-block-seq
  "A lazy sequence of blocks read from the given input-stream. Each block
  is returned as a separately allocated Java byte array. The maximum block
  size is given as the optional second argument; the default is 1024. A
  returned block may be shorter than the blocksize. Usually, the last
  block will be short. If the stream is exhausted, the result is nil."
  ([s blocksize]
     (let [buf (byte-array blocksize)
           readlen (.read s buf)]
       (if (>= readlen 0)
         (lazy-seq
          (let [newbuf (if (< readlen blocksize)
                         (copy-array buf (byte-array readlen) readlen)
                         buf)]
            (cons newbuf (stream-block-seq s blocksize)))))))
  ([s] (stream-block-seq s 1024)))

Here's copy-array:

(defn copy-array
  ([src srcpos dest destpos len]
     (System/arraycopy src srcpos dest destpos len)
     dest)
  ([src dest len]
     (copy-array src 0 dest 0 len)))

And here's the message-digest function that uses it:

(import 'java.security.MessageDigest)
(use '[clojure.java.io :only [input-stream]]) ; clojure.java.io needs Clojure 1.2+

(defn message-digest
  "Generates a digest of the given input. The input is opened with
  input-stream, so it can be a Java byte array, a java.io.File, a file
  name, or anything else input-stream accepts. Options are given as
  keyword/value pairs: :hash defaults to \"SHA-256\" and :blocksize
  defaults to 32768. The result is a vector of bytes. See
  http://download.oracle.com/javase/1.5.0/docs/guide/security/CryptoSpec.html#AppA
  for more information on the available hashes."
  ([input & opts]
     (let [opts (merge {:hash "SHA-256" :blocksize 32768}
                       (apply hash-map opts))
           hashname (opts :hash)
           blocksize (opts :blocksize)
           md (MessageDigest/getInstance hashname)]
       ;; with-open closes the stream once doseq has consumed every block
       (with-open [s (input-stream input)]
         (doseq [buf (stream-block-seq s blocksize)]
           (.update md buf)))
       (vec (.digest md)))))

This all seems to work, and the performance seems acceptable: with a 32k buffer size, on my Core 2 Duo MacBook it takes about 50 ms to hash a 1 MiB file from disk, and about 20 ms when the file is already in the filesystem cache. However, I'm sure there's plenty of room for improvement. Is there a cleaner or more efficient way to do this?

I found two previous threads which deal with similar puzzles:

* Resource cleanup when lazy sequences are finalized:
  http://groups.google.com/group/clojure/browse_thread/thread/caece062119de072/13c15c62c3397597?lnk=gst&q=lazy+buffered#13c15c62c3397597
* contrib mmap/duck_streams for binary data:
  http://groups.google.com/group/clojure/browse_thread/thread/f5239c7e66e7fb54/813b70b68081456d?lnk=gst&q=lazy+binary+stream#813b70b68081456d

I must say that I find the lazy-sequence approach conceptually quite attractive here, but my taste may not yet be properly formed :)

Comments welcome!

Thanks,
Michael Ashton

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
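For comparison with the lazy-sequence version in the post, here is a minimal sketch of the plain read/update loop it replaces. It assumes Clojure 1.2 or later (for clojure.java.io and keyword-argument destructuring), and the names stream-digest and file-digest are hypothetical, not from the post. Instead of copying short blocks, it reuses one buffer and passes the byte count to MessageDigest's three-argument update(byte[], int, int).

```clojure
(ns digest-sketch
  (:require [clojure.java.io :as io])
  (:import (java.io InputStream)
           (java.security MessageDigest)))

;; Hypothetical helper: feed everything readable from `in` into a
;; MessageDigest, reusing a single buffer instead of allocating per block.
(defn stream-digest
  [^InputStream in ^String hashname blocksize]
  (let [md  (MessageDigest/getInstance hashname)
        buf (byte-array blocksize)]
    (loop []
      ;; .read returns -1 at end of stream, otherwise the number of bytes read
      (let [n (.read in buf)]
        (when (pos? n)
          ;; only the first n bytes of buf are valid on this pass
          (.update md buf 0 n)
          (recur))))
    (vec (.digest md))))

;; Hypothetical wrapper using the same defaults as message-digest;
;; with-open closes the stream even if hashing throws.
(defn file-digest
  [input & {hashname :hash blocksize :blocksize
            :or {hashname "SHA-256" blocksize 32768}}]
  (with-open [in (io/input-stream input)]
    (stream-digest in hashname blocksize)))
```

The trade-off is that the blocks are never exposed as a sequence, so this gives up the composability that makes the lazy version attractive, in exchange for one allocation per digest rather than one per block.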