[PATCH] enhancement: modify md5sum to allow piping

Daniel Santos Thu, 20 Dec 2012 14:37:16 -0800

There are many times, usually when doing system backups, maintenance, recovery, etc., that I would like to pipe large files through md5sum to produce or verify a hash so that I do not have to read the file multiple times. This is especially the case when backing up a system from a livecd across the network:


dd if=/dev/sda3 | pbzip2 -c2 | netcat 192.168.1.123 45678
or
tar c /mnt/sda3 | pbzip2 -c2 | netcat 192.168.1.123 45678

Attached is a preliminary patch set that will allow for this, as in the following example:

dd if=/dev/sda3 | pbzip2 -c2 | md5sum -po /tmp/sda3.dat.bzip2.md5 | netcat 192.168.1.123 45678

-p is short for --pipe and -o <filename> is short for --outfile <filename>. Then, on the receiving end, the hash can be determined as the file is read, eliminating any worry about network corruption:


netcat -l -p 45678 | md5sum -po sda3.dat.bzip2.rx.md5 > sda3.dat.bzip2

The only caveat is that you have to compare the sum files manually, which you can do by calling diff, a small cost when compared to re-reading a 200GiB file!
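
To make the idea concrete, here is a minimal sketch of what a pipe mode boils down to: copy the input to the output unchanged while hashing every byte on the way through. This is not the code from the patch. The function name tee_hash is hypothetical, and 64-bit FNV-1a is used only as a short stand-in for MD5 so that the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in hash: 64-bit FNV-1a.  This is NOT MD5; it only keeps the
   sketch short.  A real implementation would call the md5 routines.  */
#define FNV_OFFSET 0xcbf29ce484222325ULL
#define FNV_PRIME  0x100000001b3ULL

/* Copy IN to OUT unchanged while hashing each byte.
   Returns 0 on success, 1 on read error, 2 on write error.
   (Hypothetical helper, not part of md5sum or gnulib.)  */
int
tee_hash (FILE *in, FILE *out, uint64_t *result)
{
  unsigned char buf[32768];
  uint64_t h = FNV_OFFSET;
  size_t n;

  assert (in != NULL && out != NULL && result != NULL);

  while ((n = fread (buf, 1, sizeof buf, in)) > 0)
    {
      /* Pass through exactly the N bytes that were read.  */
      if (fwrite (buf, 1, n, out) != n)
        return 2;

      for (size_t i = 0; i < n; i++)
        {
          h ^= buf[i];
          h *= FNV_PRIME;
        }
    }

  if (ferror (in))
    return 1;
  if (fflush (out) != 0)
    return 2;

  *result = h;
  return 0;
}
```

In terms of the options above, IN would be stdin, OUT would be stdout (the -p part), and the value stored in RESULT would be printed to the file named by -o rather than to stdout.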

You can even get the sum prior to compression, although if you wanted to avoid a duplicate read on the server end, you would have to decompress as you read it and either store the file uncompressed or re-compress it.

dd if=/dev/sda3 | md5sum -po /tmp/sda3.dat.md5 | pbzip2 -c2 | netcat 192.168.1.123 45678

with
netcat -l -p 45678 | pbzip2 -cd | md5sum -po sda3.dat.rx.md5 > sda3.dat

The attached patchset is in a very early stage and has many problems:

 * GNU coding style compliance (this coding style is new to me)
 * API in gnulib is changed, may break other apps
 * all changes are lumped together and need to be broken apart into
   logical changes
 * it has a few hacks that need to be cleaned up

Also, this patch set addresses a problem with gnulib's hash functions where there was a lot of copy & paste code. I've implemented a mechanism to clean this up w/o a performance hit (as long as we're using gcc 4.6.1+). This change should probably go into a separate patchset & bug report.
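
The mechanism can be sketched roughly as follows: one always-inlined generic routine takes a constant table of function pointers, and each algorithm gets a thin wrapper that supplies its own table. When the compiler inlines the generic routine into the wrapper it can resolve the indirect calls at compile time, so the shared source need not cost anything at run time. All names below (toy_funcs, toy_digest, toy_sum, toy_xor) are made up for illustration, and the two "digests" are trivial placeholders rather than real hash functions.

```c
#include <assert.h>
#include <stddef.h>

#if defined __GNUC__
# define ALWAYS_INLINE inline __attribute__ ((always_inline))
#else
# define ALWAYS_INLINE inline
#endif

/* Per-algorithm operations (hypothetical, much reduced).  */
struct toy_funcs
{
  void (*init) (unsigned *state);
  void (*update) (unsigned *state, const unsigned char *p, size_t len);
};

/* The single shared implementation.  Because it is always inlined
   into callers that pass a constant table, the compiler can turn
   the indirect calls into direct (or inlined) ones.  */
static ALWAYS_INLINE unsigned
toy_digest (const unsigned char *p, size_t len, const struct toy_funcs *f)
{
  unsigned state;

  assert (f != NULL);
  f->init (&state);
  f->update (&state, p, len);
  return state;
}

static void zero_init (unsigned *s) { *s = 0; }

static void
sum_update (unsigned *s, const unsigned char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
    *s += p[i];
}

static void
xor_update (unsigned *s, const unsigned char *p, size_t len)
{
  for (size_t i = 0; i < len; i++)
    *s ^= p[i];
}

/* Thin per-algorithm wrappers: the only code that is duplicated.  */
unsigned
toy_sum (const unsigned char *p, size_t len)
{
  static const struct toy_funcs funcs = { zero_init, sum_update };
  return toy_digest (p, len, &funcs);
}

unsigned
toy_xor (const unsigned char *p, size_t len)
{
  static const struct toy_funcs funcs = { zero_init, xor_update };
  return toy_digest (p, len, &funcs);
}
```

Whether a given compiler actually removes the indirection depends on its version and optimization level, which is why the gcc version matters for the no-performance-hit claim.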

Finally, after the cursory amount that I've worked with this code, I see a number of other areas where I believe there's room for improvement.


 * The copy & paste code problem (mentioned above)
 * Centralize the location where BLOCKSIZE is defined and only verify
   it's a multiple of 64 in gnulib/lib/{md,sha}*.c
 * Perhaps allow BLOCKSIZE to be defined at configure time? Honestly,
   I'm not intimately familiar enough with the issues where I can be
   certain it would alter performance on any system, but I'm thinking
   about embedded where reading 32k chunks may end up thrashing the
   cache, but 8k or 4k would not. However, I don't think I would be in
   favor of this being a run-time parameter, as it would seem to be a
   lot of waste (and lost optimizations) for something that's probably
   pretty specific to the hardware and build target.
 * Centralize compiler sniffing into a single gnulib header, (like
   "compiler.h" or some such) and define the GCC_VERSION macro as
   described in
   http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html.
 * Make better use of __builtin_expect via portable likely/unlikely
   macros to make sure error handling code gets moved out of the main
   bodies of functions (which can save a cache miss here and there).
   Of course, this would require the above item to do cleanly.

 * Introduce some tuning parameter in the configure script to choose
   between smaller and larger, but more optimized code.  I bring this
   up mainly because in my re-work of the copy & pasted code, I see a
   large opportunity to create a much smaller executable (if needed),
   but one that would create slightly slower code, which would usually
   be undesirable on a machine with plenty of RAM, storage and CPU cache.
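
For the compiler-sniffing and likely/unlikely items in the list above, a centralized header might look something like the sketch below. The GCC_VERSION definition follows the form given in the GCC documentation linked above; the fallback value of 0 for non-GCC compilers, the 30000 threshold, and the checked_div function are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One number that can be compared with a single test, e.g.
   "#if GCC_VERSION >= 40601" for gcc 4.6.1 or later.  */
#ifdef __GNUC__
# define GCC_VERSION (__GNUC__ * 10000 \
                      + __GNUC_MINOR__ * 100 \
                      + __GNUC_PATCHLEVEL__)
#else
# define GCC_VERSION 0   /* assumption: treat other compilers as "no GCC" */
#endif

/* Portable branch hints: plain expressions where unsupported.  */
#if GCC_VERSION >= 30000
# define likely(exp)   __builtin_expect (!!(exp), 1)
# define unlikely(exp) __builtin_expect (!!(exp), 0)
#else
# define likely(exp)   (exp)
# define unlikely(exp) (exp)
#endif

/* Hypothetical example user: the error path is marked unlikely so
   the compiler may place it away from the main body.
   Returns 0 on success, -1 if DEN is zero (QUOT is left untouched).  */
int
checked_div (int num, int den, int *quot)
{
  assert (quot != NULL);

  if (unlikely (den == 0))
    return -1;

  *quot = num / den;
  return 0;
}
```

Comparing a single GCC_VERSION number also avoids having to test the major and minor version macros separately.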

Obviously, these should be made into separate bug reports as well and I can send separate emails for them if you like.


Daniel

From 8c3d2a3019f21efce441d58e8084959cfe7bd043 Mon Sep 17 00:00:00 2001
From: Daniel Santos <[email protected]>
Date: Thu, 20 Dec 2012 03:51:39 -0600
Subject: md5sum: pipe

---
 src/md5sum.c |  133 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 112 insertions(+), 21 deletions(-)

diff --git a/src/md5sum.c b/src/md5sum.c
index 1663c1e..1953e67 100644
--- a/src/md5sum.c
+++ b/src/md5sum.c
@@ -39,6 +39,7 @@
 #include "fadvise.h"
 #include "stdio--.h"
 #include "xfreopen.h"
+#include "close-stream.h"
 
 /* The official name of this program (e.g., no 'g' prefix).  */
 #if HASH_ALGO_MD5
@@ -129,6 +130,14 @@ static bool strict = false;
 /* Whether a BSD reversed format checksum is detected.  */
 static int bsd_reversed = -1;
 
+static bool pipe_to_stdout = false;
+static const char *out_filename = NULL;
+FILE *outfile = NULL;
+
+#ifndef unlikely
+#define unlikely(exp) (exp)
+#endif
+
 /* For long options that have no equivalent short option, use a
    non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
 enum
@@ -149,6 +158,9 @@ static struct option const long_options[] =
   { "warn", no_argument, NULL, 'w' },
   { "strict", no_argument, NULL, STRICT_OPTION },
   { "tag", no_argument, NULL, TAG_OPTION },
+  { "outfile", required_argument, NULL, 'o' },
+  { "append", no_argument, NULL, 'a' },
+  { "pipe", no_argument, NULL, 'p' },
   { GETOPT_HELP_OPTION_DECL },
   { GETOPT_VERSION_OPTION_DECL },
   { NULL, 0, NULL, 0 }
@@ -203,6 +215,16 @@ The following three options are useful only when verifying checksums:\n\
       fputs (_("\
       --strict         with --check, exit non-zero for any invalid input\n\
 "), stdout);
+        fputs (_("\
+  -o, --outfile file   output results to the specified file instead of\n\
+                       standard out.\n\
+"), stdout);
+        fputs (_("\
+  -a, --append         append to output file (used with --outfile)\n\
+"), stdout);
+        fputs (_("\
+  -p, --pipe           pipe through md5sum (must be used with --outfile)\n\
+"), stdout);
       fputs (HELP_OPTION_DESCRIPTION, stdout);
       fputs (VERSION_OPTION_DESCRIPTION, stdout);
       printf (_("\
@@ -457,10 +479,15 @@ digest_file (const char *filename, int *binary, unsigned char *bin_result)
 
   fadvise (fp, FADVISE_SEQUENTIAL);
 
-  err = DIGEST_STREAM (fp, bin_result);
-  if (err)
+  err = DIGEST_STREAM (fp, pipe_to_stdout ? stdout : NULL, bin_result);
+
+  if (unlikely(err))
     {
-      error (0, errno, "%s", filename);
+      if (err == 1 /* DIGEST_STREAM_READ_ERROR */)
+            error (0, errno, "%s: %s", _("read error"), filename);
+      else
+            error (0, errno, "%s: stdout", _("write error"));
+
       if (fp != stdin)
         fclose (fp);
       return false;
@@ -667,21 +694,36 @@ print_filename (char const *file)
       switch (*file)
         {
         case '\n':
-          fputs ("\\n", stdout);
+          fputs ("\\n", outfile);
           break;
 
         case '\\':
-          fputs ("\\\\", stdout);
+          fputs ("\\\\", outfile);
           break;
 
         default:
-          putchar (*file);
+          putc (*file, outfile);
           break;
         }
       file++;
     }
 }
 
+static void
+md5sum_cleanup (void)
+{
+  if (pipe_to_stdout && fflush (stdout))
+    error (0, errno, _("error flushing stdout"));
+
+  if (outfile && outfile != stdout && close_stream (outfile))
+    error (0, errno, _("error closing: %s"), out_filename);
+
+  if (out_filename)
+    free(bad_cast (out_filename));
+
+  close_stdout();
+}
+
 int
 main (int argc, char **argv)
 {
@@ -693,6 +735,7 @@ main (int argc, char **argv)
   bool ok = true;
   int binary = -1;
   bool prefix_tag = false;
+  const char *outfile_mode = "w";
 
   /* Setting values of global variables.  */
   initialize_main (&argc, &argv);
@@ -701,13 +744,9 @@ main (int argc, char **argv)
   bindtextdomain (PACKAGE, LOCALEDIR);
   textdomain (PACKAGE);
 
-  atexit (close_stdout);
+  atexit (md5sum_cleanup);
 
-  /* Line buffer stdout to ensure lines are written atomically and immediately
-     so that processes running in parallel do not intersperse their output.  */
-  setvbuf (stdout, NULL, _IOLBF, 0);
-
-  while ((opt = getopt_long (argc, argv, "bctw", long_options, NULL)) != -1)
+  while ((opt = getopt_long (argc, argv, "bctwo:pa", long_options, NULL)) != -1)
     switch (opt)
       {
       case 'b':
@@ -741,6 +780,15 @@ main (int argc, char **argv)
         prefix_tag = true;
         binary = 1;
         break;
+      case 'p':
+        pipe_to_stdout = true;
+        break;
+      case 'o':
+        out_filename = strdup(optarg);
+        break;
+      case 'a':
+        outfile_mode = "a";
+        break;
       case_GETOPT_HELP_CHAR;
       case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
       default:
@@ -803,6 +851,49 @@ main (int argc, char **argv)
      usage (EXIT_FAILURE);
    }
 
+  if (pipe_to_stdout && !out_filename)
+   {
+     error (0, 0, _("--pipe must be accompanied by --outfile"));
+     usage (EXIT_FAILURE);
+   }
+
+  /* I'm not sure how strict to be here. */
+  if (pipe_to_stdout && isatty(STDOUT_FILENO) && binary)
+   {
+     error (0, 0, _("refusing to write binary data to a terminal"));
+     usage (EXIT_FAILURE);
+   }
+
+  if (pipe_to_stdout && argc - optind > 1)
+   {
+     error (0, 0, _("--pipe cannot be used on multiple files."));
+     usage (EXIT_FAILURE);
+   }
+
+  if (!out_filename && *outfile_mode == 'a')
+    {
+      error (0, 0,
+       _("the --append option is meaningful only when accompanied by "
+         "--outfile"));
+      usage (EXIT_FAILURE);
+    }
+
+  if (!out_filename)
+    outfile = stdout;
+  else
+    {
+      if (!(outfile = fopen(out_filename, outfile_mode)))
+        {
+          error (0, errno, _("Failed to open file for writing: %s"),
+                 out_filename);
+          exit (EXIT_FAILURE);
+        }
+    }
+
+  /* Line buffer stdout to ensure lines are written atomically and immediately
+     so that processes running in parallel do not intersperse their output.  */
+  setvbuf (outfile, NULL, _IOLBF, 0);
+
   if (!O_BINARY && binary < 0)
     binary = 0;
 
@@ -826,12 +917,12 @@ main (int argc, char **argv)
               if (prefix_tag)
                 {
                   if (strchr (file, '\n') || strchr (file, '\\'))
-                    putchar ('\\');
+                    putc ('\\', outfile);
 
-                  fputs (DIGEST_TYPE_STRING, stdout);
-                  fputs (" (", stdout);
+                  fputs (DIGEST_TYPE_STRING, outfile);
+                  fputs (" (", outfile);
                   print_filename (file);
-                  fputs (") = ", stdout);
+                  fputs (") = ", outfile);
                 }
 
               size_t i;
@@ -839,21 +930,21 @@ main (int argc, char **argv)
               /* Output a leading backslash if the file name contains
                  a newline or backslash.  */
               if (!prefix_tag && (strchr (file, '\n') || strchr (file, '\\')))
-                putchar ('\\');
+                putc ('\\', outfile);
 
               for (i = 0; i < (digest_hex_bytes / 2); ++i)
-                printf ("%02x", bin_buffer[i]);
+                fprintf (outfile, "%02x", bin_buffer[i]);
 
               if (!prefix_tag)
                 {
-                  putchar (' ');
+                  putc (' ', outfile);
 
-                  putchar (file_is_binary ? '*' : ' ');
+                  putc (file_is_binary ? '*' : ' ', outfile);
 
                   print_filename (file);
                 }
 
-              putchar ('\n');
+              putc ('\n', outfile);
             }
         }
     }
-- 
1.7.8.6

From 03009bdf1f8ca5e45f87505173a50f3f85cf6caf Mon Sep 17 00:00:00 2001
From: Daniel Santos <[email protected]>
Date: Thu, 20 Dec 2012 02:47:58 -0600
Subject: piping support

---
 lib/digest.h |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/md2.c    |   77 ++++------------------------
 lib/md2.h    |    2 +-
 lib/md5.c    |   77 ++++------------------------
 lib/md5.h    |    2 +-
 lib/sha1.c   |   77 ++++------------------------
 lib/sha1.h   |    2 +-
 lib/sha256.c |  151 +++++++----------------------------------------------
 lib/sha256.h |    4 +-
 lib/sha512.c |  151 +++++++----------------------------------------------
 lib/sha512.h |    4 +-
 11 files changed, 239 insertions(+), 472 deletions(-)
 create mode 100644 lib/digest.h

diff --git a/lib/digest.h b/lib/digest.h
new file mode 100644
index 0000000..0983a07
--- /dev/null
+++ b/lib/digest.h
@@ -0,0 +1,164 @@
+/* Functions to compute MD5 message digest of files or memory blocks.
+   according to the definition of MD5 in RFC 1321 from April 1992.
+   Copyright (C) 1995-1997, 1999-2001, 2005-2006, 2008-2012 Free Software
+   Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   This program is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 2, or (at your option) any
+   later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, see <http://www.gnu.org/licenses/>.  */
+
+
+#include <stdlib.h>
+
+#include "full-write.h"
+
+#if BLOCKSIZE % 64 != 0
+# error "invalid BLOCKSIZE"
+#endif
+
+#ifdef __GNUC__
+# define likely(exp)    __builtin_expect((exp), 1)
+# define unlikely(exp)  __builtin_expect((exp), 0)
+
+# if __GNUC__ >= 4
+#  ifndef __always_inline
+#   define __always_inline inline __attribute__((always_inline))
+#  endif
+
+#  if (__GNUC__ > 4 || __GNUC_MINOR__ >= 1) && !(__GNUC__ == 4 && __GNUC_MINOR__ == 6 && __GNUC_PATCHLEVEL__ == 0)
+#   define __flatten __attribute__((flatten))
+#  endif
+
+# endif
+#endif
+
+#ifndef likely
+# define likely(exp) (exp)
+#endif
+#ifndef unlikely
+# define unlikely(exp) (exp)
+#endif
+#ifndef __always_inline
+# define __always_inline inline
+#endif
+#ifndef __flatten
+# define __flatten
+#endif
+
+typedef void (*digest_init_ctx_fn) (void *ctx);
+typedef void (*digest_process_data_fn) (const void *buffer, size_t len, void *ctx);
+typedef void *(*digest_finish_ctx_fn) (void *ctx, void *resbuf);
+
+struct digest_funcs {
+  const digest_init_ctx_fn     init_ctx;
+  const digest_process_data_fn process_block;
+  const digest_process_data_fn process_bytes;
+  const digest_finish_ctx_fn   finish_ctx;
+};
+
+/* TODO: Add configure value to make smaller code and encapsulate usage of
+ * __flatten and __always_inline */
+
+enum digest_stream_result {
+  DIGEST_STREAM_SUCCESS,
+  DIGEST_STREAM_READ_ERROR,
+  DIGEST_STREAM_WRITE_ERROR
+};
+
+/* Compute message digest for bytes read from STREAM.  The
+   resulting message digest number will be written into the 16 bytes
+   beginning at RESBLOCK.  */
+static __always_inline int
+digest_stream (FILE *instream, FILE *outstream, void *resblock, void *ctx,
+               const struct digest_funcs *funcs)
+{
+  size_t sum;
+
+  /* zero is actually stdin, which we will never write to, but this can
+   * save a register */
+  int outfd = outstream ? fileno(outstream) : 0;
+
+  char *buffer = malloc (BLOCKSIZE + 72);
+  if (!buffer)
+    return 1;
+
+  /* Initialize the computation context.  */
+  funcs->init_ctx (ctx);
+
+  /* Iterate over full file contents.  */
+  while (1)
+    {
+      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
+         computation function processes the whole buffer so that with the
+         next round of the loop another block can be read.  */
+      size_t n;
+      sum = 0;
+
+      /* Read block.  Take care for partial reads.  */
+      while (1)
+        {
+          char *start = buffer + sum;
+          size_t size = BLOCKSIZE - sum;
+
+          n = fread (start, 1, size, instream);
+
+          /* keep writes going smoothly, even if we've chosen a blocksize
+           * that's too large to work well on the arch or I/O devices */
+          if (outfd && n && unlikely(full_write (outfd, start, n) != n))
+            {
+              free (buffer);
+              return DIGEST_STREAM_WRITE_ERROR;
+            }
+
+          sum += n;
+
+          if (sum == BLOCKSIZE)
+            break;
+
+          if (n == 0)
+            {
+              /* Check for the error flag IFF N == 0, so that we don't
+                 exit the loop after a partial read due to e.g., EAGAIN
+                 or EWOULDBLOCK.  */
+              if (unlikely(ferror (instream)))
+                {
+                  free (buffer);
+                  return DIGEST_STREAM_READ_ERROR;
+                }
+              goto process_partial_block;
+            }
+
+          /* We've read at least one byte, so ignore errors.  But always
+             check for EOF, since feof may be true even though N > 0.
+             Otherwise, we could end up calling fread after EOF.  */
+          if (feof (instream))
+            goto process_partial_block;
+        }
+
+      /* Process buffer with BLOCKSIZE bytes.  Note that
+         BLOCKSIZE % 64 == 0
+       */
+      funcs->process_block (buffer, BLOCKSIZE, ctx);
+    }
+
+process_partial_block:
+
+  /* Process any remaining bytes.  */
+  if (sum > 0)
+    funcs->process_bytes (buffer, sum, ctx);
+
+  /* Construct result in desired memory.  */
+  funcs->finish_ctx (ctx, resblock);
+  free (buffer);
+  return 0;
+}
diff --git a/lib/md2.c b/lib/md2.c
index 1d181f9..5f64175 100644
--- a/lib/md2.c
+++ b/lib/md2.c
@@ -34,9 +34,7 @@
 #endif
 
 #define BLOCKSIZE 32768
-#if BLOCKSIZE % 64 != 0
-# error "invalid BLOCKSIZE"
-#endif
+#include "digest.h"
 
 static void md2_update_chksum (struct md2_ctx *md);
 static void md2_compress (struct md2_ctx *md);
@@ -90,74 +88,19 @@ md2_finish_ctx (struct md2_ctx *ctx, void *resbuf)
 /* Compute MD2 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 16 bytes
    beginning at RESBLOCK.  */
-int
-md2_stream (FILE *stream, void *resblock)
+int __flatten
+md2_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct md2_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  md2_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-         BLOCKSIZE % 64 == 0
-       */
-      md2_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
-process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    md2_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     md2_init_ctx,
+    .process_block = (digest_process_data_fn) md2_process_block,
+    .process_bytes = (digest_process_data_fn) md2_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   md2_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  md2_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream(instream, outstream, resblock, &ctx, &funcs);
 }
 
 /* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
diff --git a/lib/md2.h b/lib/md2.h
index fd14155..6cb7e8b 100644
--- a/lib/md2.h
+++ b/lib/md2.h
@@ -69,7 +69,7 @@ extern void *md2_read_ctx (const struct md2_ctx *ctx, void *resbuf);
 /* Compute MD2 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 16 bytes
    beginning at RESBLOCK.  */
-extern int md2_stream (FILE *stream, void *resblock);
+extern int md2_stream (FILE *instream, FILE *outstream, void *resblock);
 
 /* Compute MD2 message digest for LEN bytes beginning at BUFFER.  The
    result is always in little endian byte order, so that a byte-wise
diff --git a/lib/md5.c b/lib/md5.c
index 66ede23..ad817e2 100644
--- a/lib/md5.c
+++ b/lib/md5.c
@@ -57,9 +57,7 @@
 #endif
 
 #define BLOCKSIZE 32768
-#if BLOCKSIZE % 64 != 0
-# error "invalid BLOCKSIZE"
-#endif
+#include "digest.h"
 
 /* This array contains the bytes used to pad the buffer to the next
    64-byte boundary.  (RFC 1321, 3.1: Step 1)  */
@@ -132,74 +130,19 @@ md5_finish_ctx (struct md5_ctx *ctx, void *resbuf)
 /* Compute MD5 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 16 bytes
    beginning at RESBLOCK.  */
-int
-md5_stream (FILE *stream, void *resblock)
+int __flatten
+md5_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct md5_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  md5_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-         BLOCKSIZE % 64 == 0
-       */
-      md5_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
-process_partial_block:
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    md5_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     md5_init_ctx,
+    .process_block = (digest_process_data_fn) md5_process_block,
+    .process_bytes = (digest_process_data_fn) md5_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   md5_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  md5_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream(instream, outstream, resblock, &ctx, &funcs);
 }
 
 /* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
diff --git a/lib/md5.h b/lib/md5.h
index f571a70..419f922 100644
--- a/lib/md5.h
+++ b/lib/md5.h
@@ -109,7 +109,7 @@ extern void *__md5_read_ctx (const struct md5_ctx *ctx, void *resbuf) __THROW;
 /* Compute MD5 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 16 bytes
    beginning at RESBLOCK.  */
-extern int __md5_stream (FILE *stream, void *resblock) __THROW;
+extern int __md5_stream (FILE *instream, FILE *outstream, void *resblock) __THROW;
 
 /* Compute MD5 message digest for LEN bytes beginning at BUFFER.  The
    result is always in little endian byte order, so that a byte-wise
diff --git a/lib/sha1.c b/lib/sha1.c
index db4ab42..8dcc21d 100644
--- a/lib/sha1.c
+++ b/lib/sha1.c
@@ -42,9 +42,7 @@
 #endif
 
 #define BLOCKSIZE 32768
-#if BLOCKSIZE % 64 != 0
-# error "invalid BLOCKSIZE"
-#endif
+#include "digest.h"
 
 /* This array contains the bytes used to pad the buffer to the next
    64-byte boundary.  (RFC 1321, 3.1: Step 1)  */
@@ -120,74 +118,19 @@ sha1_finish_ctx (struct sha1_ctx *ctx, void *resbuf)
 /* Compute SHA1 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 16 bytes
    beginning at RESBLOCK.  */
-int
-sha1_stream (FILE *stream, void *resblock)
+int __flatten
+sha1_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct sha1_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  sha1_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-                        BLOCKSIZE % 64 == 0
-       */
-      sha1_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
- process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    sha1_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     sha1_init_ctx,
+    .process_block = (digest_process_data_fn) sha1_process_block,
+    .process_bytes = (digest_process_data_fn) sha1_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   sha1_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  sha1_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream(instream, outstream, resblock, &ctx, &funcs);
 }
 
 /* Compute SHA1 message digest for LEN bytes beginning at BUFFER.  The
diff --git a/lib/sha1.h b/lib/sha1.h
index 4e55430..502cd3a 100644
--- a/lib/sha1.h
+++ b/lib/sha1.h
@@ -76,7 +76,7 @@ extern void *sha1_read_ctx (const struct sha1_ctx *ctx, void *resbuf);
 /* Compute SHA1 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 20 bytes
    beginning at RESBLOCK.  */
-extern int sha1_stream (FILE *stream, void *resblock);
+extern int sha1_stream (FILE *instream, FILE *outstream, void *resblock);
 
 /* Compute SHA1 message digest for LEN bytes beginning at BUFFER.  The
    result is always in little endian byte order, so that a byte-wise
diff --git a/lib/sha256.c b/lib/sha256.c
index a8d29da..cc1ac3f 100644
--- a/lib/sha256.c
+++ b/lib/sha256.c
@@ -41,9 +41,7 @@
 #endif
 
 #define BLOCKSIZE 32768
-#if BLOCKSIZE % 64 != 0
-# error "invalid BLOCKSIZE"
-#endif
+#include "digest.h"
 
 /* This array contains the bytes used to pad the buffer to the next
    64-byte boundary.  */
@@ -167,145 +165,34 @@ sha224_finish_ctx (struct sha256_ctx *ctx, void *resbuf)
 /* Compute SHA256 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 32 bytes
    beginning at RESBLOCK.  */
-int
-sha256_stream (FILE *stream, void *resblock)
+int __flatten
+sha256_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct sha256_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  sha256_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-                        BLOCKSIZE % 64 == 0
-       */
-      sha256_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
- process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    sha256_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     sha256_init_ctx,
+    .process_block = (digest_process_data_fn) sha256_process_block,
+    .process_bytes = (digest_process_data_fn) sha256_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   sha256_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  sha256_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream(instream, outstream, resblock, &ctx, &funcs);
 }
 
-/* FIXME: Avoid code duplication */
-int
-sha224_stream (FILE *stream, void *resblock)
+int __flatten
+sha224_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct sha256_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  sha224_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-                        BLOCKSIZE % 64 == 0
-       */
-      sha256_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
- process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    sha256_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     sha224_init_ctx,
+    .process_block = (digest_process_data_fn) sha256_process_block,
+    .process_bytes = (digest_process_data_fn) sha256_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   sha224_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  sha224_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream (instream, outstream, resblock, &ctx, &funcs);
 }
 
 /* Compute SHA512 message digest for LEN bytes beginning at BUFFER.  The
diff --git a/lib/sha256.h b/lib/sha256.h
index d69b83f..005cf86 100644
--- a/lib/sha256.h
+++ b/lib/sha256.h
@@ -74,8 +74,8 @@ extern void *sha224_read_ctx (const struct sha256_ctx *ctx, void *resbuf);
 /* Compute SHA256 (SHA224) message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 32 (28) bytes
    beginning at RESBLOCK.  */
-extern int sha256_stream (FILE *stream, void *resblock);
-extern int sha224_stream (FILE *stream, void *resblock);
+extern int sha256_stream (FILE *instream, FILE *outstream, void *resblock);
+extern int sha224_stream (FILE *instream, FILE *outstream, void *resblock);
 
 /* Compute SHA256 (SHA224) message digest for LEN bytes beginning at BUFFER.  The
    result is always in little endian byte order, so that a byte-wise
diff --git a/lib/sha512.c b/lib/sha512.c
index cf62f20..836a600 100644
--- a/lib/sha512.c
+++ b/lib/sha512.c
@@ -48,9 +48,7 @@
 #endif
 
 #define BLOCKSIZE 32768
-#if BLOCKSIZE % 128 != 0
-# error "invalid BLOCKSIZE"
-#endif
+#include "digest.h"
 
 /* This array contains the bytes used to pad the buffer to the next
    128-byte boundary.  */
@@ -175,145 +173,34 @@ sha384_finish_ctx (struct sha512_ctx *ctx, void *resbuf)
 /* Compute SHA512 message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 64 bytes
    beginning at RESBLOCK.  */
-int
-sha512_stream (FILE *stream, void *resblock)
+int __flatten
+sha512_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct sha512_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  sha512_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-                        BLOCKSIZE % 128 == 0
-       */
-      sha512_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
- process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    sha512_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     sha512_init_ctx,
+    .process_block = (digest_process_data_fn) sha512_process_block,
+    .process_bytes = (digest_process_data_fn) sha512_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   sha512_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  sha512_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream (instream, outstream, resblock, &ctx, &funcs);
 }
 
-/* FIXME: Avoid code duplication */
-int
-sha384_stream (FILE *stream, void *resblock)
+int __flatten
+sha384_stream (FILE *instream, FILE *outstream, void *resblock)
 {
   struct sha512_ctx ctx;
-  size_t sum;
-
-  char *buffer = malloc (BLOCKSIZE + 72);
-  if (!buffer)
-    return 1;
-
-  /* Initialize the computation context.  */
-  sha384_init_ctx (&ctx);
-
-  /* Iterate over full file contents.  */
-  while (1)
-    {
-      /* We read the file in blocks of BLOCKSIZE bytes.  One call of the
-         computation function processes the whole buffer so that with the
-         next round of the loop another block can be read.  */
-      size_t n;
-      sum = 0;
-
-      /* Read block.  Take care for partial reads.  */
-      while (1)
-        {
-          n = fread (buffer + sum, 1, BLOCKSIZE - sum, stream);
-
-          sum += n;
-
-          if (sum == BLOCKSIZE)
-            break;
-
-          if (n == 0)
-            {
-              /* Check for the error flag IFF N == 0, so that we don't
-                 exit the loop after a partial read due to e.g., EAGAIN
-                 or EWOULDBLOCK.  */
-              if (ferror (stream))
-                {
-                  free (buffer);
-                  return 1;
-                }
-              goto process_partial_block;
-            }
-
-          /* We've read at least one byte, so ignore errors.  But always
-             check for EOF, since feof may be true even though N > 0.
-             Otherwise, we could end up calling fread after EOF.  */
-          if (feof (stream))
-            goto process_partial_block;
-        }
-
-      /* Process buffer with BLOCKSIZE bytes.  Note that
-                        BLOCKSIZE % 128 == 0
-       */
-      sha512_process_block (buffer, BLOCKSIZE, &ctx);
-    }
-
- process_partial_block:;
 
-  /* Process any remaining bytes.  */
-  if (sum > 0)
-    sha512_process_bytes (buffer, sum, &ctx);
+  struct digest_funcs funcs = {
+    .init_ctx      = (digest_init_ctx_fn)     sha384_init_ctx,
+    .process_block = (digest_process_data_fn) sha512_process_block,
+    .process_bytes = (digest_process_data_fn) sha512_process_bytes,
+    .finish_ctx    = (digest_finish_ctx_fn)   sha384_finish_ctx
+  };
 
-  /* Construct result in desired memory.  */
-  sha384_finish_ctx (&ctx, resblock);
-  free (buffer);
-  return 0;
+  return digest_stream (instream, outstream, resblock, &ctx, &funcs);
 }
 
 /* Compute SHA512 message digest for LEN bytes beginning at BUFFER.  The
diff --git a/lib/sha512.h b/lib/sha512.h
index ddf91d6..d1fb835 100644
--- a/lib/sha512.h
+++ b/lib/sha512.h
@@ -78,8 +78,8 @@ extern void *sha384_read_ctx (const struct sha512_ctx *ctx, void *resbuf);
 /* Compute SHA512 (SHA384) message digest for bytes read from STREAM.  The
    resulting message digest number will be written into the 64 (48) bytes
    beginning at RESBLOCK.  */
-extern int sha512_stream (FILE *stream, void *resblock);
-extern int sha384_stream (FILE *stream, void *resblock);
+extern int sha512_stream (FILE *instream, FILE *outstream, void *resblock);
+extern int sha384_stream (FILE *instream, FILE *outstream, void *resblock);
 
 /* Compute SHA512 (SHA384) message digest for LEN bytes beginning at BUFFER.  The
    result is always in little endian byte order, so that a byte-wise
-- 
1.7.8.6
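
For readers following the refactoring, the shape shared by the rewritten sha*_stream functions (a table of callbacks handed to one generic read/hash/copy loop) can be sketched roughly as below. This is only an illustration under stated assumptions: every sketch_* and toy_* name is hypothetical, the callback signatures are guessed rather than taken from the patch's digest.h, the "digest" is a plain byte sum and not a real hash, and the loop calls only the process_bytes callback instead of distinguishing full blocks from the final partial one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback table.  The patch's struct digest_funcs has
   similarly named members, but these signatures are assumptions.  */
struct sketch_funcs
{
  void (*init_ctx) (void *ctx);
  void (*process_bytes) (const void *buf, size_t len, void *ctx);
  void *(*finish_ctx) (void *ctx, void *resblock);
};

enum { SKETCH_BLOCKSIZE = 32768 };

/* Read IN until EOF, feed every chunk to the digest callbacks and, when
   OUT is non-NULL, copy the same bytes to OUT (the "pipe" behaviour).
   Returns 0 on success, 1 on allocation, read or write failure.  */
int
sketch_digest_stream (FILE *in, FILE *out, void *resblock, void *ctx,
                      const struct sketch_funcs *funcs)
{
  int status = 0;
  char *buffer;

  assert (in != NULL && funcs != NULL);

  buffer = malloc (SKETCH_BLOCKSIZE);
  if (!buffer)
    return 1;

  funcs->init_ctx (ctx);

  for (;;)
    {
      size_t n = fread (buffer, 1, SKETCH_BLOCKSIZE, in);

      if (n == 0)
        {
          /* Nothing was read: either a read error or end of input.  */
          if (ferror (in))
            status = 1;
          break;
        }

      funcs->process_bytes (buffer, n, ctx);

      /* Pass the data through unchanged when an output stream is given.  */
      if (out != NULL && fwrite (buffer, 1, n, out) != n)
        {
          status = 1;
          break;
        }

      /* Avoid calling fread again once EOF has been seen.  */
      if (feof (in))
        break;
    }

  if (status == 0)
    funcs->finish_ctx (ctx, resblock);

  free (buffer);
  return status;
}

/* Toy "digest": a 32-bit sum of all bytes.  NOT a cryptographic hash;
   it only stands in for the md5/sha callbacks.  */
struct toy_ctx { uint32_t sum; };

static void
toy_init (void *ctx)
{
  ((struct toy_ctx *) ctx)->sum = 0;
}

static void
toy_bytes (const void *buf, size_t len, void *ctx)
{
  const unsigned char *p = buf;
  struct toy_ctx *c = ctx;
  size_t i;

  for (i = 0; i < len; i++)
    c->sum += p[i];
}

static void *
toy_finish (void *ctx, void *resblock)
{
  /* Store the 4-byte result at RESBLOCK.  */
  memcpy (resblock, &((struct toy_ctx *) ctx)->sum, sizeof (uint32_t));
  return resblock;
}

/* Thin wrapper with the same three-argument shape as the new
   sha256_stream (instream, outstream, resblock) prototype.  */
int
toy_stream (FILE *instream, FILE *outstream, void *resblock)
{
  struct toy_ctx ctx;
  struct sketch_funcs funcs = {
    .init_ctx      = toy_init,
    .process_bytes = toy_bytes,
    .finish_ctx    = toy_finish
  };

  return sketch_digest_stream (instream, outstream, resblock, &ctx, &funcs);
}
```

Calling toy_stream (stdin, stdout, &result) would sum stdin while echoing it to stdout; passing NULL as the output stream gives the old hash-only behaviour. Because the callbacks here take void * directly, the sketch needs no function-pointer casts, unlike the initializers in the patch.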
