Turn the existing nbdkit-gzip-plugin into a filter so it can be applied on top of files or other sources:

  nbdkit file --filter=gzip file.gz
  nbdkit curl --filter=gzip https://example.com/disk.gz

Because the gzip format is not block-based and thus not seekable,
this filter caches the whole uncompressed file in a hidden temporary
file.  This is required in order to implement .get_size.  See this
link for a more detailed explanation:
https://stackoverflow.com/a/9213826

This commit deprecates nbdkit-gzip-plugin and suggests removal in
nbdkit 1.26.
---
 filters/gzip/nbdkit-gzip-filter.pod |  85 +++++++
 filters/tar/nbdkit-tar-filter.pod   |   7 +-
 plugins/gzip/nbdkit-gzip-plugin.pod |   9 +
 configure.ac                        |  10 +-
 filters/gzip/Makefile.am            |  75 ++++++
 tests/Makefile.am                   |  34 +--
 filters/gzip/gzip.c                 | 347 ++++++++++++++++++++++++++++
 tests/test-gzip.c                   |   4 +-
 TODO                                |   2 -
 9 files changed, 547 insertions(+), 26 deletions(-)

diff --git a/filters/gzip/nbdkit-gzip-filter.pod b/filters/gzip/nbdkit-gzip-filter.pod
new file mode 100644
index 00000000..da0cf626
--- /dev/null
+++ b/filters/gzip/nbdkit-gzip-filter.pod
@@ -0,0 +1,85 @@
+=head1 NAME
+
+nbdkit-gzip-filter - decompress a .gz file
+
+=head1 SYNOPSIS
+
+ nbdkit file --filter=gzip FILENAME.gz
+
+=head1 DESCRIPTION
+
+C<nbdkit-gzip-filter> is a filter for L<nbdkit(1)> which transparently
+decompresses a gzip-compressed file.  You can place this filter on top
+of L<nbdkit-file-plugin(1)> to decompress a local F<.gz> file, or on
+top of other plugins such as L<nbdkit-curl-plugin(1)>:
+
+ nbdkit curl --filter=gzip https://example.com/disk.gz
+
+With L<nbdkit-tar-filter(1)> it can be used to extract files from a
+compressed tar file:
+
+ nbdkit curl --filter=tar --filter=gzip \
+        https://example.com/file.tar.gz tar-entry=disk.img
+
+The filter only allows read-only connections.
+
+B<Note> that gzip files are not very good for random access in large
+files because seeking to a position in the gzip file involves
+decompressing all data before that point in the file.  A more
+practical method to compress large disk images is to use the L<xz(1)>
+format and L<nbdkit-xz-filter(1)>.
+
+To allow seeking this filter has to keep the contents of the complete
+uncompressed file, which it does in a hidden temporary file under
+C<$TMPDIR>.
+
+=head1 PARAMETERS
+
+There are no parameters specific to this filter.
+
+=head1 ENVIRONMENT VARIABLES
+
+=over 4
+
+=item C<TMPDIR>
+
+Because the gzip format is not seekable, this filter has to store the
+complete uncompressed contents of the file in a temporary file located
+in F</var/tmp> by default.  You can override this location by setting
+the C<TMPDIR> environment variable before starting nbdkit.
+
+=back
+
+=head1 FILES
+
+=over 4
+
+=item F<$filterdir/nbdkit-gzip-filter.so>
+
+The filter.
+
+Use C<nbdkit --dump-config> to find the location of C<$filterdir>.
+
+=back
+
+=head1 VERSION
+
+C<nbdkit-gzip-filter> first appeared in nbdkit 1.22.  It is derived
+from C<nbdkit-gzip-plugin> which first appeared in nbdkit 1.0.
+
+=head1 SEE ALSO
+
+L<nbdkit-curl-plugin(1)>,
+L<nbdkit-file-plugin(1)>,
+L<nbdkit-tar-filter(1)>,
+L<nbdkit-xz-filter(1)>,
+L<nbdkit(1)>,
+L<nbdkit-plugin(3)>.
+
+=head1 AUTHORS
+
+Richard W.M. Jones
+
+=head1 COPYRIGHT
+
+Copyright (C) 2013-2020 Red Hat Inc.
diff --git a/filters/tar/nbdkit-tar-filter.pod b/filters/tar/nbdkit-tar-filter.pod
index 56d4cab1..0f0734c3 100644
--- a/filters/tar/nbdkit-tar-filter.pod
+++ b/filters/tar/nbdkit-tar-filter.pod
@@ -42,11 +42,13 @@ server use:
  nbdkit -r curl https://example.com/file.tar \
         --filter=tar tar-entry=disk.img
 
-=head2 Open an xz-compressed tar file (read-only)
+=head2 Open a gzip-compressed tar file (read-only)
 
 This filter cannot handle compressed tar files itself, but you can
-combine it with L<nbdkit-xz-filter(1)>:
+combine it with L<nbdkit-gzip-filter(1)> or L<nbdkit-xz-filter(1)>:
 
+ nbdkit file filename.tar.gz \
+        --filter=tar tar-entry=disk.img --filter=gzip
  nbdkit file filename.tar.xz \
         --filter=tar tar-entry=disk.img --filter=xz
 
@@ -100,6 +102,7 @@ from C<nbdkit-tar-plugin> which first appeared in nbdkit 1.2.
 L<nbdkit(1)>,
 L<nbdkit-curl-plugin(1)>,
 L<nbdkit-file-plugin(1)>,
+L<nbdkit-gzip-filter(1)>,
 L<nbdkit-offset-filter(1)>,
 L<nbdkit-plugin(3)>,
 L<nbdkit-ssh-plugin(1)>,
diff --git a/plugins/gzip/nbdkit-gzip-plugin.pod b/plugins/gzip/nbdkit-gzip-plugin.pod
index 1b090125..4cd91ede 100644
--- a/plugins/gzip/nbdkit-gzip-plugin.pod
+++ b/plugins/gzip/nbdkit-gzip-plugin.pod
@@ -6,6 +6,15 @@ nbdkit-gzip-plugin - nbdkit gzip plugin
 
  nbdkit gzip [file=]FILENAME.gz
 
+=head1 DEPRECATED
+
+B<The gzip plugin is deprecated in S<nbdkit E<ge> 1.22.17> and will be
+removed in S<nbdkit 1.26>>.  It has been replaced with a filter with
+the same functionality, see L<nbdkit-gzip-filter(1)>.  You can use the
+filter like this:
+
+ nbdkit file --filter=gzip FILENAME.gz
+
 =head1 DESCRIPTION
 
 C<nbdkit-gzip-plugin> is a file serving plugin for L<nbdkit(1)>.
diff --git a/configure.ac b/configure.ac
index b51b67b6..3c1f2e11 100644
--- a/configure.ac
+++ b/configure.ac
@@ -105,6 +105,7 @@ filters="\
         ext2 \
         extentlist \
         fua \
+        gzip \
         ip \
         limit \
         log \
@@ -899,10 +900,10 @@ AS_IF([test "$with_libvirt" != "no"],[
 ])
 AM_CONDITIONAL([HAVE_LIBVIRT],[test "x$LIBVIRT_LIBS" != "x"])
 
-dnl Check for zlib (only if you want to compile the gzip plugin).
+dnl Check for zlib (only if you want to compile the gzip filter).
 AC_ARG_WITH([zlib],
     [AS_HELP_STRING([--without-zlib],
-                    [disable gzip plugin @<:@default=check@:>@])],
+                    [disable gzip filter @<:@default=check@:>@])],
     [],
     [with_zlib=check])
 AS_IF([test "$with_zlib" != "no"],[
@@ -911,7 +912,7 @@ AS_IF([test "$with_zlib" != "no"],[
         AC_SUBST([ZLIB_LIBS])
         AC_DEFINE([HAVE_ZLIB],[1],[zlib found at compile time.])
     ],
-    [AC_MSG_WARN([zlib >= 1.2.3.5 not found, gzip plugin will be disabled])])
+    [AC_MSG_WARN([zlib >= 1.2.3.5 not found, gzip filter will be disabled])])
 ])
 AM_CONDITIONAL([HAVE_ZLIB],[test "x$ZLIB_LIBS" != "x"])
 
@@ -1144,6 +1145,7 @@ AC_CONFIG_FILES([Makefile
                  filters/ext2/Makefile
                  filters/extentlist/Makefile
                  filters/fua/Makefile
+                 filters/gzip/Makefile
                  filters/ip/Makefile
                  filters/limit/Makefile
                  filters/log/Makefile
@@ -1253,6 +1255,8 @@ echo "Optional filters:"
 echo
 feature "ext2 ................................... " \
         test "x$HAVE_EXT2_TRUE" = "x"
+feature "gzip ................................... " \
+        test "x$HAVE_ZLIB_TRUE" = "x"
 feature "xz ..................................... " \
         test "x$HAVE_LIBLZMA_TRUE" = "x"
 
diff --git a/filters/gzip/Makefile.am b/filters/gzip/Makefile.am
new file mode 100644
index 00000000..a329fab8
--- /dev/null
+++ b/filters/gzip/Makefile.am
@@ -0,0 +1,75 @@
+# nbdkit
+# Copyright (C) 2019-2020 Red Hat Inc.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are
+# met:
+#
+# * Redistributions of source code must retain the above copyright
+# notice, this list of conditions and the following disclaimer.
+#
+# * Redistributions in binary form must reproduce the above copyright
+# notice, this list of conditions and the following disclaimer in the
+# documentation and/or other materials provided with the distribution.
+#
+# * Neither the name of Red Hat nor the names of its contributors may be
+# used to endorse or promote products derived from this software without
+# specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+# SUCH DAMAGE.
+
+include $(top_srcdir)/common-rules.mk
+
+EXTRA_DIST = nbdkit-gzip-filter.pod
+
+if HAVE_ZLIB
+
+filter_LTLIBRARIES = nbdkit-gzip-filter.la
+
+nbdkit_gzip_filter_la_SOURCES = \
+	gzip.c \
+	$(top_srcdir)/include/nbdkit-filter.h \
+	$(NULL)
+
+nbdkit_gzip_filter_la_CPPFLAGS = \
+	-I$(top_srcdir)/include \
+	-I$(top_srcdir)/common/include \
+	-I$(top_srcdir)/common/utils \
+	$(NULL)
+nbdkit_gzip_filter_la_CFLAGS = \
+	$(WARNINGS_CFLAGS) \
+	$(ZLIB_CFLAGS) \
+	$(NULL)
+nbdkit_gzip_filter_la_LIBADD = \
+	$(top_builddir)/common/utils/libutils.la \
+	$(ZLIB_LIBS) \
+	$(NULL)
+nbdkit_gzip_filter_la_LDFLAGS = \
+	-module -avoid-version -shared $(SHARED_LDFLAGS) \
+	-Wl,--version-script=$(top_srcdir)/filters/filters.syms \
+	$(NULL)
+
+if HAVE_POD
+
+man_MANS = nbdkit-gzip-filter.1
+CLEANFILES += $(man_MANS)
+
+nbdkit-gzip-filter.1: nbdkit-gzip-filter.pod
+	$(PODWRAPPER) --section=1 --man $@ \
+		--html $(top_builddir)/html/$@.html \
+		$<
+
+endif HAVE_POD
+
+endif
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 2b5737b8..77f21d79 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -557,23 +557,6 @@ EXTRA_DIST += test-floppy.sh
 TESTS += test-full.sh
 EXTRA_DIST += test-full.sh
 
-# gzip plugin test.
-if HAVE_MKE2FS_WITH_D
-if HAVE_ZLIB
-LIBGUESTFS_TESTS += test-gzip
-check_DATA += disk.gz
-CLEANFILES += disk.gz
-
-test_gzip_SOURCES = test-gzip.c test.h
-test_gzip_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS)
-test_gzip_LDADD = libtest.la $(LIBGUESTFS_LIBS)
-
-disk.gz: disk
-	rm -f $@
-	gzip -9 -c disk > $@
-endif HAVE_ZLIB
-endif HAVE_MKE2FS_WITH_D
-
 # info plugin test.
 TESTS += \
 	test-info-address.sh \
@@ -1253,6 +1236,23 @@ EXTRA_DIST += test-extentlist.sh
 TESTS += test-fua.sh
 EXTRA_DIST += test-fua.sh
 
+# gzip filter test.
+if HAVE_MKE2FS_WITH_D
+if HAVE_ZLIB
+LIBGUESTFS_TESTS += test-gzip
+check_DATA += disk.gz
+CLEANFILES += disk.gz
+
+test_gzip_SOURCES = test-gzip.c test.h
+test_gzip_CFLAGS = $(WARNINGS_CFLAGS) $(LIBGUESTFS_CFLAGS)
+test_gzip_LDADD = libtest.la $(LIBGUESTFS_LIBS)
+
+disk.gz: disk
+	rm -f $@
+	gzip -9 -c disk > $@
+endif HAVE_ZLIB
+endif HAVE_MKE2FS_WITH_D
+
 # ip filter test.
 TESTS += test-ip-filter.sh
 EXTRA_DIST += test-ip-filter.sh
diff --git a/filters/gzip/gzip.c b/filters/gzip/gzip.c
new file mode 100644
index 00000000..582652cd
--- /dev/null
+++ b/filters/gzip/gzip.c
@@ -0,0 +1,347 @@
+/* nbdkit
+ * Copyright (C) 2018-2020 Red Hat Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * * Neither the name of Red Hat nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
+ * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+ * PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
+ * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+ * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
+ * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <config.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <string.h>
+#include <fcntl.h>
+#include <errno.h>
+#include <pthread.h>
+
+#include <zlib.h>
+
+#include <nbdkit-filter.h>
+
+#include "cleanup.h"
+#include "minmax.h"
+
+/* The first thread to call gzip_prepare has to uncompress the whole
+ * plugin to the temporary file.  This lock prevents concurrent
+ * access.
+ */
+static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
+
+/* Temporary file storing the uncompressed data. */
+static int fd = -1;
+
+/* Size of uncompressed data. */
+static int64_t size = -1;
+
+static void
+gzip_unload (void)
+{
+  if (fd >= 0)
+    close (fd);
+}
+
+static int
+gzip_thread_model (void)
+{
+  return NBDKIT_THREAD_MODEL_PARALLEL;
+}
+
+static void *
+gzip_open (nbdkit_next_open *next, nbdkit_backend *nxdata, int readonly)
+{
+  /* Always pass readonly=1 to the underlying plugin. */
+  if (next (nxdata, 1) == -1)
+    return NULL;
+
+  return NBDKIT_HANDLE_NOT_NEEDED;
+}
+
+/* Convert a zlib error (always negative) to an nbdkit error message,
+ * and return errno correctly.
+ */
+static void
+zerror (const char *op, const z_stream *strm, int zerr)
+{
+  if (zerr == Z_MEM_ERROR) {
+    errno = ENOMEM;
+    nbdkit_error ("gzip: %s: %m", op);
+  }
+  else {
+    errno = EIO;
+    if (strm->msg)
+      nbdkit_error ("gzip: %s: %s", op, strm->msg);
+    else
+      nbdkit_error ("gzip: %s: unknown error: %d", op, zerr);
+  }
+}
+
+/* Write a whole buffer to the temporary file or fail. */
+static int
+xwrite (const void *buf, size_t count)
+{
+  ssize_t r;
+
+  while (count > 0) {
+    r = write (fd, buf, count);
+    if (r == -1) {
+      nbdkit_error ("write: %m");
+      return -1;
+    }
+    buf += r;
+    count -= r;
+  }
+
+  return 0;
+}
+
+/* The first thread to call gzip_prepare uncompresses the whole plugin. */
+static int
+do_uncompress (struct nbdkit_next_ops *next_ops, void *nxdata)
+{
+  int64_t compressed_size;
+  z_stream strm;
+  int zerr;
+  const char *tmpdir;
+  size_t len;
+  char *template;
+  CLEANUP_FREE char *in_block = NULL, *out_block = NULL;
+
+  /* This was the same buffer size as used in the old plugin.  As far
+   * as I know it was chosen at random.
+   */
+  const size_t block_size = 128 * 1024;
+
+  assert (size == -1);
+
+  /* Get the size of the underlying plugin. */
+  compressed_size = next_ops->get_size (nxdata);
+  if (compressed_size == -1)
+    return -1;
+
+  /* Create the temporary file. */
+  tmpdir = getenv ("TMPDIR");
+  if (!tmpdir)
+    tmpdir = LARGE_TMPDIR;
+
+  len = strlen (tmpdir) + 8;
+  template = alloca (len);
+  snprintf (template, len, "%s/XXXXXX", tmpdir);
+
+#ifdef HAVE_MKOSTEMP
+  fd = mkostemp (template, O_CLOEXEC);
+#else
+  /* This is only invoked serially with the lock held, so this is safe. */
+  fd = mkstemp (template);
+  if (fd >= 0) {
+    fd = set_cloexec (fd);
+    if (fd < 0) {
+      int e = errno;
+      unlink (template);
+      errno = e;
+    }
+  }
+#endif
+  if (fd == -1) {
+    nbdkit_error ("mkostemp: %s: %m", tmpdir);
+    return -1;
+  }
+
+  unlink (template);
+
+  /* Uncompress the whole plugin.  This is REQUIRED in order to
+   * implement gzip_get_size.  See: https://stackoverflow.com/a/9213826
+   *
+   * For use of inflateInit2 on gzip streams see:
+   * https://stackoverflow.com/a/1838702
+   */
+  memset (&strm, 0, sizeof strm);
+  zerr = inflateInit2 (&strm, 16+MAX_WBITS);
+  if (zerr != Z_OK) {
+    zerror ("inflateInit2", &strm, zerr);
+    return -1;
+  }
+
+  in_block = malloc (block_size);
+  if (!in_block) {
+    nbdkit_error ("malloc: %m");
+    return -1;
+  }
+  out_block = malloc (block_size);
+  if (!out_block) {
+    nbdkit_error ("malloc: %m");
+    return -1;
+  }
+
+  for (;;) {
+    /* Do we need to read more from the plugin? */
+    if (strm.avail_in == 0 && strm.total_in < compressed_size) {
+      size_t n = MIN (block_size, compressed_size - strm.total_in);
+      int err = 0;
+
+      if (next_ops->pread (nxdata, in_block, (uint32_t) n, strm.total_in,
+                           0, &err) == -1) {
+        errno = err;
+        return -1;
+      }
+
+      strm.next_in = (void *) in_block;
+      strm.avail_in = n;
+    }
+
+    /* Inflate the next chunk of input. */
+    strm.next_out = (void *) out_block;
+    strm.avail_out = block_size;
+    zerr = inflate (&strm, Z_SYNC_FLUSH);
+    if (zerr < 0) {
+      zerror ("inflate", &strm, zerr);
+      return -1;
+    }
+
+    /* Write the output to the file. */
+    if (xwrite (out_block, (char *) strm.next_out - out_block) == -1)
+      return -1;
+
+    if (zerr == Z_STREAM_END)
+      break;
+  }
+
+  /* Set the size to the total uncompressed size. */
+  size = strm.total_out;
+  nbdkit_debug ("gzip: uncompressed size: %" PRIi64, size);
+
+  zerr = inflateEnd (&strm);
+  if (zerr != Z_OK) {
+    zerror ("inflateEnd", &strm, zerr);
+    return -1;
+  }
+
+  return 0;
+}
+
+static int
+gzip_prepare (struct nbdkit_next_ops *next_ops, void *nxdata, void *handle,
+              int readonly)
+{
+  ACQUIRE_LOCK_FOR_CURRENT_SCOPE (&lock);
+
+  if (size >= 0)
+    return 0;
+  return do_uncompress (next_ops, nxdata);
+}
+
+/* Whatever the plugin says, this filter makes it read-only. */
+static int
+gzip_can_write (struct nbdkit_next_ops *next_ops, void *nxdata,
+                void *handle)
+{
+  return 0;
+}
+
+/* Similar to above, whatever the plugin says, extents are not
+ * supported.
+ */
+static int
+gzip_can_extents (struct nbdkit_next_ops *next_ops, void *nxdata,
+                  void *handle)
+{
+  return 0;
+}
+
+/* We are already operating as a cache regardless of the plugin's
+ * underlying .can_cache, but it's easiest to just rely on nbdkit's
+ * behavior of calling .pread for caching.
+ */
+static int
+gzip_can_cache (struct nbdkit_next_ops *next_ops, void *nxdata,
+                void *handle)
+{
+  return NBDKIT_CACHE_EMULATE;
+}
+
+/* Get the file size. */
+static int64_t
+gzip_get_size (struct nbdkit_next_ops *next_ops, void *nxdata,
+               void *handle)
+{
+  /* This must be true because gzip_prepare must have been called. */
+  assert (size >= 0);
+
+  /* We must call underlying get_size even though we don't use the
+   * result, because it caches the plugin size in server/backend.c.
+   */
+  if (next_ops->get_size (nxdata) == -1)
+    return -1;
+
+  return size;
+}
+
+/* Read data from the temporary file. */
+static int
+gzip_pread (struct nbdkit_next_ops *next_ops, void *nxdata,
+            void *handle, void *buf, uint32_t count, uint64_t offset,
+            uint32_t flags, int *err)
+{
+  /* This must be true because gzip_prepare must have been called. */
+  assert (fd >= 0);
+
+  while (count > 0) {
+    ssize_t r = pread (fd, buf, count, offset);
+    if (r == -1) {
+      nbdkit_error ("pread: %m");
+      return -1;
+    }
+    if (r == 0) {
+      nbdkit_error ("pread: unexpected end of file");
+      return -1;
+    }
+    buf += r;
+    count -= r;
+    offset += r;
+  }
+
+  return 0;
+}
+
+static struct nbdkit_filter filter = {
+  .name               = "gzip",
+  .longname           = "nbdkit gzip filter",
+  .unload             = gzip_unload,
+  .thread_model       = gzip_thread_model,
+  .open               = gzip_open,
+  .prepare            = gzip_prepare,
+  .can_write          = gzip_can_write,
+  .can_extents        = gzip_can_extents,
+  .can_cache          = gzip_can_cache,
+  .prepare            = gzip_prepare,
+  .get_size           = gzip_get_size,
+  .pread              = gzip_pread,
+};
+
+NBDKIT_REGISTER_FILTER(filter)
diff --git a/tests/test-gzip.c b/tests/test-gzip.c
index 9e1229e1..969d6d0e 100644
--- a/tests/test-gzip.c
+++ b/tests/test-gzip.c
@@ -1,5 +1,5 @@
 /* nbdkit
- * Copyright (C) 2013 Red Hat Inc.
+ * Copyright (C) 2013-2020 Red Hat Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
@@ -50,7 +50,7 @@ main (int argc, char *argv[])
   int r;
   char *data;
 
-  if (test_start_nbdkit ("gzip", "-r", "file=disk.gz", NULL) == -1)
+  if (test_start_nbdkit ("file", "--filter=gzip", "disk.gz", NULL) == -1)
     exit (EXIT_FAILURE);
 
   g = guestfs_create ();
diff --git a/TODO b/TODO
index 28bcc952..addf8025 100644
--- a/TODO
+++ b/TODO
@@ -169,8 +169,6 @@ Rust:
 Suggestions for filters
 -----------------------
 
-* gzip plugin should really be a filter
-
 * LUKS encrypt/decrypt filter, bonus points if compatible with qemu
   LUKS-encrypted disk images
 
-- 
2.27.0
