What compression algorithm should the zipfiles that bean-bake creates use?
I noticed today that zip files created by bean-bake aren't actually
compressed. This appears to be a result of 71abb59ec78f where the reliance
on an external zip program was replaced with the python zipfile module.
The ZipFile constructor has a keyword parameter *compression* with a
default of ZIP_STORED, which means "don't compress". So you need to pass in
a keyword argument to actually compress things. Using compression results
in my baked files going from ~100MB to 15MB.
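To make the default behaviour concrete, here is a minimal sketch that builds the same archive twice in memory and compares sizes. The payload and the `zip_size` helper are made up for illustration and are not part of bean-bake; the second call assumes zlib is available.

```python
import io
import zipfile

# Repetitive sample payload; a hypothetical stand-in for baked HTML output.
payload = b"<tr><td>Assets:Cash</td><td>100.00 USD</td></tr>\n" * 2000


def zip_size(**kwargs):
    """Return the size in bytes of an in-memory archive holding the payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", **kwargs) as archive:
        archive.writestr("index.html", payload)
    return len(buf.getvalue())


# No keyword argument: the constructor defaults to zipfile.ZIP_STORED.
stored_size = zip_size()
# Explicitly request DEFLATE (needs the zlib module).
deflated_size = zip_size(compression=zipfile.ZIP_DEFLATED)

print("stored:  ", stored_size)
print("deflated:", deflated_size)
```

With the default, the archive ends up slightly larger than the raw payload, since only the zip headers are added on top.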
The tricky part is that Python (and the zip format) supports three
different compression algorithms, and which ones work depends on which
modules are installed on the user's system. Sure, you can *probably* rely
on them all being installed these days...
In my diff, we try LZMA first, then fall back to BZIP2, then to DEFLATE,
and finally give up and just use STORED. The Python docs say that LZMA has
been included in the ZIP specification since 2006 and BZIP2 since 2001, so
it seems like they should be safe to use at this point... maybe? I have no
idea how widespread support for LZMA/BZIP2 is in zip apps.
So, is the patch fine like this? Should we just use DEFLATE with zip files
and give up hope on using anything better?
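One way to weigh that trade-off is to measure how much each method actually gains over DEFLATE on representative data. The following is only a rough sketch: the `CANDIDATES` table, the helper names, and the sample payload are all invented for illustration, and it checks availability by attempting the import rather than by the `find_spec` approach used in the diff.

```python
import importlib
import io
import zipfile

# Candidate methods and the standard-library module each one depends on.
CANDIDATES = [
    ("LZMA", zipfile.ZIP_LZMA, "lzma"),
    ("BZIP2", zipfile.ZIP_BZIP2, "bz2"),
    ("DEFLATED", zipfile.ZIP_DEFLATED, "zlib"),
    ("STORED", zipfile.ZIP_STORED, None),
]


def available_methods():
    """Yield (name, constant) for each method whose backing module imports."""
    for name, constant, module in CANDIDATES:
        if module is not None:
            try:
                importlib.import_module(module)
            except ImportError:
                continue
        yield name, constant


def archive_size(data, compression):
    """Return the size in bytes of an in-memory archive containing data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as archive:
        archive.writestr("sample.html", data)
    return len(buf.getvalue())


# Hypothetical sample; substitute a real baked file for a meaningful result.
sample = b"<tr><td>Expenses:Food</td><td>42.00 USD</td></tr>\n" * 5000

sizes = {name: archive_size(sample, const) for name, const in available_methods()}
for name, size in sizes.items():
    print(f"{name:8} {size} bytes")
```

This only answers the size half of the question. Whether other zip tools can read LZMA or BZIP2 entries still has to be checked by opening the resulting archive with them.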
diff -r ccc6dff1b7b4 beancount/scripts/bake.py
--- a/beancount/scripts/bake.py Mon Dec 31 18:13:23 2018 +0000
+++ b/beancount/scripts/bake.py Fri Jan 04 13:36:57 2019 +0700
@@ -17,6 +17,15 @@
import re
from os import path
import zipfile
+import importlib.util
+if importlib.util.find_spec('lzma'):
+ ZIP_COMPRESSION = zipfile.ZIP_LZMA
+elif importlib.util.find_spec('bz2'):
+ ZIP_COMPRESSION = zipfile.ZIP_BZIP2
+elif importlib.util.find_spec('zlib'):
+ ZIP_COMPRESSION = zipfile.ZIP_DEFLATED
+else:
+ ZIP_COMPRESSION = zipfile.ZIP_STORED
import lxml.html
@@ -200,7 +209,7 @@
directory: A string, the name of the directory to archive.
archive: A string, the name of the file to output.
"""
- with file_utils.chdir(directory), zipfile.ZipFile(archive, 'w') as archfile:
+ with file_utils.chdir(directory), zipfile.ZipFile(archive, 'w', compression=ZIP_COMPRESSION) as archfile:
for root, dirs, files in os.walk(directory):
for filename in files:
relpath = path.relpath(path.join(root, filename),
directory)