Package: diffoscope
Version: 101
Severity: normal

Dear Maintainer,
When comparing two 4.5 GB ISO images, diffoscope tries to load them into memory, which fails with a MemoryError in the JSON comparator:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 470, in main
        sys.exit(run_diffoscope(parsed_args))
      File "/usr/lib/python3/dist-packages/diffoscope/main.py", line 442, in run_diffoscope
        difference = compare_root_paths(path1, path2)
      File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/compare.py", line 65, in compare_root_paths
        file1 = specialize(FilesystemFile(path1, container=container1))
      File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 49, in specialize
        if try_recognize(file, cls, cls.recognizes):
      File "/usr/lib/python3/dist-packages/diffoscope/comparators/utils/specialize.py", line 36, in try_recognize
        if not recognizes(file):
      File "/usr/lib/python3/dist-packages/diffoscope/comparators/json.py", line 52, in recognizes
        f.read().decode('utf-8', errors='ignore'),
    MemoryError

Obviously an ISO file is not JSON. The whole problem could be avoided if the earlier check (whether the first 10 bytes contain '[' or '{') were executed for all files, not only for "text" ones. Is there a reason for the "is_text" condition there? Alternatively, if is_text is False, maybe the function should return False early? I can provide a patch for either option, but I'd like to know which one you prefer.
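For illustration, a minimal sketch of both proposed options, written as a hypothetical standalone function: recognizes_json and its skip_non_text parameter are not part of diffoscope, and the libmagic type and file name are passed in directly instead of through diffoscope's File object. It follows the logic of the JSONFile.recognizes function quoted below, but moves the 10-byte sniff out from under the is_text condition. It assumes files named *.json are still parsed unconditionally, as in the quoted code.

```python
import collections
import json

# Magic types treated as text, copied from the quoted JSONFile.recognizes.
TEXT_MAGIC_PREFIXES = ('ASCII text', 'UTF-8 Unicode text')


def recognizes_json(path, name, magic_file_type, skip_non_text=False):
    """Hypothetical stand-in for JSONFile.recognizes (not diffoscope API).

    skip_non_text=False: option 1, run the 10-byte sniff on every file
    not named *.json, whether or not libmagic calls it text.
    skip_non_text=True: option 2, reject non-text files not named *.json
    before reading any data.
    Files named *.json are always parsed, as in the quoted code.
    """
    is_text = any(magic_file_type.startswith(x) for x in TEXT_MAGIC_PREFIXES)
    with open(path, 'rb') as f:
        if not name.endswith('.json'):
            if skip_non_text and not is_text:
                # Option 2: bail out without reading the file contents.
                return False
            # Option 1: cheap sniff, now applied to binary files too, so a
            # file whose first 10 bytes contain no '[' or '{' is rejected
            # without being read in full.
            buf = f.read(10)
            if not any(x in buf for x in b'{['):
                return False
            f.seek(0)
        try:
            # Only files that pass the sniff (or are named *.json) are
            # read in full and parsed.
            json.loads(
                f.read().decode('utf-8', errors='ignore'),
                object_pairs_hook=collections.OrderedDict,
            )
        except ValueError:
            return False
    return True
```

With the first option, a large binary file is still read in full if one of its first ten bytes happens to be '[' or '{'; the second option avoids reading non-text files at all, but then depends entirely on libmagic classifying real JSON files as text.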
The JSONFile.recognizes function, for context:

    @classmethod
    def recognizes(cls, file):
        with open(file.path, 'rb') as f:
            # Try fuzzy matching for JSON files
            is_text = any(
                file.magic_file_type.startswith(x)
                for x in ('ASCII text', 'UTF-8 Unicode text'),
            )
            if is_text and not file.name.endswith('.json'):
                buf = f.read(10)
                if not any(x in buf for x in b'{['):
                    return False
                f.seek(0)
            try:
                file.parsed = json.loads(
                    f.read().decode('utf-8', errors='ignore'),
                    object_pairs_hook=collections.OrderedDict,
                )
            except ValueError:
                return False
        return True

-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.14.67-1.pvops.qubes.x86_64 (SMP w/8 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968), LANGUAGE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /usr/bin/dash
Init: unable to detect

Versions of packages diffoscope depends on:
ii  libpython3.6-stdlib  3.6.6-1
ii  python3  3.6.5-3
ii  python3-distro  1.3.0-1
ii  python3-distutils  3.6.6-1
ii  python3-libarchive-c  2.1-3.1
ii  python3-magic  2:0.4.15-2
ii  python3-pkg-resources  40.2.0-1

Versions of packages diffoscope recommends:
ii  abootimg  0.6-1+b2
ii  acl  2.2.52-3+b1
pn  apktool  <none>
ii  binutils-multiarch  2.31.1-5
ii  bzip2  1.0.6-9
ii  caca-utils  0.99.beta19-2+b3
ii  colord  1.3.3-2
ii  db-util  5.3.1
ii  default-jdk-headless  2:1.10-68
ii  device-tree-compiler  1.4.7-3
ii  docx2txt  1.4-1
ii  e2fsprogs  1.44.4-2
ii  enjarify  1:1.0.3-4
ii  fontforge-extras  0.3-4
ii  fp-utils  3.0.4+dfsg-20
ii  fp-utils-3.0.4 [fp-utils]  3.0.4+dfsg-20
ii  genisoimage  9:1.1.11-3+b2
ii  gettext  0.19.8.1-7
ii  ghc  8.2.2-4
ii  ghostscript  9.25~dfsg-2
ii  giflib-tools  5.1.4-3
ii  gnumeric  1.12.41-1
ii  gnupg  2.2.10-1
ii  imagemagick  8:6.9.10.8+dfsg-1
ii  imagemagick-6.q16 [imagemagick]  8:6.9.10.8+dfsg-1
ii  jsbeautifier  1.6.4-7
ii  libarchive-tools  3.2.2-5
ii  llvm  1:6.0-43
ii  lz4  1.8.2-1
ii  mono-utils  4.6.2.7+dfsg-1
ii  odt2txt  0.5-1+b2
pn  oggvideotools  <none>
ii  openssh-client  1:7.8p1-1
ii  pgpdump  0.33-1
ii  poppler-utils  0.63.0-2
ii  procyon-decompiler  0.5.32-4
ii  python3-argcomplete  1.8.1-1
ii  python3-binwalk  2.1.2~git20180830+dfsg1-1
ii  python3-debian  0.1.33
ii  python3-defusedxml  0.5.0-1
ii  python3-guestfs  1:1.38.4-1
ii  python3-jsondiff  1.1.1-2
ii  python3-progressbar  2.3-4
ii  python3-pyxattr  0.6.0-2+b2
ii  python3-tlsh  3.4.4+20151206-1+b4
ii  r-base-core  3.5.1-1+b1
ii  rpm2cpio  4.14.1+dfsg1-4
ii  sng  1.1.0-1+b1
ii  sqlite3  3.24.0-1
ii  squashfs-tools  1:4.3-6
ii  tcpdump  4.9.2-3
ii  unzip  6.0-21
ii  vim-common  2:8.1.0320-1
ii  xmlbeans  2.6.0+dfsg-4
ii  xxd  2:8.1.0320-1
ii  xz-utils  5.2.2-1.3

Versions of packages diffoscope suggests:
ii  libjs-jquery  3.2.1-1

-- no debconf information

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?