Package: bsdmainutils
Version: 11.1.2
Severity: normal
Tags: patch

Dear Maintainer,

Background: I was trying to use "look -b" on the "Have I Been
Pwned" password database
(https://downloads.pwnedpasswords.com/passwords/pwned-passwords-ordered-2.0.txt.7z).
That file is about 32GiB uncompressed and contains a sorted list
of SHA1 hashes of compromised passwords.

On that file, "look" reports a "file too big" error. The error
arises because "look" checks the size of the file against
SIZE_T_MAX, but the Debian build defines SIZE_T_MAX as INT_MAX:

usr.bin/look/Makefile:FLAGS = -include bsd/err.h -DSIZE_T_MAX=INT_MAX

On 64-bit systems, however (like Debian GNU/Linux on amd64),
size_t (and the length argument passed to mmap()) is 64 bits
wide, so its maximum is SIZE_MAX (ULONG_MAX there), far above
INT_MAX. Removing that check (if ((uintmax_t)sb.st_size >
(uintmax_t)SIZE_T_MAX) err()) allowed me to use "look" on that
file.
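
To illustrate the mismatch, here is a minimal sketch. The helper
names are made up for this example and are not part of look.c, and
it assumes a 64-bit size_t as on amd64:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* This sketch assumes a 64-bit size_t, as on amd64. */
static_assert(sizeof(size_t) >= 8, "sketch assumes a 64-bit size_t");

/*
 * Hypothetical helper: mirrors the check in look.c as built by Debian,
 * where SIZE_T_MAX is defined as INT_MAX on the compiler command line.
 */
static int
too_big_for_int_max(uintmax_t size)
{
	return size > (uintmax_t)INT_MAX;
}

/*
 * Hypothetical helper: the same check against SIZE_MAX, the real upper
 * bound of the length argument accepted by mmap().
 */
static int
too_big_for_size_max(uintmax_t size)
{
	return size > (uintmax_t)SIZE_MAX;
}
```

For a 32GiB file, too_big_for_int_max((uintmax_t)32 << 30) is true
while too_big_for_size_max() is false for the same size.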

Now, that file is 32GiB uncompressed and 10GiB compressed with
"pixz". It is possible to do random access on a pixz-compressed
file (an xz file whose 16MiB blocks are compressed individually),
including mmap()ing it, by using nbdkit and its "xz" module.

By doing:

sudo nbdkit -n -U - --run 'nbd-client -nofork -u "${nbd#nbd:unix:}" /dev/nbd0' \
    xz file=hibp.xz

One can then access the uncompressed data via the /dev/nbd0 block
device, and mmap()ing that block device also works as expected.

However, "look" uses fstat() to determine the size of the file
(to be passed to mmap()) and on Linux, stat().st_size is 0 for
block devices, so "look" considers them as empty and skips them.

Using lseek(SEEK_END) instead would make it possible to get the
size of the file (mmappable files are also seekable).
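
As a minimal sketch of that approach (size_by_seek() and path_size()
are hypothetical helpers written for this report, not functions from
look.c):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the size of the object behind fd by
 * seeking to its end, or -1 on failure (for example ESPIPE on a pipe).
 * The file offset is reset to 0 afterwards.
 */
static off_t
size_by_seek(int fd)
{
	off_t size;

	assert(fd >= 0);	/* caller must pass an open descriptor */
	if ((size = lseek(fd, 0, SEEK_END)) < 0)
		return -1;
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;
	return size;
}

/* Hypothetical wrapper: open path read-only and report its size. */
static off_t
path_size(const char *path)
{
	int fd;
	off_t size;

	if ((fd = open(path, O_RDONLY)) < 0)
		return -1;
	size = size_by_seek(fd);
	close(fd);
	return size;
}
```

Calling path_size("/dev/nbd0") would then be expected to return the
size of the block device rather than 0, whereas it still returns the
usual byte count for regular files.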

With the patch below, "look" can now look up password hashes in
under 2 seconds on my system on the pixz-compressed password hash
database.

--- bsdmainutils/usr.bin/look/look.c    2018-03-15 15:45:54.224846742 +0000
+++ bsdmainutils/usr.bin/look/look.c    2018-03-15 15:42:22.444143574 +0000
@@ -104,7 +104,6 @@
 int
 main(int argc, char *argv[])
 {
-       struct stat sb;
        int ch, fd, match;
        wchar_t termchar;
        unsigned char *back, *front;
@@ -152,17 +151,19 @@
        match = 1;
 
        do {
-               if ((fd = open(file, O_RDONLY, 0)) < 0 || fstat(fd, &sb))
+               off_t size;
+               if ((fd = open(file, O_RDONLY, 0)) < 0)
                        err(2, "%s", file);
-               if ((uintmax_t)sb.st_size > (uintmax_t)SIZE_T_MAX)
-                       errx(2, "%s: %s", file, strerror(EFBIG));
-               if (sb.st_size == 0) {
+               if ((size = lseek(fd, 0, SEEK_END)) < 0)
+                       err(2, "%s", file);
+               if (size == 0) {
                        close(fd);
                        continue;
                }
-               if ((front = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, (off_t)0)) == MAP_FAILED)
+               lseek(fd, 0, SEEK_SET);
+               if ((front = mmap(NULL, (size_t)size, PROT_READ, MAP_SHARED, fd, (off_t)0)) == MAP_FAILED)
                        err(2, "%s", file);
-               back = front + sb.st_size;
+               back = front + size;
                if (bflag)
                         match *= (look(key, front, back));
                 else

-- System Information:
Debian Release: buster/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'testing-debug'), (500, 'stable-updates'), (500, 'oldstable-updates'), (500, 'testing'), (500, 'stable'), (500, 'oldstable'), (50, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.14.0-3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages bsdmainutils depends on:
ii  bsdutils     1:2.31.1-0.4
ii  debianutils  4.8.4
ii  libbsd0      0.8.7-1
ii  libc6        2.27-1
ii  libtinfo5    6.1-1

bsdmainutils recommends no packages.

Versions of packages bsdmainutils suggests:
ii  cpp                         4:7.2.0-1d1
ii  miscfiles [wordlist]        1.5+dfsg-2
pn  vacation                    <none>
ii  wamerican [wordlist]        2017.08.24-1
ii  wbritish [wordlist]         2017.08.24-1
ii  wbritish-insane [wordlist]  2017.08.24-1
ii  wdutch [wordlist]           1:2.10-6
ii  wfrench [wordlist]          1.2.3-11
ii  whois                       5.3.0
ii  wngerman [wordlist]         20161207-4
ii  wnorwegian [wordlist]       2.2-3
ii  wswedish [wordlist]         1.4.5-2.2
ii  wukrainian [wordlist]       1.7.1-2

-- no debconf information
