In a previous message to the debian-hppa list today, I described how I
finally got a virtual machine successfully created for running Debian
11 on HPPA (aka PA-RISC).

On the same host

        Dell Precision 7920 (1 16-core CPU, 32 hyperthreads,
        2200MHz Intel Xeon Platinum 8253,
        384GB DDR-4 RAM);
        Ubuntu 20.04.02 LTS (Focal Fossa)

I have VMs running with QEMU emulation for Alpha, ARM64, M68K, MIPS32,
MIPS64, RISC-V64, S390x, and SPARC64, and most of them have quite
reasonable interactive performance, making it possible to use the
emacs editor in terminal windows and X11 windows without any serious
response problems.

However, for the new Debian 11 HPPA VM, interactive performance is a
huge issue: typed shell input sometimes gets immediate character echo, but
frequently gets delays of 10 to 30 seconds for each input character.
That makes it extremely hard for a fast typist to type commands and
text: one is never sure whether input keys have been dropped.

I develop mathematical software, and a large package that I'm writing
for multiple precision arithmetic provides a testbed for evaluating VM
performance.  Most of the QEMU CPU types support multiple processors,
but M68K and SPARC64 sun4u only permit one CPU.  For HPPA, I have 4 CPUs
and 3GB DRAM; the latter is a hard limit imposed in QEMU source code.

Here is a table of running the equivalent of

        date; make all check ; date

on these systems, using QEMU-6.0.0, unless noted.  Both compilations
and test programs are run in parallel, by internal "make -j" commands.

                                make timing (wall clock)

        Debian 11       Alpha                   07:43:16 -- 08:23:05     39m 49s
        Debian 11       ARM64                   07:58:02 -- 08:24:45     26m 43s
        Debian 11       M68K                    07:43:15 -- 08:30:56     47m 41s
        Debian 11       HPPA                    13:23:16 -- 21:40:19    497m 03s
        Debian 11       HPPA                    07:29:18 -- 18:07:19    638m 01s [qemu-6.1.0-rc3]
        NetBSD 9.2      HPPA                    11:22:10 -- 01:25:46    843m 36s
        Debian 11       MIPS32                  09:21:49 -- 10:42:41     80m 52s
        Debian 11       SPARC64                 14:45:16 -- 06:19:00    933m 44s
        Debian 11       SPARC64                 17:57:58 -- 04:02:42    603m 44s [qemu-6.1.0-rc3]
        Ubuntu 18.04    S390x                   18:34:34 -- 19:04:36     30m 02s
        Ubuntu 20.04    S390x                   18:34:35 -- 19:16:54     42m 19s
        FreeBSD 13      RISC-V64                07:41:14 -- 08:34:00     52m 46s
        FreeBSD 14      RISC-V64                08:35:27 -- 09:25:35     50m 08s
        Fedora 34       RISC-V64                07:43:17 -- 08:02:55     19m 38s
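
The elapsed-time column can be rechecked from the two date stamps.  The
following is a minimal sketch in POSIX sh with awk, not the procedure
actually used for the table; the function name "elapsed" is made up for
illustration, and it assumes that every run finished in under 24 hours,
so that a negative difference means that the run crossed midnight.

```shell
#!/bin/sh
# elapsed START END
# Print the wall-clock time between two HH:MM:SS stamps as "Nm SSs".
# Assumption: the run lasted less than 24 hours.
elapsed() {
    awk -v s="$1" -v e="$2" 'BEGIN {
        split(s, a, ":")
        split(e, b, ":")
        t0 = a[1] * 3600 + a[2] * 60 + a[3]    # start, seconds since midnight
        t1 = b[1] * 3600 + b[2] * 60 + b[3]    # end, seconds since midnight
        d = t1 - t0
        if (d < 0)
            d += 86400                         # run crossed midnight
        printf "%dm %02ds\n", int(d / 60), d % 60
    }'
}

elapsed 07:43:16 08:23:05    # Debian 11 Alpha row:  39m 49s
elapsed 11:22:10 01:25:46    # NetBSD 9.2 HPPA row: 843m 36s
```

awk is used for the arithmetic because fields such as 08 and 09 would be
rejected as invalid octal numbers by shell $(( )) expansion.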

For comparison, here are results on native hardware with local disk
(not NFS, unless indicated) [clock speed in GHz is abbreviated to G]:

        ArchLinux       ARM32                   09:57:34 -- 10:07:43     10m 09s
        Debian 11       UltraSparc T2           08:30:54 -- 08:41:18     10m 24s
        Solaris 10      UltraSparc T2           09:46:31 -- 09:59:32     13m 01s
        Ubuntu 20.04    Xeon 8253               09:34:52 -- 09:35:36      0m 44s
        CentOS 7.9      Xeon E6-1600v3          09:39:00 -- 09:39:42      0m 42s
        CentOS 7.9      Xeon E6-1600v3          10:42:43 -- 10:43:30      0m 47s [NFS]
        CentOS 7.9      EPYC 7502 2.0G 64C/128T 10:02:01 -- 10:02:27      0m 26s
        CentOS 7.9      EPYC 7502 2.5G 32C/64T  10:02:00 -- 10:02:25      0m 25s

The tests produce about 62,000 total lines of text output, spread over
about 180 files.  They read no input data, and are primarily compute
bound in loops with integer, not floating-point, arithmetic, using
32-bit and 64-bit integer types.

I have generated machine language for representative code from the
hotspot loop using the -S option of gcc and clang, and found that
64-bit arithmetic is expanded inline with 32-bit instructions on
ARM32, HPPA, and M68K, none of which have 64-bit arithmetic
instructions.  The loop instruction counts are comparable across all
of those systems, typically 10 to 20 instructions, compared to 5 or so
on those CPUs that have 64-bit arithmetic.
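
For anyone who wants to repeat that kind of inspection, here is a rough
sketch.  The C function sum64 is a hypothetical stand-in, NOT code from
my package, and the scratch-directory layout is only an assumption for
the example.  On a target without 64-bit arithmetic, each 64-bit add is
expected to show up as an add followed by an add-with-carry.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current tree is touched.
workdir=$(mktemp -d)

# Hypothetical stand-in for a hotspot loop; NOT code from the package.
cat > "$workdir/loop64.c" <<'EOF'
#include <stdint.h>

/* Sum n 64-bit words. */
uint64_t sum64(const uint64_t *x, int n)
{
    uint64_t s = 0;
    int i;

    for (i = 0; i < n; i++)
        s += x[i];
    return s;
}
EOF

# -S stops after compilation proper and writes assembly, not an object file.
# Set CC=gcc or CC=clang to compare compilers; the default is cc.
${CC:-cc} -O2 -S "$workdir/loop64.c" -o "$workdir/loop64.s"

# Show the generated code for inspection.
cat "$workdir/loop64.s"
```

Run inside each VM, this gives the assembly for that guest's CPU.  At
-O2 some compiler versions may vectorize or unroll such a loop on the
host, so instruction counts are best compared with the same compiler
and options everywhere.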

The dramatic slowdowns on HPPA and SPARC64 are a big surprise, but the
HPPA slowdown matches the poor interactive response.  The SPARC64 VM
is much more responsive interactively, and it DOES have 64-bit integer
arithmetic.
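
To put rough numbers on the slowdown, the emulated times can be divided
by the 0m 44s native run on the same Xeon 8253 host.  This is only a
sketch; the function name "slowdown" is made up for the example.  The
ratios overstate pure emulation cost, because the native run can use
all host cores under "make -j", while the HPPA guest has 4 CPUs and the
SPARC64 guest only one.

```shell
#!/bin/sh
# slowdown MIN SEC NATIVE_SEC
# Ratio of an emulated run (MIN minutes, SEC seconds) to a native run
# that took NATIVE_SEC seconds, printed with one decimal place.
slowdown() {
    awk -v m="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (m * 60 + s) / n }'
}

slowdown  26 43 44    # Debian 11 ARM64:     36.4
slowdown 497 03 44    # Debian 11 HPPA:     677.8
slowdown 933 44 44    # Debian 11 SPARC64: 1273.3
```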

I have not yet done profiling builds of qemu-system-hppa and
qemu-system-sparc64, but that remains an option for further
investigation to find out what is responsible for the slowness.

I can also do profiling builds of parts of my test suite to see
whether there are unexpected hotspots on HPPA and SPARC64 that are
absent on other CPU types.

I have physical SPARC64 hardware running Debian 11 and Solaris 10 on
identical boxes, and have done builds of TeX Live on them with no
difficulty.  However, the slow speed of QEMU HPPA makes it impractical
to try TeX Live builds for Debian 11 HPPA, which is disappointing.

Does any list member have any idea of why QEMU emulation of HPPA and
SPARC64 is so bad?  Are there Debian kernel parameters that might be
tweaked?  Have any of you used Debian on QEMU HPPA and seen similar
slowness compared to other CPU types?

Notice from my first table above that NetBSD 9.2 on HPPA is also very
slow, which tends to point the finger at QEMU as the source of the
dismal performance, rather than the VM guest O/S.

For the record, here is how QEMU releases downloaded from

        https://www.qemu.org/
        https://download.qemu.org/

are built here, taking the most recent QEMU release for the sample:

        tar xf $prefix/src/qemu/qemu-6.1.0-rc3.tar.xz
        cd qemu-6.1.0-rc3
        unsetenv CONFIG_SITE
        mkdir build
        cd build
        env CC=cc CFLAGS=-O2 ../configure --prefix=$prefix && make all -j && make check

QEMU builds require prior installation of the ninja-build package
available on major GNU/Linux distributions.  On completion, the needed
qemu-system-xxx executables are present in the build subdirectory.
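
A quick check for that prerequisite before running configure might look
like the sketch below.  It is written for POSIX sh rather than the csh
used above; the helper name "need_tool" is invented for the example,
and it assumes that the ninja-build package installs a program named
"ninja" on the search path, which is the case on the Debian and Ubuntu
systems that I know of.

```shell
#!/bin/sh
# need_tool NAME
# Succeed if NAME is found on PATH; otherwise report it on stderr.
need_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing prerequisite: $1" >&2
    return 1
}

# Assumption: the ninja-build package provides a "ninja" executable.
need_tool ninja || echo "install the ninja-build package first"
```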

On Ubuntu 20.04, the QEMU builds are clean, and pass the entire
validation suite without any failures.

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: [email protected]  -
- 155 S 1400 E RM 233                       [email protected]  [email protected] -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
