I have been having fun working on the Ben after not doing anything
with it for some months. Here are some highlights:
1. I noticed that there was a gcc-mips compiler option in the config
of the new toolchain, so I added it to my image.
2. A recent post on the web found that ext4 was often significantly
faster in writing and reading on solid-state devices. So I did a "make
kernel_menuconfig" and enabled ext4.
3. My 8 GB Transcend Micro SD card was divided into three partitions:
about 500 MB on p1 using ext2, for the rootfs, about 70 MB on p2 for
swap, and the remainder on p3 using ext4 for my main projects
partition.
4. I changed /etc/inittab so that gmenu2x never gets launched: I
prefer using the command line on such a small device, at least for my
own work.
5. To enable swap and also to mount p3 on the card, I manually
modified /etc/config/fstab to look like this:
config global automount
    option from_fstab 1
    option anon_mount 1

config global autoswap
    option from_fstab 1
    option anon_swap 0

config mount
    option target /pj
    option device /dev/mmcblk0p3
    option fstype ext4
    option options rw,sync
    option enabled 1
    option enabled_fsck 0

config swap
    option device /dev/mmcblk0p2
    option enabled 1
I know that the uci system exists for editing the config files, but for
the most part I find it easier to just go in and change them
directly:)
The device names look a bit "strange", but that is how Linux "talks" to
devices like SD cards:) There is only one slot on the Ben, so it gets
the number "0", and the partitions are numbered starting at 1
and prefixed with "p" for "partition".
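The naming rule just described can be written down as a tiny C helper. This is only a sketch: the function name mmc_partition_path is made up for illustration, and it assumes the usual /dev/mmcblk<slot>p<partition> pattern seen in the fstab above.

```c
#include <assert.h>
#include <stdio.h>

/* Build an SD/MMC partition device path such as /dev/mmcblk0p3.
 * slot: index of the card slot (0 on a single-slot device like the Ben)
 * part: partition number, counted from 1
 * Returns the length snprintf reports for the full path.
 * (Hypothetical helper, not part of any library.) */
int mmc_partition_path(char *buf, size_t len, int slot, int part)
{
    assert(buf != NULL && len > 0);
    assert(slot >= 0 && part >= 1);
    return snprintf(buf, len, "/dev/mmcblk%dp%d", slot, part);
}
```

With slot 0 and partition 2 this gives the swap device used above, and with partition 3 the one mounted on /pj.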
6. To test gcc, I found and downloaded a C-language version of the
long-used linpack benchmark for floating-point performance. It has
been in use for some decades and has a published list of results going
back to computers from the 1970s at least, if not a bit earlier. The
collection of routines in this program comes to about 1200 lines,
counting code, white space and comments. Not very large, but still of
interest to me.
It compiled the first time with just
gcc linpack_bench.c
and took about 20 seconds. The program was set to solve a
1000-equation linear system--that took close to 9 minutes to finish, so I
reduced the equation count to 100 and then got times close to 1
second! However, the timer used in the code has a resolution of 0.01
second, so I upped the size to 200, giving a runtime close to 8 seconds.
This reduced the timing noise. The Mflops reported are averages
over five runs. Here is a summary of what I found by experimenting
with gcc options a bit:
Options used        Mflops   Compile (secs)   a.out size (bytes)
------------------  ------   --------------   ------------------
none                0.710    17               332878
-Os                 0.743    35               327136
-Os -march=mips2    0.750    33               327136
-Os -march=mips32   0.745    33               327136
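For anyone wanting to check figures like these by hand, here is a minimal sketch of how a Mflops number follows from the system size and the runtime. It assumes the C version uses the conventional LINPACK operation count of 2n^3/3 + 2n^2; the function names are made up for illustration and are not taken from linpack_bench.c.

```c
#include <assert.h>

/* Nominal floating-point operation count for solving an n x n linear
 * system, as conventionally used by LINPACK-style benchmarks:
 * 2n^3/3 + 2n^2. (Assumed to match the C version used here.) */
double linpack_flops(int n)
{
    assert(n > 0);
    double dn = (double)n;
    return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
}

/* Mflops for a run of size n that took `seconds` to complete. */
double linpack_mflops(int n, double seconds)
{
    assert(seconds > 0.0);
    return linpack_flops(n) / seconds / 1.0e6;
}

/* Relative size of one timer tick compared to the measured runtime. */
double timer_relative_error(double resolution, double seconds)
{
    assert(resolution > 0.0 && seconds > 0.0);
    return resolution / seconds;
}
```

For n = 200 and a runtime of 8 seconds, linpack_mflops gives roughly 0.68, and one 0.01-second tick is about 0.1 percent of the runtime, compared with 1 percent for a 1-second run.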
Observations:
a. The optimizations do make a small difference. I noted that
-march=mips32 is used in the compilation of the toolchain, as is -Os.
-O1, -O2 and -O3 gave slightly poorer results than -Os. So for this
program, asking gcc to make a.out smaller also makes it a bit faster.
b. Adding other options to -Os did not change the size of a.out but
did make tiny differences in the Mflops. Based on five observations
per option and the pattern they followed, the differences could be
real, even though small, but maybe not. I didn't try to do a byte
compare on the a.out files:)
c. The Mflops tabulated for the linpack benchmark reveal that the
Ben, running a single double-precision floating-point application, is
only slightly slower than the IBM 370/165 mainframe from the 1970s,
which showed 0.78 Mflops! My current desktop, using an Intel i7-860,
gets more than 1600 Mflops on this program. So the Ben is slow relative
to what we have today, but it is fast and has large RAM compared to my
early days in computing, when RAM was measured in KB, not MB. GB was
not even in the language yet:)
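On the question in (b) of whether such small Mflops differences are real, one way to look at it is to compare the difference of the two averages with the run-to-run scatter. The sketch below computes the mean, the sample variance, and the square of Welch's t statistic for two sets of runs. The function names are made up for illustration, no measured per-run values are included, and any threshold is only a rule of thumb: with five runs per option, a value well above roughly 5 would hint at a real difference.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n values. */
double mean(const double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Unbiased sample variance (divides by n - 1); needs at least 2 values. */
double sample_variance(const double *x, size_t n)
{
    assert(x != NULL && n >= 2);
    double m = mean(x, n);
    double ss = 0.0;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Square of Welch's t statistic for two independent sets of runs:
 * (mean_a - mean_b)^2 / (var_a/na + var_b/nb).
 * Kept squared so that no square root (and no libm) is needed. */
double welch_t_squared(const double *a, size_t na,
                       const double *b, size_t nb)
{
    double diff = mean(a, na) - mean(b, nb);
    double se2 = sample_variance(a, na) / (double)na
               + sample_variance(b, nb) / (double)nb;
    assert(se2 > 0.0);
    return diff * diff / se2;
}
```

Feeding in the five Mflops readings for two option sets would give a single number to judge by, instead of eyeballing the pattern.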
7. Finally, I wanted to see when swap would come into play.
While compiling the benchmark, htop showed nothing close to any swap
use. Therefore, I recompiled the benchmark for a 1000-equation
system. I then started htop in one of the four terminals and added,
one by one, another instance of the benchmark program in the remaining
three terminals. No swap with one instance, no swap with two
instances, but on the third, swap went up to as high as 20 MB but then
settled down to around 7 MB as the Ben was "working its heart out" on
three instances of a solution that could not have been done on any
computer in the world in the 1960's! It was quite apparent that lots
of time was being used in moving items between swap and RAM because
the aggregate CPU percentage for the three benchmark processes was
below 50 percent. Killing one benchmark process resulted in the CPU
usage jumping to about 95 percent for the two remaining benchmark
runs.
Swap works but it would be best to avoid it because it radically slows
computation--but then we all knew that:)
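For logging swap use without keeping htop open, the same number can be read from /proc/meminfo. This is a minimal sketch, assuming a Linux kernel that reports the SwapTotal and SwapFree fields in kB there; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Look up a line of the form "Key:   12345 kB" in /proc/meminfo-style
 * text and return the number, or -1 if the key is not found. */
long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');   /* jump to the next line, if any */
        if (line != NULL)
            line++;
    }
    return -1;
}

/* Swap in use (kB) computed from meminfo text, or -1 if fields are missing. */
long swap_used_kb(const char *text)
{
    long total = meminfo_kb(text, "SwapTotal");
    long free_kb = meminfo_kb(text, "SwapFree");
    if (total < 0 || free_kb < 0)
        return -1;
    return total - free_kb;
}

/* Read /proc/meminfo and report swap in use (kB), or -1 on failure. */
long current_swap_used_kb(void)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return swap_used_kb(buf);
}
```

Calling current_swap_used_kb() once a second while the benchmark instances run would give a simple record of when swap starts being used.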
This E-mail is a bit long, but I wanted to report on what I found,
since nothing has appeared on these topics yet. Now I have to get
back to "real" project work:)
Delbert
_______________________________________________
Qi Hardware Discussion List