So, here is the final status.

IGLU now runs kernel 2.2.19-6.2.12 with two slight modifications:
A. MAX_LOOP has been doubled to 32
B. The Openwall patches have been applied.

The upshot of B is that certain operations that have hitherto been ok, 
are now not allowed or not possible. These are:
A. Jumping into the stack in any way but "CALL" will segfault the 
program. This hopefully means that buffer overruns will not work while 
trampolines still will. For ways to work around this if it breaks 
something, read on.
B. /tmp (or any +t directory) links are followed only if there is a 
match between the link owner and the process owner.
[root@iglu /tmp]# ls -la
total 5
drwxrwxrwt    4 root     root         1024 Feb  9 17:50 .
drwxr-xr-x   19 root     root         1024 Dec 26 19:48 ..
drwxrwxrwt    2 43       43           1024 Aug  9  2000 .font-unix
drwxr-xr-x   28 root     root         2048 Aug 16  2000 5.00503
lrwxrwxrwx    1 sun      sun            14 Feb  9 17:50 testlnk -> /home/sun/test
[root@iglu /tmp]# cat testlnk
cat: testlnk: Permission denied
[root@iglu /tmp]#

This operation failed because /tmp is a +t directory, the owner of 
testlnk is "sun", but the process is a root process.

C. Same as B, but with named pipes.
D. The /proc directory and its subdirs have a "proc" gid, and 550 
permissions on some of the directories. This means that users who are 
not members of the "proc" group cannot view other people's processes. 
Doing "ps ax" as such a user returns only that user's processes. The 
same goes for top, w, and any other /proc related utility. All valid 
users of the machine should add themselves to the proc group, which will 
bring the situation back to normal.
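
A quick way to see whether your account is affected by D is to check 
your group list. The following is only a minimal sketch: the helper name 
has_group is made up for illustration, and the only thing taken from 
above is the group name "proc".

```shell
# has_group GROUP "SPACE SEPARATED GROUP LIST"
# Returns 0 if GROUP appears in the list, 1 otherwise.
# (has_group is a hypothetical helper, not an existing utility.)
has_group() {
    for g in $2; do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# id -Gn prints the names of all groups the current process belongs to.
if has_group proc "$(id -Gn)"; then
    echo "in the proc group: ps, top and w should show all processes"
else
    echo "not in the proc group: ps, top and w will show only your own processes"
fi
```

Note that a newly added group membership only takes effect for sessions 
started after the change, so log out and back in before checking.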

The rest of the changes are not expected to have any side effects.

If some program fails to run because of the non-executable stack (and 
there are some known programs of this type), /usr/local/sbin/chstk is a 
small utility for fixing this. It changes the ELF header of a binary, so 
that stack protection is disabled for that particular binary. Type 
"chstk" for usage info.

If you are interested in experiences from the install process itself, 
here they are. If not, there is no more interesting stuff in this email.

First, I managed to find the saved configurations in the configs 
directory under the kernel-source RPM. I take back my claim that they 
were only available under the SRPM. Sorry RedHat.

I tried to apply the ow4 patches to the kernel. I had several conflicts. 
Looking more closely revealed that RH had already included some of the 
security fixes contained in the OW patches in their kernel. In 
retrospect, the more plausible route is that these patches found their 
way into 2.2.20, and from there RH backported them into 2.2.19. This 
meant that the OW patch against 2.2.19 had conflicts.

I had to go over each of these conflicts manually, however, to make 
sure. There was only one place where this was not a simple case of 
manual merging (or totally ignoring). This was in a place where the 
original code cast to int. RH changed that into a cast to size_t, 
while OW changed it into a cast to unsigned long. For all practical 
purposes there is no difference, but I chose the OW solution (though, in 
retrospect, perhaps I shouldn't have).

Once that was done, I changed MAX_LOOP to 32, and started the compile. 
I did "make oldconfig" (to see which config options are new), "make 
menuconfig", "make dep", "make bzImage" and "make modules". Remember, 
I compiled this yesterday, while I intended to install it only today. 
Unfortunately, this didn't work. "make modules" failed to compile loop.c 
(the loopback support). Mind you, the error was not on any of the lines 
I changed.

Just to make sure I was really not at fault here, I changed the line 
back to 16, and redid "make modules". When that didn't work, I did "make 
clean", "make bzImage" and "make modules" again (remember, I had changed 
MAX_LOOP back). No luck.

The next step was to find out whether changing MAX_LOOP worked when 
OW was not applied. I reinstalled the kernel-source-2.2.19-6.2.12 RPM, 
changed MAX_LOOP, and redid the sequence. No luck there either. Deep 
shit. That was the stage at which I said I would return to this in the 
morning.

I woke up this morning with an ambitious plan. I will add the OW patch 
to the list of patches the SRPM has, and build from there. I will also 
add a patch I'll prepare to bring MAX_LOOP to 32 (actually, I only 
needed to change the RH patch that raised it from 8 to 16). I started to 
work on that, when I realized I am going to run into all the conflicts, 
all over again. Manually removing the patches I applied via OW was no 
better than resolving the conflicts. At that stage I decided to forgo 
that option.

I returned to my patched 2.2.19 tree, and ran "make distclean" to delete 
EVERYTHING. This left me with just the sources, already patched. I 
copied the RH config for 686 to .config, and ran "make oldconfig", "make 
menuconfig", "make dep", "make bzImage" and "make modules", and 
everything worked. I don't have the slightest idea why. I copied the 
.config that worked to configs/kernel-2.2.19-iglu.config.
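
For anyone who wants to repeat this, the sequence that finally worked 
can be sketched roughly as follows. This is only a sketch: the function 
name build_kernel and its two arguments are made up for illustration, 
and the config file is passed in as an argument because its exact path 
depends on your tree. The make targets are the ones listed above.

```shell
# build_kernel SRC_DIR CONFIG_FILE
# Rebuild an already patched 2.2 kernel tree from a completely clean state.
# (build_kernel is a hypothetical wrapper; CONFIG_FILE should be given as
# an absolute path, since the function changes directory first.)
# The body runs in a subshell so the caller's working directory is unchanged.
build_kernel() (
    cd "$1" &&
    make distclean &&      # delete every build product, including .config
    cp "$2" .config &&     # start again from the saved vendor config
    make oldconfig &&      # asks only about options the config lacks
    make menuconfig &&     # interactive: review and adjust the options
    make dep &&
    make bzImage &&
    make modules
)
```

Usage would look like "build_kernel /usr/src/linux /path/to/saved/config", 
where the second argument stands for the RH 686 config mentioned above. 
The chain stops at the first step that fails.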

Next came the three promised reboots. Reboot #1 - just as the system was 
then. I did not touch the configuration at all. I wanted to do that to 
make sure that everything worked BEFORE I changed anything, to save time 
on the fixing later. I called Mulix before starting, to make sure there 
was someone in Haifa ready to go to actcom and get things working again. 
After about a minute of accelerated heartbeats, reboot #1 finished 
successfully. I saved dmesg output from that reboot.

Next came the install phase. As I have not built from the SRPM, I didn't 
have an RPM to install. Instead, I copied the kernel over (cp 
/usr/src/linux/arch/i386/boot/bzImage 
/boot/vmlinuz-2.2.19-6.2.12-compiled), as well as the System.map. I then 
changed /etc/lilo.conf, only to find out that I am missing an initrd.

After a quick consultation session with Mulix, he came up with a URL 
(http://xtronics.com/reference/redhat-make-kernel.htm) that explains 
that you can simply remake an initrd image using the mkinitrd utility. 
Another hurdle removed. It turns out I did everything they recommended 
except changing the EXTRAVERSION in the Makefile. Because I had so much 
trouble getting the kernel to compile, I didn't want to do that at this 
stage. I left the default boot kernel as it was before, and just added 
the newly compiled kernel under the label "linux-19.12". I then ran "lilo" followed 
by "lilo -R linux-19.12". This tells lilo to use the command line 
"linux-19.12" the next time it starts, but then return to the usual 
behaviour. This is extremely useful when you are upgrading a kernel 
remotely, as if things go wrong, you can simply call support and ask 
them to reboot. After reboot, the computer will resort to a kernel you 
know works. That was another reason I needed reboot #1.
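
The one-shot boot trick can be wrapped in a small helper that first 
checks that the label really exists in lilo.conf, so you do not ask for 
a boot entry that is not there. This is a sketch under assumptions: the 
function name try_kernel_once is made up, and passing the config file 
with -C on both lilo calls is my own choice, not something I ran above.

```shell
# try_kernel_once LABEL [LILO_CONF]
# Install the boot map, then ask LILO to boot LABEL on the next boot only.
# (try_kernel_once is a hypothetical helper; LILO_CONF defaults to
# /etc/lilo.conf. The label is used as a regular expression, so a "." in
# it matches any character.)
try_kernel_once() {
    label=$1
    conf=${2:-/etc/lilo.conf}
    # Refuse to continue if no "label=LABEL" line exists in the config.
    if ! grep -Eq "^[[:space:]]*label[[:space:]]*=[[:space:]]*$label[[:space:]]*\$" "$conf"; then
        echo "no label $label in $conf" >&2
        return 1
    fi
    # First write the boot map, then set the one-time command line.
    lilo -C "$conf" && lilo -C "$conf" -R "$label"
}
```

With the setup described above this would be called as 
"try_kernel_once linux-19.12". The default entry in lilo.conf stays 
untouched, so a plain power cycle falls back to the known good kernel.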

During the first reboot, dmesg showed that the root partition had gone 
too long without fsck, but no fsck was run (it was only recommended). 
The second reboot took well over two and a half minutes, and it was 
worrying. Fortunately, after those two and a half minutes, the machine 
came up. Everything seemed to work except that /proc didn't use the proc 
group. I tried changing the order in which the options appear in fstab. 
At that stage I changed lilo.conf to boot into the 19.12 kernel by 
default, installed LILO, and did my third reboot. Everything was ok, 
except that /proc STILL didn't allow for a special group.

Reading through the docs again, they mention that some distros, and 
RedHat is one of them, mount proc without using fstab, from the startup 
scripts. A quick grep revealed that, indeed, /etc/rc.d/rc.sysinit does a 
manual proc mount. I added the necessary mount option there as well, and 
did a fourth reboot (you cannot remount proc while the system is up). 
This time, everything was working. Job done. I updated the symlinks for 
/usr/src/linux, /boot/vmlinuz and /boot/System.map, changed motd to 
remind you folks that you can't use w and ps until you add yourselves to 
the proc group, and that's all folks.

One last thing, though. dmesg shows the following message during boot 
(it did so before I installed my kernel too):
(read) sdc1's sb offset: 17775808 [events: 1e45d056]
md: invalid raid superblock magic on sdc1
md: sdc1 has invalid sb, not importing!

zone 0
 checking sdc1 ... contained as device 0
  (17775888) is smallest!.
 checking sdd1 ... contained as device 1
 checking sde1 ... contained as device 2
 checking sdf1 ... contained as device 3
 zone->nb_dev: 4, size: 71103552
current zone offset: 17775888
zone 1
 checking sdc1 ... nope.
 checking sdd1 ... contained as device 0
  (17781744) is smallest!.
 checking sde1 ... contained as device 1

Is this some problem with the raid?

                Shachar


