Hi,

> Thank you very much for the review and the patch!  The patch "works"
> in the sense that I can now build.
Ok, good.

> I'll run with it for a while, put it in the next release candidate,
> and see how it affects XEmacs's footprint.  Probably it will go in the
> next release of 21.4, safety over efficiency.  But I would appreciate
> it if you could help us recover the mmap capabilities.

Yes, I'll give it a try when I get some time.

> Wolfram> N.B. the exact same problem should/could show up with
> Wolfram> earlier glibc releases, but maybe the allocation pattern
> Wolfram> was slightly different.
>
> I don't understand the problem, so I'm not sure what you're saying.
>
> First, your patch also affects "portable dumper" builds, which build
> and (mostly) run fine.  Is this intentional?  I.e., is this a generic
> problem with our allocator implementation, which "just happened" to
> manifest dramatically only in unexec builds on very recent glibcs?

That's quite probable.  You see, the problem is Lisp's tagged pointers.
Pointers to Lisp objects are "coloured" in their most significant bits
(I think 3 bits) with type information.  Therefore, when the malloc
implementation hands out chunks with one of those high bits already
set, you get a clash.

This can indeed happen with glibc's malloc on Linux, because it hands
out mmapped chunks (they start near 0x40000000 on ix86-linux), but
generally only for "large" allocations.  (You had M_MMAP_THRESHOLD set
to 64k, a reasonable choice IMHO; one Lisp vector allocation from the
"temacs -dump" run just exceeded that threshold.)

By setting M_MMAP_MAX to 0 I've disabled all use of mmap; glibc's
malloc then behaves more like a classic malloc.  For GNU Emacs, I've
added temporary switches of M_MMAP_MAX to 0 and back _only_ in the
Lisp object allocation paths (there were about half a dozen places);
I'll try to do the same for XEmacs when I find the time.
> Second, if it's a generic problem, is it possible that it would
> generate GCPRO-bug-like symptoms (i.e., weird crashes in "obviously
> correct" code because data that we know is correctly initialized
> mysteriously changes)?  For example, we fixed a couple of GCPRO bugs
> recently, but we're still seeing mysterious "illegal bytecode" crashes
> (especially in Gnus), although fewer of them :-).  We're pretty sure
> the bytecompiler isn't responsible for this, because we've checked the
> code in memory.

I'm not sure about this, but I suspect that "mysteriously changing"
memory cannot occur due to this.  The chance that a coloured pointer is
masked (by stripping off the top bits) into a valid memory region seems
quite small to me.  But it's certainly not impossible.

> Third, is there still a possible problem if we use
> --with-system-malloc?  I.e., we use the Doug Lea malloc from glibc,
> but do no mallopt tweaking.

Yes, the same problem exists there, AFAICS.  IMHO the autoconf test
should check whether mallopt(M_MMAP_MAX) is available, and deduce from
that that it's a variant of Doug Lea's malloc.

Regards, Wolfram.

