Glenn Fowler wrote:
> On Fri, 13 Feb 2009 15:19:30 -0800 Edward Pilatowicz wrote:
> > to reproduce the problem you have to have libumem loaded.
> > i've changed the synopsis to be:
> >       ksh93+libumem+time/ptime is broken in non-C locales
> 
> > to run with libumem, set the following in your environment:
> > ---8<---
> > LD_PRELOAD=libumem.so
> > UMEM_DEBUG='audit=50,guards'
> > UMEM_LOGGING=transaction,fail
> > ---8<---
> 
> > On Fri, Feb 13, 2009 at 11:57:14PM +0100, Roland Mainz wrote:
> > > Stephen Hahn wrote:
> > > > * Roland Mainz <roland.mainz at nrubsig.org> [2009-02-13 22:36]:
> > > > > Edward Pilatowicz wrote:
> > > > > > make sure you're sitting down because this may come as a bit of a shock,
> > > > > > but ksh93 seems to break when run in non-C locales with time/ptime.
> > > > > >
> > > > > > 6805584 ksh93 in non-C locale breaks time/ptime
> > > > > [snip]
> > > > >
> > > > > Erm... the bug is not available on
> > > > > http://bugs.opensolaris.org/view_bug.do?bug_id=6805584 yet... ;-(
> > > > >
> > > > > ... what exactly is the problem ? I've tried this:
> > > >
> > > >   From 6805584's description:
> > > >
> > > > edp at jurassic-x4600$ uname -a
> > > > SunOS jurassic-x4600 5.11 snv_108 i86pc i386 i86pc
> > > > edp at jurassic-x4600$ LC_ALL=C /bin/time /bin/sleep 1
> > > >
> > > > real        1.0
> > > > user        0.0
> > > > sys         0.0
> > > > edp at jurassic-x4600$ LC_ALL=en_US.ISO8859-1 /bin/time /bin/sleep 1
> > > > time: command terminated abnormally.
> > > >
> > > > real        1.7
> > > > user        0.0
> > > > sys         0.0
> > > > edp at jurassic-x4600$ pstack core
> > > > core 'core' of 265938:  /usr/bin/ksh93 /bin/sleep 1
> > > >  fffffd7fff3e45aa _lwp_kill () + a
> > > >  fffffd7fff3c4d18 scribble () + c8
> > > >  fffffd7fff3c5115 free () + 2d
> > > >  fffffd7fff3c461d get_lcinterface () + 265
> > > >  fffffd7fff3ce5f2 _ld_libc () + 2a
> > > >  fffffd7fff2d8c6a informrtld () + 4a
> > > >  fffffd7fff2d8283 setlocale () + 8eb
> > > >  fffffd7ffef4f501 single () + e1
> > > >  fffffd7ffef503b8 _ast_setlocale () + 590
> > > >  fffffd7ffef85553 init () + 93
> > > >  fffffd7ffef85714 match () + b4
> > > >  fffffd7ffef858a7 _ast_translate () + 12f
> > > >  fffffd7ffef5c9f0 errorx () + 88
> > > >  fffffd7fff13648b _sh_translate () + 43
> > > >  fffffd7fff0ddcc3 b_common () + 293
> > > >  fffffd7fff0dcf55 b_alias () + 1dd
> > > >  fffffd7fff13ea23 sh_exec () + 2deb
> > > >  fffffd7fff13ca85 sh_exec () + e4d
> > > >  fffffd7fff13d82e sh_exec () + 1bf6
> > > >  fffffd7fff13cc70 sh_exec () + 1038
> > > >  fffffd7fff116e86 exfile () + 786
> > > >  fffffd7fff116676 sh_main () + 7fe
> > > >  0000000000400e72 main () + 52
> > > >  0000000000400ccc ???????? ()
> > >
> > > Looks like a heap corruption (of the non-libast allocator) ... but I
> > > can't reproduce the crash on my B106 VMware machine:
> > > -- snip --
> > > > $ uname -a
> > > > SunOS sxb106x86 5.11 snv_106 i86pc i386 i86pc
> > > > $ isalist
> > > > amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
> > > > $ LC_ALL=C /bin/time /bin/sleep 1 ; print $? ; file core
> > >
> > > real        1.0
> > > user        0.0
> > > sys         0.0
> > > 0
> > > core:           cannot open: No such file or directory
> > > > $ LC_ALL=en_US.ISO8859-1 /bin/time /bin/sleep 1 ; print $? ; file core
> > >
> > > real        1.0
> > > user        0.0
> > > sys         0.0
> > > 0
> > > core:           cannot open: No such file or directory
> > > -- snip --
> > >
> > > What does $ /usr/xpg4/bin/file /usr/bin/sleep /usr/bin/alias # say on
> > > the system where this fails ?
> 
> well not mentioning libumem in the original message was quite an omission

Erm... AFAIK libumem isn't the source of the problem. Solaris's libumem
is an alternative memory allocator which "overrides" the default
|libc::malloc()| and provides configurable debugging aids (in a similar
way as libast's internal memory corruption checks controlled via
VMDEBUG/VMCHECK/&co. - see
http://docs.sun.com/app/docs/doc/816-5168/umem-debug-3malloc?l=ja&a=view
for some documentation). AFAIK Edward was only using libumem to track
down the source of the problem, and the crash happens with and without
it.
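
For reference, here is a minimal sketch of how the comparison could be
scripted. The three libumem settings are copied from Edward's mail above;
the helper name run_with_umem and the idea of passing the locale as the
first argument are my own assumptions for illustration only, not something
taken from the bug report.

```shell
#!/bin/sh
# Sketch only: run a command under libumem's debugging allocator.
# run_with_umem is a hypothetical helper name (not part of Solaris or ksh93).
# Usage: run_with_umem <locale> <command> [args...]
run_with_umem() {
    locale=$1
    shift
    # env sets the variables only for the child process, so the
    # calling shell's own environment stays untouched.
    env LC_ALL="$locale" \
        LD_PRELOAD=libumem.so \
        UMEM_DEBUG='audit=50,guards' \
        UMEM_LOGGING='transaction,fail' \
        "$@"
}

# Intended use on the affected Solaris machine (commented out here):
#   run_with_umem C               /bin/time /bin/sleep 1
#   run_with_umem en_US.ISO8859-1 /bin/time /bin/sleep 1
```

Running both lines back to back should make it easy to see whether only
the non-C locale run leaves a core file behind.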

> we were careful in the solaris build to add an _ast_ prefix
> to any libast function that might interfere with solaris libc
> 
> ast provides its own malloc/free, and those calls are mapped
> to _ast_malloc/_ast_free in the ksh/ast code for opensolaris builds
> so that call to free() in the stack trace was not done directly by
> any ksh/ast code

Right... but it seems something has trashed the heap managed by
|libc::malloc()| - either something is writing randomly into areas where
it shouldn't write to... or maybe we hit a bug in Solaris.

> is there a description on how libumem allocates/frees physical memory?

Uhm... good question... looking at
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libumem/common/
it seems to have at least support for |sbrk()| and |mmap()|.

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
