Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-11 Thread Michael Rasmussen
On Thu, 11 May 2017 11:11:14 +0200
Ludovic Orban  wrote:

> 
> Apparently, ksh isn't very happy when CHILD_MAX equals to MAX_INT, but
> that's probably a ksh bug.
> 
Could it be that ksh treats int as it was defined in the 16-bit OS days
(int = 16 bits, so INT_MAX = 32767) and is expecting to receive a signed
int in that range?

That would obviously be wrong, since ISO/IEC 9899 only requires an int to
hold at least 2^15 - 1 (32767). The actual INT_MAX is defined in limits.h.
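
A minimal sketch of the point about limits.h, for anyone who wants to check
a given platform. The helper names (fits_in_16bit_int, print_int_limits) and
the ISO_C_MIN_INT_MAX macro are made up for illustration; only INT_MAX comes
from the standard header.

```c
#include <limits.h>
#include <stdio.h>

/* Smallest INT_MAX that ISO C permits: 2^15 - 1. */
#define ISO_C_MIN_INT_MAX 32767

/* Returns 1 if value fits in a 16-bit signed int, 0 otherwise. */
int fits_in_16bit_int(long value)
{
    return value >= -32767L - 1 && value <= 32767L;
}

/* Prints what this platform actually uses for int. */
void print_int_limits(void)
{
    printf("INT_MAX = %d, sizeof(int) = %zu bytes\n", INT_MAX, sizeof(int));
}
```

On a typical 64-bit Linux or illumos build INT_MAX is 2147483647, which
fits_in_16bit_int() rejects, so code that assumed a 16-bit range would
mishandle it.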

-- 
Hilsen/Regards
Michael Rasmussen

___
OmniOS-discuss mailing list
OmniOS-discuss@lists.omniti.com
http://lists.omniti.com/mailman/listinfo/omnios-discuss


Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-11 Thread Dan McDonald

> On May 11, 2017, at 5:11 AM, Ludovic Orban  wrote:
> 
> If my understanding of the LX code is correct, sysconf(_SC_CHILD_MAX) ends up 
> being translated to lx_getrlimit() which would return the value of 
> zone.max-lwps. Looks like an odd default to me, but I can't say for sure. 
> Since I haven't configured any rctl on my lx zone, apparently the default is 
> MAX_INT. I assume smartos uses a different default, but I wish I could 
> double-check that.
> 
> Now, I'm not sure how this could or should be fixed.
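
A minimal sketch for checking the reported value from inside a zone,
assuming a Linux userland where sysconf(_SC_CHILD_MAX) is backed by
getrlimit(RLIMIT_NPROC). The helper names (query_child_max,
child_max_is_huge, report_limits) are hypothetical and are not part of ksh
or of the LX brand code.

```c
#include <limits.h>
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Returns sysconf(_SC_CHILD_MAX), or -1 if the limit is indeterminate. */
long query_child_max(void)
{
    return sysconf(_SC_CHILD_MAX);
}

/* Returns 1 if the reported limit is at or above INT_MAX. */
int child_max_is_huge(long child_max)
{
    return child_max >= (long)INT_MAX;
}

/* Prints both views of the limit so they can be compared. */
void report_limits(void)
{
    long child_max = query_child_max();

    printf("sysconf(_SC_CHILD_MAX) = %ld%s\n", child_max,
           child_max_is_huge(child_max) ? " (>= INT_MAX)" : "");

#ifdef RLIMIT_NPROC
    /* RLIMIT_NPROC is a Linux/BSD extension, hence the guard. */
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) == 0)
        printf("RLIMIT_NPROC soft limit = %llu\n",
               (unsigned long long)rl.rlim_cur);
#endif
}
```

Calling report_limits() from a small main() in the zone should show whether
the process sees a bounded limit or something at INT_MAX.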

What's REALLY weird is that our implementation of lx_getrlimit() is NO 
DIFFERENT from illumos-joyent's.  The ONLY differences in 
$SRC/uts/common/brand/lx/ are lint fixes on the OmniOS side, none of which go 
near lx_getrlimit().

I wonder if it's a function of which libc/glibc you have?  A quick way to find 
out is to run the attached D script, and then run the ulimit command to see how 
(and if) we get here and what happens.  Here's what I get, which yields MAX_INT:

bloody(~)[1]% bg
[1]sudo /tmp/lx-rlimit-proc.d &
bloody(~)[0]% sudo zlogin lx1 ulimit -u
CPU FUNCTION
  5  -> lx_getrlimit
2147483647
              libc.so.6`0x7e2fc0a7

              lx_brand`lx_syscall_enter+0x16f
              unix`sys_syscall+0x145

  5   | lx_getrlimit:entry
  5    -> lx_getrlimit_common
  5    <- lx_getrlimit_common         Returns 0x0
  5    -> get_udatamodel
  5    <- get_udatamodel              Returns 0x20
  5    -> copyout
  5    <- kcopy                       Returns 0x0
  5  <- lx_getrlimit                  Returns 0x0
bloody(~)[0]%   6  -> lx_getrlimit
              0x7e6fc0a7

              lx_brand`lx_syscall_enter+0x16f
              unix`sys_syscall+0x145

  6   | lx_getrlimit:entry
  6    -> lx_getrlimit_common
  6    <- lx_getrlimit_common         Returns 0x0
  6    -> get_udatamodel
  6    <- get_udatamodel              Returns 0x20
  6    -> copyout
  6    <- kcopy                       Returns 0x0
  6  <- lx_getrlimit                  Returns 0x0
  6  -> lx_getrlimit
              0x7e6fc0a7

              lx_brand`lx_syscall_enter+0x16f
              unix`sys_syscall+0x145

  6   | lx_getrlimit:entry
  6    -> lx_getrlimit_common
  6    <- lx_getrlimit_common         Returns 0x0
  6    -> get_udatamodel
  6    <- get_udatamodel              Returns 0x20
  6    -> copyout
  6    <- kcopy                       Returns 0x0
  6  <- lx_getrlimit                  Returns 0x0


I'd be interested in knowing what happens on the SmartOS box.  I also wonder if 
SmartOS launches the zone processes with lower limits already in place or not?

Dan





Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-11 Thread Eric Sproul
On Thu, May 11, 2017 at 5:11 AM, Ludovic Orban  wrote:
> I assume smartos uses a different default, but I wish I could
> double-check that.

I have a not-too-old SmartOS box at my disposal:

Global zone:

# uname -a; ulimit -u
SunOS  5.11 joyent_20161110T013148Z i86pc i386 i86pc
4

In an Ubuntu 16.04 LX zone:

# uname -a; ulimit -u
Linux  4.4 BrandZ virtual linux x86_64 x86_64 x86_64 GNU/Linux
2000

Eric


Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-10 Thread Ludovic Orban
Okay, I found what causes ksh to misbehave. It's in sh_init(), when
shgd->lim.child_max is initialized with the results of
getconf("CHILD_MAX"), see:
https://github.com/att/ast/blob/master/src/cmd/ksh93/sh/init.c#L1289

I've commented out that line, hardcoded shgd->lim.child_max to 128, rebuilt
and voila: ksh works as it should.
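
A minimal sketch of a less drastic variant of that experiment: instead of
hardcoding the value, clamp whatever the system reports into a sane range.
This is not what ksh does and not a proposed upstream fix. The function name
sanitize_child_max and the 32767 ceiling are assumptions made for
illustration; 128 is the value used in the experiment above.

```c
#include <limits.h>

/* Value hardcoded in the experiment described above. */
#define FALLBACK_CHILD_MAX 128L
/* Hypothetical ceiling, chosen only for this illustration. */
#define CEILING_CHILD_MAX  32767L

/*
 * Maps a reported CHILD_MAX onto a bounded value:
 *   - non-positive (unknown or error)  -> fallback
 *   - larger than the ceiling          -> ceiling
 *   - anything else                    -> unchanged
 */
long sanitize_child_max(long reported)
{
    if (reported <= 0)
        return FALLBACK_CHILD_MAX;
    if (reported > CEILING_CHILD_MAX)
        return CEILING_CHILD_MAX;
    return reported;
}
```

With this, a reported value of 2147483647 would be reduced to the ceiling,
while a limit such as 2000 would pass through untouched.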

Now I have to dig into that getconf() method to figure out what the
returned value is and where it's coming from. Sounds trivial, but my C is
*very* rusty, the asm gcc generates doesn't look at all like what the JVM's
JIT generates (which gives me wrong reflexes as I'm used to the latter) and
I'm not very familiar with mdb.

Oh well, that turned into a nice debugging re-training session which I very
much needed. That reminds me of the good old days at my first job when I was
porting Linux apps to Solaris.

Thank you for maintaining such a well-designed and pleasant to use OS!


On Wed, May 10, 2017 at 3:59 PM, Dan McDonald  wrote:

> Wow, thank you for the further deep-diving.
>
> > On May 10, 2017, at 5:21 AM, Ludovic Orban  wrote:
> >
> > Looking at ksh's sources, my understanding is that job_post is stuck in
> that else clause:
> >   else
> >   {
> >      /* create a new job */
> >      while((pw->p_job = job_alloc()) < 0)
> >         job_wait((pid_t)1);
> >      pw->p_nxtjob = job.pwlist;
> >      pw->p_nxtproc = 0;
> >   }
> >
> > Digging into the sources and stepping through the instructions of
> job_alloc and job_byjid it looks like ksh cannot allocate a job id as it
> believes they're all reserved. But so far, all this code is purely working
> on internal structures of ksh so a LX bug would have no impact.
> >
> > I'll continue looking into this as time permits and I'll post an update
> if I find anything worth mentioning.
> >
>
> Be careful of narrowing your focus too far.  I see some things worth
> considering:
>
> 1.) Is the "if" you're not showing me dependent on something in global
> state that may have been mis-initialized by an LX emulation bug?
>
> 2.) Same question as #1, but applied to job_alloc() and job_wait().
>
> I'm guessing LX in OmniOS is failing because I mismerged or plain forgot
> something, given that Nahum says he can run ksh93 on SmartOS just fine.
>
>
> Please make sure you're looking at the bigger picture, but THANK YOU for
> the further investigation.
>
> Dan
>
>


Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-10 Thread Ludovic Orban
In x86 asm, cmpl itself is neither signed nor unsigned; it's the following
jump that decides whether the result is read as signed or not. In this case
it's jl "jump if less", so it's signed (vs. jb "jump if below", which is
unsigned). But I digress.
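
A minimal sketch of the same distinction at the C level: the operand type
decides which condition the compiler tests after the compare. The function
names less_signed and less_unsigned are made up for illustration, and the
exact instructions a compiler emits will vary.

```c
#include <stdint.h>

/* Signed comparison: on x86 this is the case a jl-style condition covers. */
int less_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned comparison: on x86 this is the case a jb-style condition covers. */
int less_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}
```

The all-ones bit pattern shows the difference: less_signed(-1, 1) is true,
while less_unsigned((uint32_t)-1, 1u) is false because the same bits read as
4294967295.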

I've recompiled ksh93 with debug, no stripped symbols and no optimizations
(the binary is here: https://www.dropbox.com/s/brys628g40akruv/ksh93.gz?dl=0)
and managed to figure out where that infinite loop is happening:

> ::stack
job_byjid+5()
job_alloc+0x62()
job_post+0x1a2()
_sh_fork+0x265()
sh_ntfork+0xa99()
sh_exec+0x2be8()
sh_subshell+0x982()
comsubst+0xbf0()
varsub+0x3f4()
copyto+0xa2a()
sh_mactrim+0x196()
nv_setlist+0x220()
sh_exec+0xdb7()
sh_eval+0x2b9()
sh_trap+0x29b()
ed_setup+0x7ac()
ed_viread+0xf6()
slowread+0x181()
sfrd+0x4da()
_sffilbuf+0x433()
sfreserve+0x566()
exfile+0x808()
sh_main+0xb38()
main+0x25()
>

Looking at ksh's sources, my understanding is that job_post is stuck in that
else clause:
   else
   {
      /* create a new job */
      while((pw->p_job = job_alloc()) < 0)
         job_wait((pid_t)1);
      pw->p_nxtjob = job.pwlist;
      pw->p_nxtproc = 0;
   }

Digging into the sources and stepping through the instructions of job_alloc
and job_byjid it looks like ksh cannot allocate a job id as it believes
they're all reserved. But so far, all this code is purely working on
internal structures of ksh so a LX bug would have no impact.
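
A simplified model of why a scan bounded by a huge limit looks like a hang.
This is not ksh's code; it only assumes, based on the stepped instructions
quoted below (increment, compare against a stored limit, jl back), that some
loop walks job ids up to a limit. The names probe_fn, count_probes,
never_reclaimable and third_slot_free are hypothetical.

```c
/* Probe callback: returns nonzero when slot id can be reused. */
typedef int (*probe_fn)(int id);

/*
 * Walks ids 1 .. limit-1 and stops at the first id the probe accepts.
 * Returns how many probes were made, whether or not one succeeded.
 */
long count_probes(int limit, probe_fn probe)
{
    long probes = 0;

    for (int id = 1; id < limit; id++) {
        probes++;
        if (probe(id))
            break;
    }
    return probes;
}

/* Models a table in which no slot is ever reusable. */
int never_reclaimable(int id)
{
    (void)id;
    return 0;
}

/* Models a table in which slot 3 is the first reusable one. */
int third_slot_free(int id)
{
    return id == 3;
}
```

With a limit of 2000 and nothing reusable the loop gives up after 1999
probes. With a limit of 2147483647 the same loop needs about 2^31 probes per
attempt, which is indistinguishable from a hang at an interactive prompt.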

I'll continue looking into this as time permits and I'll post an update if
I find anything worth mentioning.

--
Ludovic



On Tue, May 9, 2017 at 5:15 PM, Dan McDonald  wrote:

>
> > On May 9, 2017, at 11:05 AM, Dan McDonald  wrote:
> >
> > And I've no good way to know what it's doing, as the illumos-native
> tools aren't giving me enough data.
>
> ksh93 appears to be looping in something:
>
> mdb: target stopped at:
> 0x42adf0:   movq   +0x350129(%rip),%rax <0x77af20>
> > ::step
> mdb: target stopped at:
> 0x42adf7:   testq  %rax,%rax
> > ::step
> mdb: target stopped at:
> 0x42adfa:   jne    +0xc     <0x42ae08>
> > ::step
> mdb: target stopped at:
> 0x42adfc:   jmp    +0x28    <0x42ae26>
> > ::step
> mdb: target stopped at:
> 0x42ae26:   addl   $0x1,%r14d
> > ::step
> mdb: target stopped at:
> 0x42ae2a:   cmpl   0x10(%rsi),%r14d
> > ::step
> mdb: target stopped at:
> 0x42ae2e:   jl     -0x40    <0x42adf0>
> > ::step
> mdb: target stopped at:
> 0x42adf0:   movq   +0x350129(%rip),%rax <0x77af20>
> >  Usage: step [ over | out ] [SIG]
> >  0x7f0470f0
> > 0x7f0470f0/D
> 0x7f0470f0: 2147483647
> > 0x7f0470f0/X
> 0x7f0470f0: 7fffffff
> >  mdb: failed to read data from target: no mapping for address
> 0x761133e5:
> > 

Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-09 Thread Ludovic Orban
Don't worry too much about it, I have a workaround in place that suits me
well.

I just wanted to report this problem.

Thanks,
Ludovic

On Tue, May 9, 2017 at 5:05 PM, Dan McDonald  wrote:

>
> > On May 9, 2017, at 10:30 AM, Dan McDonald  wrote:
> >
> > When I get home I'll provide more details, but you should try bloody or
> the still-in-beta r151022.
> >
>
> Well shoot.  It appears I have pretty much the same problem in OmniOS
> bloody (and therefore r151022):
>
> root@ubuntu-14-04-b:~# ksh93
> # echo "Hello, world."
> ^D
> ^C
> # ls /
> bin   dev  home  lib64  mnt opt   root  sbin  sys tmp  var
> boot  etc  lib   media  native  proc  run   srv   system  usr
>
> ^D
> ^C#
> #
>
> Any command I type that's builtin just stops.  Any command that forks
> outputs, but doesn't seem to properly exit.
>
> Ouch... and when I try it again from scratch, it hangs per the original
> report.  And worse, ONE TIME (can't reproduce) it seemed to run away with
> spawning itself.  Apparently if the first thing you type is a builtin or
> just RETURN, you enter a state where you can amp-off processes, but the
> shell itself seems to wedge until you hit ^C.  If you enter a real process
> right off the bat, ksh93 goes on a slow forking spree... well, sometimes.
> Other times, hitting ^C in the hung shell will return you.
>
> I wonder if I mismerged or forgot to merge something from SmartOS?
>
> I also wonder if this mismerge or omission happened early on in the prior
> bloody cycle?  It's going to be hard to tell.
>
> Whatever ksh93 is doing, it appears to be doing it in LX userspace, too:
>
> bloody(~)[0]% ptree `pgrep ksh93`
> 493   /usr/sbin/sshd
>   9511  /usr/sbin/sshd -R
> 9515  /usr/sbin/sshd -R
>   9516  -tcsh
> 9555  sudo zlogin lx1
>   9556  zlogin lx1
> 9557  /bin/login -h zone:global -f root
>   9566  -bash
> 9622  ksh93
> bloody(~)[0]% sudo mdb -k
> Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix
> scsi_vhci zfs sata sd ip hook neti sockfs arp usba xhci stmf stmf_sbd mm
> lofs random crypto idm nfs cpc ufs logindmux ptm ipc ]
> > ::ps !grep ksh
> R   9622   9566   9622   9557  0 0x4a014000 ff1983ee5000 ksh93
> > ff1983ee5000::ps -t
> S    PID   PPID   PGID    SID    UID      FLAGS         ADDR NAME
> R   9622   9566   9622   9557  0 0x4a014000 ff1983ee5000 ksh93
> T  0xff1955982c60 
> > 0xff1955982c60::findstack
> stack pointer for thread ff1955982c60: ff007ae157f0
>   ff007ae15f10 0x20()
> >
>
> And I've no good way to know what it's doing, as the illumos-native tools
> aren't giving me enough data.
>
> Sorry,
> Dan
>


Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-09 Thread Dan McDonald

> On May 9, 2017, at 10:30 AM, Dan McDonald  wrote:
> 
> When I get home I'll provide more details, but you should try bloody or the
> still-in-beta r151022.
> 

Well shoot.  It appears I have pretty much the same problem in OmniOS bloody 
(and therefore r151022):

root@ubuntu-14-04-b:~# ksh93
# echo "Hello, world."
^D
^C
# ls /
bin   dev  home  lib64  mnt opt   root  sbin  sys tmp  var
boot  etc  lib   media  native  proc  run   srv   system  usr

^D
^C# 
# 

Any command I type that's builtin just stops.  Any command that forks outputs, 
but doesn't seem to properly exit.

Ouch... and when I try it again from scratch, it hangs per the original report. 
And worse, ONE TIME (can't reproduce) it seemed to run away with spawning 
itself.  Apparently if the first thing you type is a builtin or just RETURN, 
you enter a state where you can amp-off processes, but the shell itself seems 
to wedge until you hit ^C.  If you enter a real process right off the bat, 
ksh93 goes on a slow forking spree... well, sometimes.  Other times, hitting ^C 
in the hung shell will return you.

I wonder if I mismerged or forgot to merge something from SmartOS?

I also wonder if this mismerge or omission happened early on in the prior 
bloody cycle?  It's going to be hard to tell.

Whatever ksh93 is doing, it appears to be doing it in LX userspace, too:

bloody(~)[0]% ptree `pgrep ksh93`
493   /usr/sbin/sshd
  9511  /usr/sbin/sshd -R
9515  /usr/sbin/sshd -R
  9516  -tcsh
9555  sudo zlogin lx1
  9556  zlogin lx1
9557  /bin/login -h zone:global -f root
  9566  -bash
9622  ksh93
bloody(~)[0]% sudo mdb -k
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix 
scsi_vhci zfs sata sd ip hook neti sockfs arp usba xhci stmf stmf_sbd mm lofs 
random crypto idm nfs cpc ufs logindmux ptm ipc ]
> ::ps !grep ksh
R   9622   9566   9622   9557  0 0x4a014000 ff1983ee5000 ksh93
> ff1983ee5000::ps -t
S    PID   PPID   PGID    SID    UID      FLAGS         ADDR NAME
R   9622   9566   9622   9557  0 0x4a014000 ff1983ee5000 ksh93
T  0xff1955982c60 
> 0xff1955982c60::findstack
stack pointer for thread ff1955982c60: ff007ae157f0
  ff007ae15f10 0x20()
> 

And I've no good way to know what it's doing, as the illumos-native tools 
aren't giving me enough data.

Sorry,
Dan



Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-09 Thread Dan McDonald
When I get home I'll provide more details, but you should try bloody or the
still-in-beta r151022.

Dan

Sent from my iPhone (typos, autocorrect, and all)

> On May 9, 2017, at 10:02 AM, Nahum Shalman  wrote:
> 
> As a data point, I tested this on a very recent SmartOS and was unable to 
> reproduce:
> 
> root@ksh-debian:~# set -o xtrace ; /native/usr/bin/uname -a; uname -a ; cat 
> /etc/debian_version ; dpkg-query -l ksh | grep ksh ; /bin/ksh93
> + set -o xtrace
> + /native/usr/bin/uname -a
> SunOS ksh-debian 5.11 joyent_20170427T222500Z i86pc i386 i86pc
> + uname -a
> Linux ksh-debian 3.16.10 BrandZ virtual linux x86_64 GNU/Linux
> + cat /etc/debian_version
> 8.8
> + dpkg-query -l ksh
> + grep ksh
> ii  ksh  93u+20120801-1  amd64  Real, AT&T version of the Korn shell
> + /bin/ksh93
> # ps | grep $$
> 77074 pts/2    00:00:00 ksh93
> # exit
> root@ksh-debian:~#
> 
>> On Tue, May 9, 2017 at 5:52 AM, Ludovic Orban  wrote:
>> Hi,
>> 
>> I've installed the real ksh93 (this stuff: 
>> https://packages.debian.org/jessie/ksh) on a r151020 LX zone running the 
>> latest Debian 
>> (https://images.joyent.com/images/e74a9cd0-f2d0-11e6-8b69-b3acf2ef87f7) and 
>> it behaves strangely.
>> 
>> Here is what I get:
>> 
>> root@debian-8:~# /bin/ksh93
>> # ls /
>> 
>> 
>> ^C^C
>> ^Z^Z^Z
>> ^\^\^\^\
>> Killed
>> root@debian-8:~#
>> root@debian-8:~#
>> 
>> (note: the "Killed" line is due to a kill -9 from another terminal).
>> 
>> Basically, any command I type just hangs there. Sometimes the shell reacts 
>> to ^C and/or ^D, allowing me to try a different command that is also going to hang, 
>> and sometimes it doesn't and I have to kill the shell from another terminal. 
>> In the latter case, ksh93 starts kind of a "fork bomb" where it seems to 
>> fork itself in a loop.
>> 
>> I can reproduce this problem from a zlogin term as well as from a direct ssh 
>> session. I've also tried ksh93 on a CentOS zone to make sure the problem 
>> wasn't caused by Debian's build and the exact same problem arises.
>> 
>> Any idea what's going on?
>> 
>> Thanks,
>> Ludovic
>> 


Re: [OmniOS-discuss] LX: real ksh93 broken

2017-05-09 Thread Nahum Shalman
As a data point, I tested this on a very recent SmartOS and was unable to
reproduce:

root@ksh-debian:~# set -o xtrace ; /native/usr/bin/uname -a; uname -a ; cat
/etc/debian_version ; dpkg-query -l ksh | grep ksh ; /bin/ksh93
+ set -o xtrace
+ /native/usr/bin/uname -a
SunOS ksh-debian 5.11 joyent_20170427T222500Z i86pc i386 i86pc
+ uname -a
Linux ksh-debian 3.16.10 BrandZ virtual linux x86_64 GNU/Linux
+ cat /etc/debian_version
8.8
+ dpkg-query -l ksh
+ grep ksh
ii  ksh  93u+20120801-1  amd64  Real, AT&T version of the Korn shell
+ /bin/ksh93
# ps | grep $$
77074 pts/2    00:00:00 ksh93
# exit
root@ksh-debian:~#

On Tue, May 9, 2017 at 5:52 AM, Ludovic Orban  wrote:

> Hi,
>
> I've installed the real ksh93 (this stuff: https://packages.debian.org/
> jessie/ksh) on a r151020 LX zone running the latest Debian (
> https://images.joyent.com/images/e74a9cd0-f2d0-11e6-8b69-b3acf2ef87f7)
> and it behaves strangely.
>
> Here is what I get:
>
> root@debian-8:~# /bin/ksh93
> # ls /
>
>
> ^C^C
> ^Z^Z^Z
> ^\^\^\^\
> Killed
> root@debian-8:~#
> root@debian-8:~#
>
> (note: the "Killed" line is due to a kill -9 from another terminal).
>
> Basically, any command I type just hangs there. Sometimes the shell reacts
> to ^C and/or ^D, allowing me to try a different command that is also going to
> hang, and sometimes it doesn't and I have to kill the shell from another
> terminal. In the latter case, ksh93 starts kind of a "fork bomb" where it
> seems to fork itself in a loop.
>
> I can reproduce this problem from a zlogin term as well as from a direct
> ssh session. I've also tried ksh93 on a CentOS zone to make sure the
> problem wasn't caused by Debian's build and the exact same problem arises.
>
> Any idea what's going on?
>
> Thanks,
> Ludovic
>