Hi -

I noticed that port forwarding wasn't working, and I figured I'd
chronicle my debugging efforts here, in the hope that they'll help
with your own debugging.

First off, the setup.  I've got qemu running, with port 5555 on the
host mapped to 22 on the guest (as in GETTING_STARTED).  My
~/.ssh/config has this:

Host qemu
        Hostname 127.0.0.1
        User root
        Port 5555
        IdentitiesOnly yes
        IdentityFile ~/.ssh/db_rsa

On Akaros, I'll run "listener", which is a simple echo server running
on port 23.  You can try it out easily with qemu by forwarding 5556 to
23, then telnetting to localhost 5556.  That works fine.

First, I set up the port-forward:

$ ssh -L 5556:127.0.0.1:23 qemu
-sh: can't access tty; job control turned off
/ $ 

Seems fine.


On Akaros, the listener is running.

bash-4.3$ listener
Using Plan 9's networking stack
Announced on line /net/tcp/2



On my host, I try to connect:

$ telnet localhost 5556
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Connection closed by foreign host.


On the port-forward ssh, this error appears:

 channel 3: open failed: connect failed: 


On Akaros, listener printed this and exited:

Listened and got line /net/tcp/4
Server read:
(it read nothing, next up was the bash prompt)


So what next?  Let's look at the syscalls that occur during the
attempted connection.  strace is a bit of a pain in the ass to work
with sometimes, so I tend to just directly turn on the kernel's raw
tracing:

(ctrl-g)
ROS(Core 0)> trace syscall start
ROS(Core 0)> exit

Here's a slightly edited trace.  I'll point out the relevant things
below.  proc 273 is the dropbear (ssh) connecting process.  335 is
listener.


( poke_ksched):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
( poke_ksched):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(  abort_sysc):(0x7f7fff9fe760, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(  abort_sysc):(0x7f7fff9fe760, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x1 proc: 273 
core: 0 vcore: 0 data: ''
( self_notify):(0x0, 0x0, 0x0, 0x1, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
( self_notify):(0x0, 0x0, 0x0, 0x1, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: 0x10 proc: 273 core: 0 
vcore: 0 data: '\355\360J\016q\274\247\305X\364C\370od\201\313'
(        read):(0x7, 0x735688, 0x60, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(        read):(0x7, 0x735688, 0x60, 0x0, 0x0, 0x0) ret: 0x60 proc: 273 core: 0 
vcore: 0 data: 
'\222\362\223\023\233\274%\363\033\375\203\233\313\202K\234\217p\036\320\351\241ZnW\376\240\211Fr\237\3000;\014\356>\253v\263\002\222\215\205\375
 
\325S.\210\333\204\011\367\217\354\277\237\244\255<\025\004\3777\256>\3004"\354\301\212\366'\222\025\027\203\273\233~s\376.\000\266\320\255K@\034*\236T\035'
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 
0 vcore: 0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(      openat):(0xffffffffffffff9c, 0x40000034cff6, 0xe, 0x3, 0x0, 0x0) ret: 
--- proc: 273 core: 0 vcore: 0 data: '/net/tcp/clone'
(      openat):(0xffffffffffffff9c, 0x40000034cff6, 0xe, 0x3, 0x0, 0x0) ret: 
0xb proc: 273 core: 0 vcore: 0 data: '/net/tcp/clone'
(        read):(0xb, 0x7f7fff9fe790, 0x7f, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0xb, 0x7f7fff9fe790, 0x7f, 0x0, 0x0, 0x0) ret: 0x2 proc: 273 
core: 0 vcore: 0 data: '3\000'
(      openat):(0xffffffffffffff9c, 0x7f7fff9fe790, 0xf, 0x3, 0x0, 0x0) ret: 
--- proc: 273 core: 0 vcore: 0 data: '/net/tcp/3/data'
(      openat):(0xffffffffffffff9c, 0x7f7fff9fe790, 0xf, 0x3, 0x0, 0x0) ret: 
0xf proc: 273 core: 0 vcore: 0 data: '/net/tcp/3/data'
(       fstat):(0xb, 0x7f7fff9fe610, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe610, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe6a0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe6a0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xb, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
(       close):(0xb, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(       fstat):(0xf, 0x7f7fff9fe6d0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe6d0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe810, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe810, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fcntl):(0xf, 0x4, 0x800, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(       fcntl):(0xf, 0x4, 0x800, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 
vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe5f0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe5f0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(      openat):(0xffffffffffffff9c, 0x677b00, 0xe, 0x3, 0x0, 0x0) ret: --- 
proc: 273 core: 0 vcore: 0 data: '/net/tcp/3/ctl'
(      openat):(0xffffffffffffff9c, 0x677b00, 0xe, 0x3, 0x0, 0x0) ret: 0xb 
proc: 273 core: 0 vcore: 0 data: '/net/tcp/3/ctl'
(       write):(0xb, 0x7f7fff9fe6a0, 0x14, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: 'connect 127.0.0.1!23'
(       write):(0xb, 0x7f7fff9fe6a0, 0x14, 0x0, 0x0, 0x0) ret: 0x14 proc: 273 
core: 0 vcore: 0 data: 'connect 127.0.0.1!23'
(        read):(0x4, 0x7f7fff9fec09, 0xf6, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fec09, 0xf6, 0x0, 0x0, 0x0) ret: 0x2 proc: 335 
core: 0 vcore: 0 data: '4\000'
(       write):(0x1, 0x4000005df000, 0x21, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: 'Listened and got line /net/tcp/4\012'
(       write):(0x1, 0x4000005df000, 0x21, 0x0, 0x0, 0x0) ret: 0x21 proc: 335 
core: 0 vcore: 0 data: 'Listened and got line /net/tcp/4\012'
(       fstat):(0xb, 0x7f7fff9fe530, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe530, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe5c0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xb, 0x7f7fff9fe5c0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xb, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
(       close):(0xb, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(       write):(0x4, 0x7f7fff9fec10, 0x8, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: 'accept 4'
(       write):(0x4, 0x7f7fff9fec10, 0x8, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 335 core: 0 vcore: 0 data: 'accept 4'
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(      openat):(0xffffffffffffff9c, 0x7f7fff9fec10, 0xf, 0x3, 0x0, 0x0) ret: 
--- proc: 335 core: 0 vcore: 0 data: '/net/tcp/4/data'
(      openat):(0xffffffffffffff9c, 0x7f7fff9fec10, 0xf, 0x3, 0x0, 0x0) ret: 
0x5 proc: 335 core: 0 vcore: 0 data: '/net/tcp/4/data'
(       fstat):(0xd, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xd, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x5, 0x7f7fff9feda0, 0x100, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe860, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe860, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe7a0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe7a0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe830, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xf, 0x7f7fff9fe830, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xf, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
(       close):(0xf, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(        read):(0x5, 0x7f7fff9feda0, 0x100, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 
core: 0 vcore: 0 data: ''
(       close):(0x5, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 335 core: 0 vcore: 
0 data: ''
(       close):(0x5, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 core: 0 vcore: 
0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 
0 vcore: 0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe7f0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe7f0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
(       close):(0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       close):(0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 
0 vcore: 0 data: ''
(        read):(0xd, 0x670421, 0x3ff7, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
(       write):(0x7, 0x735a68, 0x40, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: 
'\365\231\377\357\324\233\014e\276\035\236\023)\037\306Mh\0059\0318'\300\3275\014\035/\301\372\026\244\230\217s\234\037\244v<\366\\314~R\275\212K\365Z\337\035~\263\351\360\3356\306\207\225\200\340Y'
(       write):(0x7, 0x735a68, 0x40, 0x0, 0x0, 0x0) ret: 0x40 proc: 273 core: 0 
vcore: 0 data: 
'\365\231\377\357\324\233\014e\276\035\236\023)\037\306Mh\0059\0318'\300\3275\014\035/\301\372\026\244\230\217s\234\037\244v<\366\\314~R\275\212K\365Z\337\035~\263\351\360\3356\306\207\225\200\340Y'
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(     gettime):(0x7f7fff9fe910, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x4, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0x7, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xd, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(       fstat):(0xd, 0x7f7fff9fe8b0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 
core: 0 vcore: 0 data: ''
(       block):(0x565138c0, 0x40000057af00, 0x735830, 0x810, 0x40000000000, 
0x735fd0) ret: --- proc: 273 core: 0 vcore: 0 data: ''
(       close):(0x4, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 335 core: 0 vcore: 
0 data: ''
(       close):(0x4, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 core: 0 vcore: 
0 data: ''
(       close):(0x3, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 335 core: 0 vcore: 
0 data: ''
(       close):(0x3, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 core: 0 vcore: 
0 data: ''
(  proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
(  proc_yield):(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 273 core: 0 vcore: 
0 data: ''
(       write):(0x1, 0x4000005df000, 0xd, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: 'Server read: '
 Syscall 101 (       write):(0x1, 0x4000005df000, 0xd, 0x0, 0x0, 0x0) ret: 0xd 
proc: 335 core: 0 vcore: 0 data: 'Server read: '
(proc_destroy):(0x14f, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 335 core: 0 
vcore: 0 data: ''
(proc_destroy):(0x14f, 0x0, 0x0, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 core: 0 
vcore: 0 data: ''


I focus on the opens, reads, and writes.  The last write is the one
where the listener gave up.  If we look at the openats, we can see DB
opened a new connection (tcp/clone), figured out what line it was (the
reads on 0xb), then tried to connect to listener (connect
127.0.0.1!23).  All seems well.  We can see the listener gets the
connection, on line tcp/4.

Next up, notice that the listener was reading from FD 5, got 0 back
(conversation is closed).  That was its inbound line.  (look for the
openat of /net/tcp/4/data).  Its other end, which is handled by DB,
must have closed.  What did DB do during this time?  Here's the
relevant (edited) part:

(        read):(0x5, 0x7f7fff9feda0, 0x100, 0x0, 0x0, 0x0) ret: --- proc: 335 
core: 0 vcore: 0 data: ''
        This was the entry (E) record for 335 reading.  It doesn't
        return right away (it's clearer in the untruncated trace).

(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: --- proc: 273 
core: 0 vcore: 0 data: ''
(        read):(0x4, 0x7f7fff9fe98f, 0x1, 0x0, 0x0, 0x0) ret: 
0xffffffffffffffff proc: 273 core: 0 vcore: 0 data: ''
        Read on 4.  This was a pipe FD, probably the pipe to the
        ssh-forward shell.  (Recall that forwarding spawns a shell,
        visible above.)

(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 
vcore: 0 data: ''
(        read):(0x7, 0x735678, 0x10, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff 
proc: 273 core: 0 vcore: 0 data: ''
        Read on 7.  That FD was for tcp/2/data.  That's the network
        connection with the forwarding agent.

(       close):(0xf, 0x0, 0x0, 0x0, 0x0, 0x0) ret: --- proc: 273 core: 0 vcore: 
0 data: ''
        Close 0xf.  That was for tcp/3/data, the conversation that
        connects to listener.

(        read):(0x5, 0x7f7fff9feda0, 0x100, 0x0, 0x0, 0x0) ret: 0x0 proc: 335 
core: 0 vcore: 0 data: ''
        The listener's read returns 0: the other end was closed.

How did I connect FDs to files?  Use the 'pip' command from the
monitor.  Here's dropbear's (pid 273) file table:

Open Files:
        FD: 00, File: 0xffff80013d570260, File name: null
        FD: 01, File: 0xffff80013d570260, File name: null
        FD: 02, File: 0xffff80013d570260, File name: null
        FD: 03, Chan flags: 0x0000000000008001, pathname: #pipe., ref: 1, Dev: 
pipe, Devinfo: qid.path: 0x0000000000000220, qid.type: 80
        FD: 04, Chan flags: 0x0000000000000801, pathname: #pipe./data, ref: 2, 
Dev: pipe, Devinfo: qid.path: 0x0000000000000221, qid.type: 01
        FD: 05, Chan flags: 0x0000000000008001, pathname: #pipe., ref: 2, Dev: 
pipe, Devinfo: qid.path: 0x00000000000000e0, qid.type: 80
        FD: 06, Chan flags: 0x0000000000008001, pathname: #pipe., ref: 2, Dev: 
pipe, Devinfo: qid.path: 0x00000000000001e0, qid.type: 80
        FD: 07, Chan flags: 0x0000000000000801, pathname: /net/tcp/2/data, ref: 
2, Dev: ip, Devinfo: Qdata, tapped, proto tcp, conv idx 2, rq len 0, wq len 64
        FD: 08, Chan flags: 0x0000000000000801, pathname: #pipe./data1, ref: 1, 
Dev: pipe, Devinfo: qid.path: 0x0000000000000222, qid.type: 01
        FD: 09, Chan flags: 0x0000000000008001, pathname: #pipe., ref: 4, Dev: 
pipe, Devinfo: qid.path: 0x0000000000000260, qid.type: 80
        FD: 10, Chan flags: 0x0000000000000001, pathname: #pipe./data, ref: 2, 
Dev: pipe, Devinfo: qid.path: 0x0000000000000261, qid.type: 01
        FD: 12, Chan flags: 0x0000000000008001, pathname: #pipe., ref: 4, Dev: 
pipe, Devinfo: qid.path: 0x00000000000002a0, qid.type: 80
        FD: 13, Chan flags: 0x0000000000000801, pathname: #pipe./data, ref: 2, 
Dev: pipe, Devinfo: qid.path: 0x00000000000002a1, qid.type: 01
        FD: 14, Chan flags: 0x0000000000000001, pathname: #pipe./data1, ref: 2, 
Dev: pipe, Devinfo: qid.path: 0x00000000000002a2, qid.type: 01

Notice that FD 0xf is closed.  I figured out it was tcp/3/data by
looking at the openat syscalls in the trace.

Ok.  So now we need to figure out what should have been going on.  It
looks like DB was trying to read from tcp/2 (FD 7), then failed for some
reason, and because of that, it killed the forwarding.

Let's recall that DB uses select, and our select is sometimes
spurious.  Because of that, you might select, be told something is
there, and then there isn't anything there.  Akaros returns EAGAIN in
these cases.  My hunch at this point is that DB wanted to read, got an
error for EAGAIN, but didn't handle it nicely and shut down the
connection.  The ssh forward would have returned something, but it
didn't have time.  This is just a hunch.  Let's try to get a little
more info.

To save me some time, I'd like to see more about those read errors.  I
can do this quickly with a kernel hack.  In sys_read, I added this to
the end:

diff --git a/kern/src/syscall.c b/kern/src/syscall.c
index 9d82597e26a6..4a7695b40ed9 100644
--- a/kern/src/syscall.c
+++ b/kern/src/syscall.c
@@ -1559,6 +1559,11 @@ static intreg_t sys_read(struct proc *p, int fd, void *buf, size_t len)
                /* plan9, should also handle errors (EBADF) */
                ret = sysread(fd, buf, len);
        }
+       if (printx_on && ret == -1) {
+               printk("ret failed, errno %d, errstr %s\n", get_errno(),
+                      current_errstr());
+               backtrace_user_ctx(current, current_ctx);
+       }
        return ret;
 }
 
Then I turned on printx (px from the monitor) as well as syscall
tracing.  Note that backtrace_user_ctx only works on non-blocked
syscalls.  Otherwise you'll get garbage, and, as-written, possibly
kernel faults.  It's a hack.

After a rebuild and restart, the FDs and such have moved around, so
you have to scan through the trace again to make sense of what's
going on.  It looks like this is the call I'm concerned about:

E [    381.510943573]-[      0.000000000] Syscall 100 (        read):(0x6, 
0x675428, 0x10, 0x0, 0x0, 0x0) ret: --- proc: 264 core: 0 vcore: 0 data: ''
ret failed, errno 11, errstr queue empty

Backtrace of user context on Core 0:
        Offsets only matter for shared libraries
#01 Addr 0x000040000030992e is in libc-2.19.so at offset 0x00000000000d092e
#02 Addr 0x00004000003010f6 is in libc-2.19.so at offset 0x00000000000c80f6
#03 Addr 0x000000000040d3b9 is in dropbear at offset 0x000000000000d3b9
#04 Addr 0x000000000040d200 is in dropbear at offset 0x000000000000d200
#05 Addr 0x000000000040c225 is in dropbear at offset 0x000000000000c225
#06 Addr 0x000000000041b0b2 is in dropbear at offset 0x000000000001b0b2
#07 Addr 0x000000000041eedf is in dropbear at offset 0x000000000001eedf
#08 Addr 0x000000000041e6b5 is in dropbear at offset 0x000000000001e6b5
#09 Addr 0x0000400000257ec2 is in libc-2.19.so at offset 0x000000000001eec2
#10 Addr 0x0000000000403b25 is in dropbear at offset 0x0000000000003b25
X [    381.510943573]-[    381.580352579] Syscall 100 (        read):(0x6, 
0x675428, 0x10, 0x0, 0x0, 0x0) ret: 0xffffffffffffffff proc: 264 core: 0 vcore: 
0 data: ''

(FD 6 is the only tcp connection for 264, which is DB).

Also, all of the "ret failed" syscalls (and there's a lot of them) are
for errno = 11, "queue empty".  That's EAGAIN, and the errstr is from
qio.c (git grep "queue empty" -> k/s/ns/qio.c).

Let's get a backtrace from that.  Echo the '#NN Addr' lines and pipe
them to scripts/bt-akaros.sh.

If you haven't used that script before, you'll probably need to make
your own version and set the paths for your binaries and libraries.  I
point to my KFS.

Also, for whatever reason, I usually have to echo the lines once, then
pipe them to bt-akaros with a separate command (up-arrow, |
~/scripts/bt-akaros.sh).  Not sure why.

Anyways:

#01 Addr 0x000040000030992e is in libc-2.19.so at offset 0x00000000000d092e  
__ros_syscall_errno@@GLIBC_2.2.5+0x6e
#02 Addr 0x00004000003010f6 is in libc-2.19.so at offset 0x00000000000c80f6  
__read@@GLIBC_2.2.5+0x26
#03 Addr 0x000000000040d3b9 is in dropbear at offset 0x000000000000d3b9  
daemon+0x18b8
#04 Addr 0x000000000040d200 is in dropbear at offset 0x000000000000d200  
daemon+0x16ff
#05 Addr 0x000000000040c225 is in dropbear at offset 0x000000000000c225  
daemon+0x724
#06 Addr 0x000000000041b0b2 is in dropbear at offset 0x000000000001b0b2  
daemon+0xf5b1
#07 Addr 0x000000000041eedf is in dropbear at offset 0x000000000001eedf  
daemon+0x133de
#08 Addr 0x000000000041e6b5 is in dropbear at offset 0x000000000001e6b5  
daemon+0x12bb4
#09 Addr 0x0000400000257ec2 is in libc-2.19.so at offset 0x000000000001eec2  
__libc_start_main@@GLIBC_2.2.5+0xa2
#10 Addr 0x0000000000403b25 is in dropbear at offset 0x0000000000003b25  
LOOKUP_FAILED+0x403b25

I was hoping to save a little time and find where in dropbear we were
making that read, instead of hunting around at random.  Seeing 'daemon'
for every frame doesn't help much.

Oh, remember a week or so ago I was complaining about the size of
binaries?  I stripped my kfs/bin to decrease the size of things I
didn't care about.  Let's see:

$ file kern/kfs/bin/dropbear
kern/kfs/bin/dropbear: ELF 64-bit LSB  executable, x86-64, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/<unknown> 1.0.0, stripped

Whoops.  Let's just copy the one from my DB build and try again.  (It's
still in dropbear-akaros/build/dropbear).  Here we go:

#01 Addr 0x000040000030992e is in libc-2.19.so at offset 0x00000000000d092e  
__ros_syscall_errno@@GLIBC_2.2.5+0x6e
#02 Addr 0x00004000003010f6 is in libc-2.19.so at offset 0x00000000000c80f6  
__read@@GLIBC_2.2.5+0x26
#03 Addr 0x000000000040d3b9 is in dropbear at offset 0x000000000000d3b9  
read_packet_init+0x96
#04 Addr 0x000000000040d200 is in dropbear at offset 0x000000000000d200  
read_packet+0x51
#05 Addr 0x000000000040c225 is in dropbear at offset 0x000000000000c225  
session_loop+0x396
#06 Addr 0x000000000041b0b2 is in dropbear at offset 0x000000000001b0b2  
svr_dropbear_exit+0x0
#07 Addr 0x000000000041eedf is in dropbear at offset 0x000000000001eedf  
main_noinetd+0x771
#08 Addr 0x000000000041e6b5 is in dropbear at offset 0x000000000001e6b5  
main+0x5e
#09 Addr 0x0000400000257ec2 is in libc-2.19.so at offset 0x000000000001eec2  
__libc_start_main@@GLIBC_2.2.5+0xa2
#10 Addr 0x0000000000403b25 is in dropbear at offset 0x0000000000003b25  
_start+0x29

So now I have a place to look in DB's code.

Thanks to tagging, it's pretty easy to find read_packet and
read_packet_init.  It's clear that read_packet_init returns
DROPBEAR_FAILURE on EAGAIN, just like EINTR.  When I worked on the DB
port, I tried to make EAGAIN act like EINTR, with the hope that any
restart code that handles EINTR would do the same for EAGAIN.
(technically, they should be able to handle EINTR *and* EAGAIN, I
think).

What's a little worrisome is that DB doesn't propagate the error out
from read_packet:

        if (ret == DROPBEAR_FAILURE) {
            /* didn't read enough to determine the length */
            TRACE2(("leave read_packet: packetinit done"))
            return; 
        }

Maybe that's fine, and session_loop() handles it by retrying.

I'm a little curious where the closes are coming from.  Let's hack the
kernel again for a backtrace, same as we did with read.

Here's DB's FD for the data connection to listener:

X [     50.016927599]-[     50.029866209] Syscall 102 (      
openat):(0xffffffffffffff9c, 0x7f7fff9fe790, 0xf, 0x3, 0x0, 0x0) ret: 0xe proc: 
270 core: 0 vcore: 0 data: '/net/tcp/3/data'

So I'm looking for whoever closed 0xe.

E [     50.884438909]-[      0.000000000] Syscall 103 (       close):(0xe, 0x0, 
0x0, 0x0, 0x0, 0x0) ret: --- proc: 270 core: 0 vcore: 0 data: ''
sys_close 14

Backtrace of user context on Core 0:
        Offsets only matter for shared libraries
#01 Addr 0x000040000030992e is in libc-2.19.so at offset 0x00000000000d092e
#02 Addr 0x00004000003018e4 is in libc-2.19.so at offset 0x00000000000c88e4
#03 Addr 0x0000000000404b0e is in dropbear at offset 0x0000000000004b0e
#04 Addr 0x00000000004186b1 is in dropbear at offset 0x00000000000186b1
#05 Addr 0x000000000040c254 is in dropbear at offset 0x000000000000c254
#06 Addr 0x000000000041b0b2 is in dropbear at offset 0x000000000001b0b2
#07 Addr 0x000000000041eedf is in dropbear at offset 0x000000000001eedf
#08 Addr 0x000000000041e6b5 is in dropbear at offset 0x000000000001e6b5
#09 Addr 0x0000400000257ec2 is in libc-2.19.so at offset 0x000000000001eec2
#10 Addr 0x0000000000403b25 is in dropbear at offset 0x0000000000003b25

Which is:

#01 Addr 0x000040000030992e is in libc-2.19.so at offset 0x00000000000d092e  
__ros_syscall_errno@@GLIBC_2.2.5+0x6e
#02 Addr 0x00004000003018e4 is in libc-2.19.so at offset 0x00000000000c88e4  
close@@GLIBC_2.2.5+0x54
#03 Addr 0x0000000000404b0e is in dropbear at offset 0x0000000000004b0e  
m_close+0x1d
#04 Addr 0x00000000004186b1 is in dropbear at offset 0x00000000000186b1  
handle_connect_fds+0x11b
#05 Addr 0x000000000040c254 is in dropbear at offset 0x000000000000c254  
session_loop+0x3c5
#06 Addr 0x000000000041b0b2 is in dropbear at offset 0x000000000001b0b2  
svr_dropbear_exit+0x0
#07 Addr 0x000000000041eedf is in dropbear at offset 0x000000000001eedf  
main_noinetd+0x771
#08 Addr 0x000000000041e6b5 is in dropbear at offset 0x000000000001e6b5  
main+0x5e
#09 Addr 0x0000400000257ec2 is in libc-2.19.so at offset 0x000000000001eec2  
__libc_start_main@@GLIBC_2.2.5+0xa2
#10 Addr 0x0000000000403b25 is in dropbear at offset 0x0000000000003b25  
_start+0x29

That's just a little farther down in session_loop.

Let's try addr2line to get a better idea:

$ addr2line -e kern/kfs/bin/dropbear 0x00000000004186b1
/usr/local/google/home/brho/akaros/dropbear-akaros/build/../netio.c:218

(bt-akaros doesn't use addr2line for some old reason.  I think it had
trouble with some binaries.  Maybe that can be fixed up now.)

That looks like it's right after a getsockopt() call failed.
Incidentally, there's a TRACE() right after that in DB.  Maybe I should
have turned on DB tracing first, though that often has a lot of text to
parse.

Let's kill and restart DB:

/bin/dropbear -v 2>dbout &

then retry and cat the output:

bash-4.3$ cat dbout 
TRACE  (285) 0.000000: enter buf_get_rsa_priv_key
TRACE  (285) 0.001149: enter buf_get_rsa_pub_key
TRACE  (285) 0.001183: leave buf_get_rsa_pub_key: success
TRACE  (285) 0.001212: leave buf_get_rsa_priv_key
TRACE  (285) 0.001232: leave loadhostkey
[285] May 12 12:54:08 Failed loading /etc/dropbear/dropbear_dss_host_key
TRACE  (285) 0.001286: leave loadhostkey
[285] May 12 12:54:08 Failed loading /etc/dropbear/dropbear_ecdsa_host_key
TRACE  (285) 0.001325: leave loadhostkey
TRACE  (285) 0.001338: Disabling key type 1
TRACE  (285) 0.001352: Disabling key type 2
TRACE  (285) 0.001365: Disabling key type 3
TRACE  (285) 0.001379: Disabling key type 4
TRACE  (285) 0.001432: listensockets: 1 to try
TRACE  (285) 0.001447: listening on ':22'
TRACE  (285) 0.001462: enter dropbear_listen
TRACE  (285) 0.001475: dropbear_listen: all interfaces
TRACE  (285) 0.001562: leave dropbear_listen: success, 1 socks bound
TRACE  (285) 0.001585: Couldn't set IP_TOS (Protocol not available)

That wasn't a lot of tracing, which is nice to know, but it doesn't
end with the error we're chasing.  Also, 285 is the main PID for DB,
not the one for our forwarded session.  Maybe tracing didn't get
propagated?  On closer examination (syscall tracing), the intermediate
DB process (the one controlling the forwarding) is printing to FD 2,
but:

FD: 02, File: 0xffff80013dd73260, File name: null

It's going to /dev/null instead of its parent's stderr.  Whoops.
scp.c seems to have something that replaces all std FDs with /dev/null.
Let's just do a quick printf / while(1) to see if we're doing that.
Nope.

Ah, we do it in akaros.c when we daemonize.  That code is a copy from
somewhere else, and it looks like a weird bug.  Note:

        int daemon(int nochdir, int noclose);

daemon() takes a 'bool' for "don't close".  The callsite:

        if (daemon(0, closefds) < 0) 

Based on the naming, 'closefds' seems like the opposite of "no close".

From daemon's man page (which we're trying to emulate):

       If noclose is zero, daemon() redirects standard input, standard
       output and standard error to /dev/null; otherwise, no changes
       are  made  to  these  file descriptors.

Got it?  I might not...  There are a lot of double negatives there.
I'm not sure what DB wants to do here.  Let's just not close it...

That gives us a lot of debug output, though it's not the getsockopt
error this time.  There might be a timing issue involved.  Good times.
Let's just print our one error, and not the whole trace.

I dropped a perror in there and changed the TRACE to an fprintf:
        GETSOCKOPT: Protocol not available, queue empty
        handle_connect_fds getsockopt(14) SO_ERROR failed: Protocol not 
available

Which makes sense.  Our getsockopt()
(glibc-2.19-akaros/sysdeps/akaros/getsockopt.c) doesn't support much;
in particular, it doesn't support SO_ERROR.

Even if it did, it looks like DB is checking for an error.  If it gets
one, it'll bail out anyways (netio.c@221).  If it doesn't, it assumes
the connection has been established (netio.c@229).  That might not be
true.

Let's try just succeeding in that loop and see how badly things break.

Well, how about that - it works fine.  Perhaps the connection is
established enough, even if we're at the EAGAIN phase, for DB to
continue.

The attempted fix here is to allow getsockopt() to handle SO_ERROR.  In
this case, we return 0 (no errors yet!), and clear the old error value
(which is none).  That at least is the right thing to do for glibc;
later we can support the appropriate errors in the socket shim
interface.

The only change to dropbear will be to fix its closefd value for
daemon().  Given the naming, I'm guessing they want the opposite of
what they currently have.

With the glibc change in place, we can now do port forwarding with ssh:


Linux:
---------------
$ ssh -L 5556:127.0.0.1:23 qemu
-sh: can't access tty; job control turned off
/ $ 



Akaros:
---------------
bash-4.3$ epoll_server 
Using the BSD socket shims over Plan 9's networking stack
Listened on port 23
Accepted and got dfd 6
Server read: hello

bash-4.3$


Linux:
---------------
 $ telnet localhost 5556
Trying ::1...
Connected to localhost.
Escape character is '^]'.
hello
hello
^]

telnet> Connection closed.


Hopefully this was helpful as an example of how to debug problems on
Akaros.

Barret
