Re: [ql-developers] Massive amount of job state transitions and re-scheduling

Peter Graf Fri, 05 Sep 2003 15:36:43 -0700

Thierry wrote:

> On Thu, 04 Sep 2003 20:23:08 +0200, Peter Graf wrote:
>
> > Hi,
> >
> > I made an experimental boost of QLwIP speed to the Ethernet maximum of 10
> > Mbit/sec, which results in a massive amount of calls to MT.SUSJB, MT.RELJB
> > and MT.PRIOR, typically several thousands per second.
>
> [snip]
>
> Well, I guess the problem is that all three calls are exiting via the
> scheduler (they are not atomic traps). My guess is that calling them in
> rapid succession (more than once every 1/50th of second) makes the job
> to reenter recursively the scheduler and to fill up the supervisor stack...

There can indeed be more than 20 calls per 1/50th of a second. I have no idea how the recursion could arise, but your scenario would fit the picture.

> It might work under SMSQ/E (bigger stack, much better and faster scheduler),
> but this is definitely not recommended under QDOS...

I'll have a look.

> Plus, I'm a bit surprised that you are apparently using jobs to fetch the
> data from the ethernet card... It should be done via an interrupt handler
> instead...

At first sight it looks that way, of course. The QDOS/SMS reality is different, though.

> Actually, the best design would be to have the Q60 fast interrupt
> handler to fill a buffer, and a frame interrupt task to move the data from
> that buffer into a bigger one for your job to fetch it in big chunks...).

Wrong.

1. TCP is not a linear flow of data in one direction, even if the purpose is file transfer. QDOS (and likely SMSQ/E, too) is so primitive that an interrupt service routine can _not_ trigger immediate rescheduling of jobs after it has completed. The time until the next rescheduling can be 20 ms (worst case), so the user job has to wait that long before it can process the data. The effect is that the other TCP endpoint in the network has to wait 20 ms + processing + transfer time until it can react to the response packet. Given a segment size of 1460 bytes (about 1.5 KB), your interrupt-driven approach cannot guarantee more than a throughput of 1.5 KB / 20 ms = 75 KB/s with TCP, even if the other endpoint needs zero time to process its packets. (75 KB/s is not quite what I want.)

Unlike an ISR, a job _can_ trigger immediate rescheduling! You don't need to poll the NIC all the time; a clever approach can give full TCP throughput during network activity, but zero polling waste (except for a few tens of instructions per 50 Hz tick) when the network is inactive. The details are somewhat complex, but as long as the OS isn't changed, I have no better choice.

2. Your second level of copying wastes response (and processor) time. Imagine running the TCP/IP stack on a SuperGoldCard. Copying or not copying about 1 MB every second _does_ matter.

3. The idea of collecting fragments into larger buffers is not feasible, unless you implement the TCP/IP stack itself within ISRs. (There are good reasons not to do that!)

All the best
Peter
