Hi Arnar and all,

Attached is the promised set of notes. The sample set of pseudocode is in
the second file, after list member Donn Cave drew my attention to O'Haskell
and its relative, rhaskell, which also seemed to show an odd ignorance of
ALT/select. But chanschd.txt should completely cover the points for
Stackless, unless I'm missing something.

Larry

On 10/3/08, Larry Dickson <[EMAIL PROTECTED]> wrote:
>
> <SNIP>
> On 10/3/08, Arnar Birgisson <[EMAIL PROTECTED]> wrote:
>>
>> <SNIP>
>
>
> I'd be interested to see an implementation :)
>

I'll put some notes together over the weekend and send them along - I have
to pull it out of non-digital 1988 references.

Larry

> cheers,
> Arnar
>
 Channel based scheduling - the transputer model
 Larry Dickson
 [EMAIL PROTECTED]
 4-5 October 2008


SUMMARY

The purpose of this note is to show how both hardware and software IO can drive 
scheduling of processes (by whatever name - "tasklets" is the term used in 
Stackless Python) that are run in a single memory and sequential CPU or 
analogue. No multitasking OS is needed (or desired) because the model handles 
all scheduling. Asynchronous stimulation by hardware IO readiness is OK.

The architecture described is that of the transputer, which is slim and can 
easily be simulated (see the Transterpreter project). Transputer instructions, 
whose names are usually self-explanatory, will occasionally be referred to in 
single quotes, e.g. 'startp'. The Reference gives details.

This note is derived from the Reference, which can still be found. 
Unfortunately the Reference does not seem to be available in machine-readable 
form.


PROCESSES

A process, in the sense meant in this note, comes into existence ('startp'), 
carries out a series of effectively sequential operations, and then terminates 
('endp'). When it comes into existence, it takes possession of a block of 
memory keyed at a workspace address, W, and takes control of an instruction 
sequence starting at an instruction address, I. When it terminates, its block 
of memory returns to the possession of a parent, and its instruction sequence 
is abandoned.

By "effectively sequential" is meant that each instruction except the last has 
a unique successor, and each except the first has a unique predecessor. 
However, large gaps of time may exist between the execution of an instruction 
and its successor (blocking or swapping out). This means multitasking can be 
done, as managed by queues and stimuli, with the process's instruction sequence 
being a proper subset of the CPU's instruction sequence.

The process envisioned here is a maximal sequential process. I do not 
generalize the term "process" in either direction - to smaller processes (like 
procedure calls or event responses) or to larger processes (like PAR constructs 
of several processes running multitasked and intercommunicating).


QUEUES

A process takes or loses control of the CPU's instruction sequence by being 
scheduled or descheduled. Except for the last time ('endp') it always gets 
(re)scheduled after being descheduled, and therefore information must be stored 
that makes this rescheduling possible. ALMOST ALL SUCH INFORMATION IS STORED IN 
THE PROCESS'S WORKSPACE, which is dedicated to the process throughout the 
process's life. However, in every case the pointer W to the workspace itself 
must be stored in an external queue to make rescheduling possible.

There are three kinds of queues. (In process and timer queue cases, the 
transputer has two of each kind, corresponding to high and low priority 
processes, but I will ignore that here for simplicity.)

(1) Process queue: A front pointer and back pointer are maintained by the 
system, with the front pointer containing W for the first process in the queue, 
and the last process in the queue containing the address of the back pointer. 
Processes that swap out, as by timeslicing, go into (the end of) the queue, and 
thus it acts as a round robin. Processes that are descheduled, as for timer 
waits or IO, go to other queue(s) for a while and when ready are restored to 
the end of the process queue.
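As a sketch only (Python, with invented names: `Workspace` and `ProcessQueue` are not transputer terms, and an object attribute `link` stands in for the W[-2] word), the front/back-pointer queue might look like this:

```python
NOT_PROCESS = None  # stand-in for NotProcess.p

class Workspace:
    """Hypothetical process workspace; 'link' plays the role of W[-2],
    the pointer to the next process in the process queue."""
    def __init__(self, name):
        self.name = name
        self.link = NOT_PROCESS

class ProcessQueue:
    """Front/back-pointer round-robin queue of workspaces: append on
    deschedule (e.g. timeslice), pop from the front to run."""
    def __init__(self):
        self.front = NOT_PROCESS
        self.back = NOT_PROCESS

    def append(self, w):
        if self.front is NOT_PROCESS:
            self.front = w            # queue was empty
        else:
            self.back.link = w        # link from the old last process
        self.back = w
        w.link = NOT_PROCESS

    def pop(self):
        w = self.front
        self.front = w.link           # next process (or NOT_PROCESS)
        return w
```

Note that no per-queue array is needed: apart from the two fixed pointers, all the linkage lives in the workspaces themselves, which is the point of the transputer scheme.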

(2) Channel: An IO channel, which is declared OUTSIDE the scope of the 
communicating processes, contains a queue with at most one process in it. When 
communication becomes ready in the two-process case, the late process places 
the queued (early) process into the process queue, while it itself continues 
without descheduling.

In the one-process case (hardware link IO or communication to a peripheral that 
is outside the CPU instruction stream), the IO channel is declared outside the 
scope of all process programming, i.e. it is a fixed system resource. 
"Hardware" (i.e. service code for the communication link in question) places 
the process into the process queue when the communication is done.

(3) Timer: A process waiting on a timeout enters a standard timer queue, from 
which it is removed after the timeout either expires or is nullified (the 
latter case arises in an ALT or select that the timeout loses). If appropriate 
it is then placed in the process queue.


ADDRESSING

Transputers use a "falling stack" so that the most used local variables are at 
small positive offsets from W. The special process-wide communication and 
queueing usages are at small negative offsets.

 W[0]  Used for scratch in ALT and short communications.
 W[-1] Set to next I when process is descheduled.
 W[-2] Set to W of next process in process queue.
 W[-3] Set to address of channel holding this process, or ALT state.
 W[-4] Set to W of next process in timer queue, or ALT time selection state.
 W[-5] Set to timeout.

Not all these usages are necessary to the state machines of the queues; some 
are to aid post-mortem analysis. A compiler decides how many of them are needed 
for a given process, based, for instance, on whether timers or ALTs are used in 
that process. Notice also that singly linked lists are used; doubly linked 
lists, at least in the timer ALT case, could have speed advantages.


CHANNEL COMMUNICATION

This (together with Hardware Select, below) is the only state machine needed if 
timer and ALT (the select capability of occam) are not used, or are replaced 
with a timer-based event loop as (I believe) in Stackless.

A channel is only one word long: C[0]=W where W is the address of the waiting 
process. Otherwise C[0]=NotProcess.p where NotProcess.p is a special value that 
can never be a workspace address.

When the early process reaches the IO instruction, it finds NotProcess.p in the 
channel and replaces it with its own workspace address. Then it deschedules 
itself without placing itself in the process queue.

When the late process reaches the communicating IO instruction, or the external 
communication finishes, it finds the workspace address of the early process in 
the channel. It places that address in the process queue and writes 
NotProcess.p in the channel.

More complex things happen in case of an ALT.
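The non-ALT rendezvous can be sketched in Python (illustrative names only; `NOT_PROCESS` stands in for NotProcess.p, and the deschedule step is abstracted into a callback):

```python
NOT_PROCESS = None  # stand-in for NotProcess.p

class Channel:
    """One-word channel: holds either NOT_PROCESS or the workspace of
    the (single) process waiting on it."""
    def __init__(self):
        self.word = NOT_PROCESS

def communicate(chan, w, run_queue, deschedule):
    """Early arrival parks its workspace in the channel and deschedules
    WITHOUT entering the process queue; late arrival requeues it."""
    if chan.word is NOT_PROCESS:
        chan.word = w              # early: leave our W in the channel
        deschedule(w)              # block until the partner arrives
    else:
        early = chan.word
        chan.word = NOT_PROCESS    # reset the channel
        run_queue.append(early)    # late: reschedule the early process
```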


TIMER

After checking whether it needs to deschedule (i.e. whether timeout already 
happened), the process reaching a timer wait enters the queue of that timer and 
proceeds along it until it finds the place for its timeout. Then it inserts 
itself into that place in the timer queue and deschedules itself without 
placing itself in the process queue.

Timer "hardware" checks the front of the queue at each tick and reschedules any 
processes whose timeouts have expired.

More complex things happen in the case of an ALT.
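A hedged sketch of the non-ALT timer queue in Python (a sorted list stands in for the transputer's singly linked, timeout-ordered queue; names are illustrative):

```python
class TimerQueue:
    """Timeout-ordered queue: a process deschedules into it and is moved
    to the process queue once its timeout has expired."""
    def __init__(self):
        self.waiting = []  # list of (timeout, workspace), kept sorted

    def insert(self, timeout, w):
        i = 0
        while i < len(self.waiting) and self.waiting[i][0] <= timeout:
            i += 1                          # walk to the insertion point
        self.waiting.insert(i, (timeout, w))

    def tick(self, now, run_queue):
        """Called by timer 'hardware': reschedule all expired waiters."""
        while self.waiting and self.waiting[0][0] <= now:
            _, w = self.waiting.pop(0)
            run_queue.append(w)
```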


HARDWARE SELECT

If "hardware" (external) communication(s) and/or timeout(s) are waiting, and no 
process is in the process queue, "hardware" (i.e. virtual machine code external 
to the instructions of all the processes) should do a blocking select on all 
the pending external communications, including the earliest timeout if any. The 
winner(s) then trigger channel or timer queue action. More than one winner is 
OK. All winners go into the process queue.

If any process is in the process queue, and any other process is waiting on 
hardware communications, then, before branching to the next process in the 
process queue, a non-blocking select (i.e. timeout 0) should be used to add any 
ready processes to the process queue. If any process is waiting on a timeout, 
time should be checked. If these checks are too burdensome, the checks could 
take place once per queue cycle, or after a certain instruction count.

If interrupt code is accessible to the "hardware" or virtual machine, this 
rescheduling code can be placed in the interrupt code for efficiency, 
eliminating these selects.


ALT

To be distinguished from the hardware select used to reschedule unconditional 
channel IO or timers, the ALT construct places first-past-the-post 
communication or timeout selection into the process code itself. The transputer 
offers channel ALT, timer ALT, and skip (immediately valid) ALT branches. They 
can be prioritized explicitly, which is a good idea, because fairness can be 
enforced in the process code by rotating the priorities to avoid starvation. 
Unlike C select, the transputer ALT offers only one winner upon which it 
branches, like an if-then-else or case statement. This could be modified at the 
cost of complicating the code.

Actions passing through an ALT are called "conditional" because they do not 
take place if they lose. The transputer allows ALT only on an input, because a 
conditional input communicating to a conditional output could lead to a 
standoff.

There are three states within an ALT:

 enabling
 waiting
 ready

These are distinguished by the special values Enabling.p, Waiting.p, and 
Ready.p, which are all distinguished from any channel address and go in W[-3]. 
The sequence is:

 start the ALT ('alt' or 'talt')
 enable each branch ('enbc' or 'enbt' or 'enbs')
 wait if necessary ('altwt' or 'taltwt')
 disable each branch ('disc' or 'dist' or 'diss')
 end the ALT ('altend')

If, when enabling, any branch is found already ready, the state goes from 
Enabling.p to Ready.p and the wait is skipped. The order in which the branches 
are enabled does not matter, but the order in which they are disabled 
determines the priority. In the case where a C select is used in a virtual 
machine, it is probably easiest to treat the 'enbs' (if there) as a zero 
timeout and always do the wait for hardware channels, so that you do not have 
to call a lot of non-blocking selects to see if each 'enbt' or hardware 'enbc' 
branch is ready. In the transputer, all the branches are enabled and disabled 
even if there is an early winner or some are cancelled by booleans; but 
equivalent, more efficient strategies could be used in a virtual machine.
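The channel side of the three-state machine can be sketched in Python. This is a simplification under stated assumptions: the real 'enbc'/'disc' also handle guard booleans and the W[-3] encoding, and `Alt`/`Chan` are invented names.

```python
ENABLING, WAITING, READY = "Enabling.p", "Waiting.p", "Ready.p"
NOT_PROCESS = None

class Chan:
    """One-word channel, as in Channel Communication above."""
    def __init__(self):
        self.word = NOT_PROCESS

class Alt:
    """Three-state ALT sketch; self.state stands in for W[-3]."""
    def __init__(self):
        self.state = ENABLING

    def enable_channel(self, chan):
        if chan.word is not NOT_PROCESS and chan.word is not self:
            self.state = READY       # a partner is already waiting
        elif chan.word is NOT_PROCESS:
            chan.word = self         # park ourselves in the channel

    def wait(self):
        if self.state is ENABLING:
            self.state = WAITING     # deschedule until a guard fires

    def disable_channel(self, chan, branch):
        """Return the branch if this guard won, else withdraw the
        enable; called in priority order."""
        if chan.word is self:
            chan.word = NOT_PROCESS  # we were the only party: lost
            return None
        if chan.word is not NOT_PROCESS:
            return branch            # partner present: guard fires
        return None
```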

If an output finds a process W in its channel and an ALT underway in that 
inputting process, it does not do the communication, but replaces the inputting 
process W with its own process W in the channel, deschedules itself, and forces 
the inputting process's ALT to go into disabling, so that this communication 
has a chance to win immediately. If it does not win, the outputting process has 
now put the channel into a ready-to-communicate state, so the next ALT will 
detect this during enabling.

Timer ALTs are special. When enabling, W[-4] must take the state TimeNotSet.p 
until the first timer guard is enabled and then it gets the timeout in W[-5] 
and W[-4] goes to TimeSet.p. Further timer branches may cause this timeout to 
advance. Either the timeout has already expired, or the process goes into the 
timer queue.

If the ALT puts a process on the timer queue, it is pulled from the queue at 
the end of the wait, whether or not the timer wins. No matter what branch wins, 
the end of the wait places the process back on the process queue, and when it 
reaches the front the ALT disabling begins. This is to be distinguished from 
the actual communication, if applicable, which takes place after the 'altend' 
and the branching, and reschedules the other (outputting) process without any 
blocking, as is completely standard behavior since the outputter always counts 
as "early" by this point.


REFERENCE

Inmos Ltd: "Transputer Instruction Set, a Compiler Writer's Guide." 
Prentice-Hall, New York, 1988. See Section 10, "Architectural Notes."

FORGOTTEN THE ALT... AGAIN: "A reactive object has two characteristics: the 
abandonment of all blocking operations and the unification of the concepts 
state and process. The former allows a reactive object to accept input from 
multiple sources without imposing any ordering on the input events." --- ??? 
YOU DON'T EVEN NEED ALT... ANY SELECT LOOP IN C CAN PROVE THAT YOU CAN ALLOW 
BLOCKING AND STILL NOT IMPOSE ANY ORDERING ON THE INPUT EVENTS !!!!

Here's the pseudo-occam that shows it:

PROC handler (CHAN a, b, aout, bout)
  WHILE TRUE
    ALT
      a ? astuff
        SEQ
          do a stuff with shared a-b resources
          aout ! astuff -- send to separate aProcess for private work
      b ? bstuff
        SEQ
          do b stuff with shared a-b resources
          bout ! bstuff -- send to separate bProcess for private work

It assumes independent random input from outside through a and b, plus two 
other processes (aProcess and bProcess, not shown) that do any a and b work 
that does not require shared resources between a and b. Given that the shared 
resource work is small, the "reaction time" (for a: the ALT overhead + a ? 
astuff + do a stuff) is very small, so I think it's fair to call that 
reactive. This code does not need occam and can equivalently be done in C 
with a select; but the occam is short and obvious (I hope). In case of 
bursts, the ALT can be made fair with a little extra work, not shown here, so 
that queued IO comes in in alternating order.

REFERENCE:

http://www.informatik.uni-freiburg.de/~wehr/software/haskell/

...
rhaskell

Reactive objects are a convenient abstraction for writing programs which have 
to interact with a concurrent environment. A reactive object has two 
characteristics: the abandonment of all blocking operations and the unification 
of the concepts state and process. The former allows a reactive object to 
accept input from multiple sources without imposing any ordering on the input 
events. The latter prevents race conditions because the state of an object is 
only manipulated from the process belonging to the object.

Only a few programming languages exist that make the concept of reactive 
objects available to programmers. Unfortunately, all those systems seem to be 
either not actively maintained anymore or not really matured yet. In this paper 
we demonstrate how reactive objects can be incorporated into the existing 
general purpose programming language Haskell by making use of common extensions 
to Haskell. Our implementation is heavily inspired by the O'Haskell programming 
language. While O'Haskell is a standalone programming language not compatible 
with Haskell, our implementation comes as a Haskell library that smoothly 
integrates the concept of reactive objects into the world of regular functional 
programming.

Requirements: GHC 6.4
Available via darcs:
     darcs get --set-scripts-executable 
http://www.informatik.uni-freiburg.de/~wehr/darcs/rhaskell/
Send patches using darcs send to [EMAIL PROTECTED]

Paper:
Stefan Heimann and Matthias Neubauer
Reactive Objects for Haskell
Informal proceedings of the Fifth Symposium on Trends in Functional Languages 
(TFP 2004). Munich, Germany. November 2004.
_______________________________________________
Stackless mailing list
[email protected]
http://www.stackless.com/mailman/listinfo/stackless
