Hi Arnar and all,
Attached is the promised set of notes. The sample set of pseudocode is in
the second file, after list member Donn Cave drew my attention to O'Haskell
and its relative, rhaskell, which also seemed to show an odd ignorance of
ALT/select. But chanschd.txt should completely cover the points for
Stackless, unless I'm missing something.
Larry
On 10/3/08, Larry Dickson <[EMAIL PROTECTED]> wrote:
>
> <SNIP>
> On 10/3/08, Arnar Birgisson <[EMAIL PROTECTED]> wrote:
>>
>> <SNIP>
>
>> I'd be interested to see an implementation :)
>>
> I'll put some notes together over the weekend and send them along - I have
> to pull it out of non-digital 1988 references.
> Larry
>
>> cheers,
>> Arnar
>
Channel based scheduling - the transputer model
Larry Dickson
[EMAIL PROTECTED]
4-5 October 2008
SUMMARY
The purpose of this note is to show how both hardware and software IO can drive
scheduling of processes (by whatever name - "tasklets" is the term used in
Stackless Python) that are run in a single memory and sequential CPU or
analogue. No multitasking OS is needed (or desired) because the model handles
all scheduling. Asynchronous stimulation by hardware IO readiness is OK.
The architecture described is that of the transputer, which is slim and can
easily be simulated (see the Transterpreter project). Transputer instructions,
whose names are usually self-explanatory, will occasionally be referred to in
single quotes, e.g. 'startp'. The Reference gives details.
This note is derived from the Reference, which can still be found.
Unfortunately the Reference does not seem to be available in machine-readable
form.
PROCESSES
A process, in the sense meant in this note, comes into existence ('startp'),
carries out a series of effectively sequential operations, and then terminates
('endp'). When it comes into existence, it takes possession of a block of
memory keyed at a workspace address, W, and takes control of an instruction
sequence starting at an instruction address, I. When it terminates, its block
of memory returns to the possession of a parent, and its instruction sequence
is abandoned.
By "effectively sequential" is meant that each instruction except the last has
a unique successor, and each except the first has a unique predecessor.
However, large gaps of time may exist between the execution of an instruction
and its successor (blocking or swapping out). This means multitasking can be
done, as managed by queues and stimuli, with the process's instruction sequence
being a proper subset of the CPU's instruction sequence.
The process envisioned here is a maximal sequential process. I do not
generalize the term "process" in either direction - to smaller processes (like
procedure calls or event responses) or to larger processes (like PAR constructs
of several processes running multitasked and intercommunicating).
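For concreteness, here is a minimal sketch of this notion of process in Python (my own illustration, not transputer code): the workspace is a dict standing in for the block of memory at W, and the instruction sequence is a generator whose yields mark the gaps where the process is descheduled.

```python
# Sketch only: a process is (W, I) -- a workspace plus a resumable
# instruction sequence. "Effectively sequential" shows up as the generator
# running straight through, with possible descheduling gaps at each yield.

def make_process(name):
    workspace = {"name": name}      # block of memory keyed at W
    def body():
        workspace["step"] = 1       # sequential operations...
        yield                       # ...with a possible gap (descheduled)
        workspace["step"] = 2       # ...then the unique successor runs
    return workspace, body()        # (W, I)

w, i = make_process("p0")
next(i)                             # run until the process blocks
assert w["step"] == 1
```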
QUEUES
A process takes or loses control of the CPU's instruction sequence by being
scheduled or descheduled. Except for the last time ('endp') it always gets
(re)scheduled after being descheduled, and therefore information must be stored
that makes this rescheduling possible. ALMOST ALL SUCH INFORMATION IS STORED IN
THE PROCESS'S WORKSPACE, which is dedicated to the process throughout the
process's life. However, in every case the pointer W to the workspace itself
must be stored in an external queue to make rescheduling possible.
There are three kinds of queues. (In process and timer queue cases, the
transputer has two of each kind, corresponding to high and low priority
processes, but I will ignore that here for simplicity.)
(1) Process queue: A front pointer and back pointer are maintained by the
system, with the front pointer containing W for the first process in the queue,
and the last process in the queue containing the address of the back pointer.
Processes that swap out, as by timeslicing, go into (the end of) the queue, and
thus it acts as a round robin. Processes that are descheduled, as for timer
waits or IO, go to other queue(s) for a while and when ready are restored to
the end of the process queue.
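The process queue above can be sketched as follows (Python, with invented names; on the transputer the front and back pointers live in registers and the link field is the word at W[-2] inside each workspace):

```python
# Sketch of the round-robin process queue. The queue itself is only a front
# pointer and a back pointer; each descheduled process stores the link to the
# next process inside its OWN workspace, as at W[-2] in the notes.

class ProcessQueue:
    def __init__(self):
        self.front = None           # W of first process, or empty
        self.back = None            # W of last process, or empty

    def enqueue(self, w):           # w: a process workspace (a dict here)
        w["link"] = None            # stands in for W[-2]
        if self.front is None:
            self.front = w
        else:
            self.back["link"] = w   # last process points at the newcomer
        self.back = w

    def dequeue(self):
        w = self.front
        self.front = w["link"]
        if self.front is None:
            self.back = None
        return w

q = ProcessQueue()
q.enqueue({"name": "p0"})
q.enqueue({"name": "p1"})
assert q.dequeue()["name"] == "p0"  # round robin: first in, first out
```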
(2) Channel: An IO channel, which is declared OUTSIDE the scope of the
communicating processes, contains a queue with at most one process in it. In
the two-process case, when communication becomes ready, the queued (early)
process is placed into the process queue, while the other communicating
process continues without descheduling.
In the one-process case (hardware link IO or communication to a peripheral that
is outside the CPU instruction stream), the IO channel is declared outside the
scope of all process programming, i.e. it is a fixed system resource.
"Hardware" (i.e. service code for the communication link in question) places
the process into the process queue when the communication is done.
(3) Timer: A process waiting on a timeout enters a standard timer queue, from
which it is removed after the timeout either expires or is nullified (the
latter case arises in an ALT or select that the timeout loses). If appropriate
it is then placed in the process queue.
ADDRESSING
Transputers use a "falling stack" so that the most used local variables are at
small positive offsets from W. The special process-wide communication and
queueing usages are at small negative offsets.
W[0] Used for scratch in ALT and short communications.
W[-1] Set to next I when process is descheduled.
W[-2] Set to W of next process in process queue.
W[-3] Set to address of channel holding this process, or ALT state.
W[-4] Set to W of next process in timer queue, or ALT time selection state.
W[-5] Set to timeout.
Not all these usages are necessary to the state machines of the queues; some
are to aid post-mortem analysis. A compiler decides how many of them are needed
for a given process, based, for instance, on whether timers or ALTs are used in
that process. Notice also that singly linked lists are used; doubly linked
lists, at least in the timer ALT case, could have speed advantages.
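Written out as constants, the layout above might look like this in a virtual machine (illustrative Python names, not the Inmos mnemonics):

```python
# Sketch only: the negative workspace offsets from the table, as constants,
# with a workspace modeled as a dict from offset to word.
IPTR_S  = -1   # next I when process is descheduled
LINK_S  = -2   # W of next process in process queue
STATE_S = -3   # address of channel holding this process, or ALT state
TLINK_S = -4   # W of next process in timer queue, or ALT time selection state
TIME_S  = -5   # timeout

workspace = {off: None for off in (0, IPTR_S, LINK_S, STATE_S, TLINK_S, TIME_S)}
workspace[TIME_S] = 1000   # e.g. a pending timeout
```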
CHANNEL COMMUNICATION
This (together with Hardware Select, below) is the only state machine needed if
timer and ALT (the select capability of occam) are not used, or are replaced
with a timer-based event loop as (I believe) in Stackless.
A channel is only one word long: C[0]=W where W is the address of the waiting
process. Otherwise C[0]=NotProcess.p where NotProcess.p is a special value that
can never be a workspace address.
When the early process reaches the IO instruction, it finds NotProcess.p in the
channel and replaces it with its own workspace address. Then it deschedules
itself without placing itself in the process queue.
When the late process reaches the communicating IO instruction, or the external
communication finishes, it finds the workspace address of the early process in
the channel. It places it in the process queue and writes NotProcess.p in the
channel.
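The two paragraphs above amount to a tiny state machine; here is a Python sketch (NOT_PROCESS, channel_io, and the dict-based workspaces are my own stand-ins, and the actual data copy is omitted):

```python
# Sketch of the one-word channel. Whichever process arrives first parks its
# W in the channel and deschedules; the second finds that W, does the
# communication, and reschedules the first.

NOT_PROCESS = None            # stands in for NotProcess.p

def channel_io(channel, my_w, run_queue):
    """Returns True if the caller must deschedule (it was early)."""
    if channel["word"] is NOT_PROCESS:
        channel["word"] = my_w          # early: leave our W in the channel
        return True                     # ...and deschedule, NOT queueing
    early_w = channel["word"]           # late: the early process was waiting
    channel["word"] = NOT_PROCESS
    run_queue.append(early_w)           # early process rejoins process queue
    return False                        # late process continues unblocked

chan = {"word": NOT_PROCESS}
run_q = []
assert channel_io(chan, {"name": "writer"}, run_q) is True   # writer blocks
assert channel_io(chan, {"name": "reader"}, run_q) is False  # reader continues
assert run_q[0]["name"] == "writer"
```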
More complex things happen in case of an ALT.
TIMER
After checking whether it needs to deschedule (i.e. whether timeout already
happened), the process reaching a timer wait enters the queue of that timer and
proceeds along it until it finds the place for its timeout. Then it inserts
itself into that place in the timer queue and deschedules itself without
placing itself in the process queue.
Timer "hardware" checks the front of the queue at each tick and reschedules any
processes whose timeouts have expired.
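A simplified Python sketch of the timer queue (my own version: a sorted list rather than a linked list through W[-4]):

```python
# Sketch only: timer-queue insertion and the per-tick check.
import bisect

def timer_wait(timer_q, w, timeout, now, run_queue):
    if timeout <= now:                   # timeout already happened:
        run_queue.append(w)              # no need to deschedule
        return
    w["timeout"] = timeout               # stands in for W[-5]
    # insert at the place for this timeout; id(w) breaks ties so the
    # workspace dicts themselves are never compared
    bisect.insort(timer_q, (timeout, id(w), w))

def tick(timer_q, now, run_queue):
    # "hardware" checks the front of the queue at each tick
    while timer_q and timer_q[0][0] <= now:
        _, _, w = timer_q.pop(0)
        run_queue.append(w)              # expired: back to the process queue

tq, rq = [], []
timer_wait(tq, {"name": "p0"}, timeout=5, now=0, run_queue=rq)
tick(tq, now=4, run_queue=rq)
assert rq == []                          # not yet expired
tick(tq, now=5, run_queue=rq)
assert rq[0]["name"] == "p0"
```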
More complex things happen in the case of an ALT.
HARDWARE SELECT
If "hardware" (external) communication(s) and/or timeout(s) are waiting, and no
process is in the process queue, "hardware" (i.e. virtual machine code external
to the instructions of all the processes) should do a blocking select on all
the pending external communications, including the earliest timeout if any. The
winner(s) then trigger channel or timer queue action. More than one winner is
OK. All winners go into the process queue.
If any process is in the process queue, and any other process is waiting on
hardware communications, then, before branching to the next process in the
process queue, a non-blocking select (i.e. timeout 0) should be used to add any
ready processes to the process queue. If any process is waiting on a timeout,
time should be checked. If these checks are too burdensome, the checks could
take place once per queue cycle, or after a certain instruction count.
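Assuming POSIX pipes stand in for the hardware links, the polling rule above can be sketched with Python's select (poll_external is an invented name): block when nothing is runnable, otherwise poll with timeout 0.

```python
# Sketch of the "hardware select" in a virtual machine. fds are the pending
# external communications; waiting maps each fd to the process blocked on it.
import os
import select

def poll_external(fds, run_queue, waiting, have_runnable):
    timeout = 0 if have_runnable else None  # non-blocking vs blocking select
    ready, _, _ = select.select(fds, [], [], timeout)
    for fd in ready:
        run_queue.append(waiting[fd])       # winners join the process queue
                                            # (more than one winner is OK)

r, w = os.pipe()
run_q = []
os.write(w, b"x")                           # external communication completes
poll_external([r], run_q, {r: {"name": "io_proc"}}, have_runnable=True)
assert run_q[0]["name"] == "io_proc"
os.close(r); os.close(w)
```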
If interrupt code is accessible to the "hardware" or virtual machine, this
rescheduling code can be placed in the interrupt code for efficiency,
eliminating these selects.
ALT
To be distinguished from the hardware select used to reschedule unconditional
channel IO or timers, the ALT construct places first-past-the-post
communication or timeout selection into the process code itself. The transputer
offers channel ALT, timer ALT, and skip (immediately valid) ALT branches. They
can be prioritized explicitly, which is a good idea, because fairness can be
enforced in the process code by rotating the priorities to avoid starvation.
Unlike C select, the transputer ALT offers only one winner upon which it
branches, like an if-then-else or case statement. This could be modified at the
cost of complicating the code.
Actions passing through an ALT are called "conditional" because they do not
take place if they lose. The transputer allows ALT only on an input, because a
conditional input communicating to a conditional output could lead to a
standoff.
There are three states within an ALT:
enabling
waiting
ready
These are distinguished by the special values Enabling.p, Waiting.p, and
Ready.p, which are all distinguished from any channel address and go in W[-3].
The sequence is:
start the ALT ('alt' or 'talt')
enable each branch ('enbc' or 'enbt' or 'enbs')
wait if necessary ('altwt' or 'taltwt')
disable each branch ('disc' or 'dist' or 'diss')
end the ALT ('altend')
If, when enabling, any branch is found already ready, the state goes from
Enabling.p to Ready.p and the wait is skipped. The order in which the branches
are enabled does not matter, but the order in which they are disabled
determines the priority. In the case where a C select is used in a virtual
machine, it is probably easiest to treat the 'enbs' (if there) as a zero
timeout and always do the wait for hardware channels, so that you do not have
to call a lot of non-blocking selects to see if each 'enbt' or hardware 'enbc'
branch is ready. In the transputer, all the branches are enabled and disabled
even if there is an early winner or some are cancelled by booleans; but
equivalent, more efficient strategies could be used in a virtual machine.
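The enable phase and its skip-the-wait shortcut can be sketched as follows (Python; the state lives in a variable rather than at W[-3], and each branch's guard is reduced to a ready boolean):

```python
# Sketch of the ALT enable phase. ENABLING/WAITING/READY stand in for
# Enabling.p, Waiting.p, Ready.p.
ENABLING, WAITING, READY = "Enabling.p", "Waiting.p", "Ready.p"

def enable_branches(branch_ready_flags):
    state = ENABLING                  # 'alt' or 'talt': start the ALT
    for ready in branch_ready_flags:  # 'enbc'/'enbt'/'enbs' per branch,
        if ready:                     # order does not matter here
            state = READY             # a ready branch means the wait is skipped
    if state is ENABLING:
        state = WAITING               # 'altwt': deschedule until a branch fires
    return state                      # disabling (in priority order) follows

assert enable_branches([False, True, False]) == READY   # wait skipped
assert enable_branches([False, False]) == WAITING       # must wait
```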
If an output finds a process W in its channel and an ALT underway in that
inputting process, it does not do the communication, but replaces the inputting
process W with its own process W in the channel, deschedules itself, and forces
the inputting process's ALT to go into disabling, so that this communication
has a chance to win immediately. If it does not win, the outputting process has
now put the channel into a ready-to-communicate state, so the next ALT will
detect this during enabling.
Timer ALTs are special. When enabling, W[-4] must take the state TimeNotSet.p
until the first timer guard is enabled and then it gets the timeout in W[-5]
and W[-4] goes to TimeSet.p. Further timer branches may cause this timeout to
advance. Either the timeout has already expired, or the process goes into the
timer queue.
If the ALT puts a process on the timer queue, it is pulled from the queue at
the end of the wait, whether or not the timer wins. No matter what branch wins,
the end of the wait places the process back on the process queue, and when it
reaches the front the ALT disabling begins. This is to be distinguished from
the actual communication, if applicable, which takes place after the 'altend'
and the branching, and reschedules the other (outputting) process without any
blocking, as is completely standard behavior since the outputter always counts
as "early" by this point.
REFERENCE
Inmos Ltd: "Transputer Instruction Set, a Compiler Writer's Guide."
Prentice-Hall, New York, 1988. See Section 10, "Architectural Notes."
FORGOTTEN THE ALT... AGAIN: "A reactive object has two characteristics: the
abandonment of all blocking operations and the unification of the concepts
state and process. The former allows a reactive object to accept input from
multiple sources without imposing any ordering on the input events." --- ???
YOU DON'T EVEN NEED ALT... ANY SELECT LOOP IN C CAN PROVE THAT YOU CAN ALLOW
BLOCKING AND STILL NOT IMPOSE ANY ORDERING ON THE INPUT EVENTS !!!!
Here's the pseudo-occam that shows it:
PROC handler (CHAN a, b, aout, bout)
  WHILE TRUE
    ALT
      a ? astuff
        SEQ
          do a stuff with shared a-b resources
          aout ! astuff  -- send to separate aProcess for private work
      b ? bstuff
        SEQ
          do b stuff with shared a-b resources
          bout ! bstuff  -- send to separate bProcess for private work
It assumes independent random input from outside through a and b, plus two
other processes (aProcess and bProcess, not shown) that do any a and b work
that does not require shared resources between a and b. Given that the shared
resource stuff is small, the "reaction time" which is the ALT overhead + a ?
astuff + do a stuff (in the case of a) is very small, so I think it's fair to
call that reactive. This code does not need occam, and equivalently can be done
in C with a select; but the occam is short and obvious (I hope). In case of
bursts, the ALT can be made fair with a little extra work, not shown here, so
that queued IO is serviced in alternating order.
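For comparison, the same shape as a select loop (Python rather than C; pipes stand in for channels a and b, the private aProcess/bProcess work is omitted, and the shared-resource work is reduced to counters):

```python
# Sketch of the handler as a select loop. Like the ALT, it imposes no
# ordering on the input events; total_events bounds the run for the demo.
import os
import select

def handler(a_fd, b_fd, total_events):
    shared = {"a": 0, "b": 0}         # shared a-b resources
    seen = 0
    while seen < total_events:
        ready, _, _ = select.select([a_fd, b_fd], [], [])
        for fd in ready:              # more than one channel may be ready
            os.read(fd, 1)
            key = "a" if fd == a_fd else "b"
            shared[key] += 1          # "do a stuff" / "do b stuff"
            seen += 1
    return shared

ar, aw = os.pipe()
br, bw = os.pipe()
os.write(aw, b"x"); os.write(bw, b"y")   # independent random input, faked
result = handler(ar, br, total_events=2)
assert result == {"a": 1, "b": 1}
for fd in (ar, aw, br, bw):
    os.close(fd)
```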
REFERENCE:
http://www.informatik.uni-freiburg.de/~wehr/software/haskell/
...
rhaskell
Reactive objects are a convenient abstraction for writing programs which have
to interact with a concurrent environment. A reactive object has two
characteristics: the abandonment of all blocking operations and the unification
of the concepts state and process. The former allows a reactive object to
accept input from multiple sources without imposing any ordering on the input
events. The latter prevents race conditions because the state of an object is
only manipulated from the process belonging to the object.
Only a few programming languages exist that make the concept of reactive
objects available to programmers. Unfortunately, all those systems seem to be
either not actively maintained anymore or not really matured yet. In this paper
we demonstrate how reactive objects can be incorporated into the existing
general purpose programming language Haskell by making use of common extensions
to Haskell. Our implementation is heavily inspired by the O'Haskell programming
language. While O'Haskell is a standalone programming language not compatible
with Haskell, our implementation comes as a Haskell library that smoothly
integrates the concept of reactive objects into the world of regular functional
programming.
Requirements: GHC 6.4
Available via darcs:
darcs get --set-scripts-executable
http://www.informatik.uni-freiburg.de/~wehr/darcs/rhaskell/
Send patches using darcs send to [EMAIL PROTECTED]
Paper:
Stefan Heimann and Matthias Neubauer
Reactive Objects for Haskell
Informal proceedings of the Fifth Symposium on Trends in Functional Languages
(TFP 2004). Munich, Germany. November 2004.
Last modified: 2008-03-03T11:40:16+01:00
_______________________________________________
Stackless mailing list
[email protected]
http://www.stackless.com/mailman/listinfo/stackless