Re: Microprocessor Optimization Primer

Anne & Lynn Wheeler Mon, 04 Apr 2016 03:48:07 -0700

note that test&set was on both 360/67 and 360/65 machines and was
atomic.


I've commented before about how charlie invented compare&swap (chosen
because CAS are his initials) while doing fine-grain multiprocessor
locking working on CP67 (360/67 precursor to vm370) at the science
center.
http://manana.garlic.com/~lynn/subtopic.html#545tech
and
http://manana.garlic.com/~lynn/subtopic.html#smp

then we attempted to get it added to 370 architecture. initially we were
rebuffed because the POK favorite son operating system people said that
test&set was more than adequate for multiprocessor support (serializing
critical code sections). The 370 architecture owners said that to get it
justified would require additional uses, not just multiprocessor
serialization. Thus was invented the multiprogramming/multithreading
examples (used whether or not running on multiprocessor machine) that
still are shown in the principles of operation.

The problem in a multithreaded application is that it is enabled for
interrupts and can lose control in a locked/critical section.
Compare&Swap is used for doing an atomic operation directly not needing
to lock a critical section.
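
As a rough illustration of the difference (a minimal sketch using C11
atomics as a stand-in, not the actual 360/370 instructions or the
principles of operation examples), the first routine below serializes
an update with a test&set style spin lock, and the second does the same
update directly with a compare&swap retry loop. The names
locked_counter, locked_counter_init, locked_add and cas_add are
hypothetical, made up for this sketch.

```c
#include <stdatomic.h>

/* test&set style: a lock word guards an ordinary (non-atomic) value. */
typedef struct {
    atomic_flag lock;   /* set = held, clear = free */
    long value;         /* only touched while the lock is held */
} locked_counter;

void locked_counter_init(locked_counter *c) {
    atomic_flag_clear(&c->lock);
    c->value = 0;
}

long locked_add(locked_counter *c, long delta) {
    /* Spin until the flag was previously clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
        /* busy wait */
    }
    /* Critical section: if this thread is interrupted here, every other
       thread trying to update the counter spins until it runs again. */
    c->value += delta;
    long result = c->value;
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
    return result;
}

/* compare&swap style: no lock word and no critical section. */
long cas_add(_Atomic long *counter, long delta) {
    long seen = atomic_load(counter);
    /* Store seen + delta only if the counter still holds the value we
       saw. On failure, seen is refreshed with the current value and the
       loop retries with a newly computed sum. */
    while (!atomic_compare_exchange_weak(counter, &seen, seen + delta)) {
        /* retry */
    }
    return seen + delta;
}
```

With cas_add, a thread that loses control between the load and the
compare&swap holds nothing that others wait on; at worst its own swap
fails and it retries. With locked_add, losing control inside the locked
section leaves the other threads spinning on the lock word.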

This was especially leveraged by large multiprogramming/multithreading
DBMS avoiding needing to make kernel calls for lots of serialization
... and by the 80s lots of other platforms (especially those supporting
high-throughput DBMS) were including compare&swap (or instructions with
similar semantics).

I first saw transactional memory on 801/risc in the late 70s.  They
demonstrated that they could do transactional type operations on
applications that weren't originally coded for transactions.

801/risc ROMP (research/office products) started out going to be a
displaywriter followon.  When the displaywriter followon was canceled,
they looked around and decided to retarget it to the workstation
market. They hired the company that had done the UNIX port to IBM/PC for
PC/IX to do one for romp. This was eventually released as PC/RT and AIX.

The followon to ROMP was RIOS (rs/6000) and they used the transactional
memory to implement JFS ... journalling the UNIX filesystem metadata
changes ... with a claim that it was more efficient than directly
implementing journalling calls in the filesystem.

However, Palo Alto then did a portable JFS that used explicit
journaling calls ... and demonstrated on RS/6000 that it was
much faster than the transactional memory implementation.
http://manana.garlic.com/~lynn/subtopic.html#801

Note that s/370 had very strong (multiprocessor) memory consistency, which
cost a huge amount in performance. Two processor multiprocessor machines
slowed each processor clock cycle by 10% to accommodate cross-cache
protocol chatter ... and this overhead went up non-linearly. Later IBM
mainframe was running cache machine cycle at much higher rate than the
processor machine cycle.

In the late 80s, I was asked to participate in the standardization
(started by LLNL) of what quickly became fibre-channel standard (on
which they eventually built the heavy-weight FICON protocol that
drastically reduces the native throughput)
http://www.garlic.com/~lynn/submisc.html#ficon

I was also asked to participate in the standardization of scalable
coherent interface (started by people at SLAC ... a large VM370
mainframe installation at the time and host of the monthly IBM BAYBUNCH
user group meetings). SCI was defined for both I/O operations as well as
multiprocessor shared memory operation. The standard SCI memory
consistency defined 64-port memory bus ... that relaxed memory
consistency (compared to IBM mainframe) and allowed for a lot larger
mainframe configurations. Sequent, Data General, Silicon Graphics, and
at least Convex built multiprocessor products.

Sequent & Data General took standard i486 four processor board that
shared cache and built interface to SCI ... being able to get 64
4-processor boards in configurations (256-way processor shared memory
configuration). Convex took standard HP/SNAKE (risc) two processor board
that shared cache and built interface to SCI ... being able to get 64
2-processor boards in configuration. As an aside, much later IBM buys
Sequent and shuts it down.

Note both FCS and SCI started out with fiber that supported concurrent
transfers in both directions.

SCI
https://en.wikipedia.org/wiki/Scalable_Coherent_Interface
is part of what evolves into infiniband
https://en.wikipedia.org/wiki/InfiniBand

other trivia ... in the mid-70s I was involved in project that defined a
16-way shared memory multiprocessor. Lots of people thought it was
really fantastic ... and we got some of the 3033 processor engineers to
work on it in their part time (lot more interesting than mapping 168
logic to 20% faster chips). Then somebody tells the head of POK that it
could be decades before the POK favorite son operating system could
effectively support 16-way (it was 2000 before 16-way shipped) and we
got invited to never visit POK again (and the 3033 processor engineers
were instructed to stop being distracted).

-- 
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
