I modded DMKCCW in the VM/SP 1, 2, 3 and up days... VM was always great, especially when you had source code.... After the OCO days, well, we won't go there... Scott J Ford
________________________________
From: Anne & Lynn Wheeler <[email protected]>
To: [email protected]
Sent: Thursday, December 25, 2008 4:16:01 PM
Subject: Re: Computer History Museum

The following message is a courtesy copy of an article that has been posted to bit.listserv.ibm-main,alt.folklore.computers as well.

[email protected] (Rick Fochtman) writes:
> I don't remember all the mods we made at NCSS, but one change that made
> a BIG difference on the simplex and duplex 360/67's was this: in the CP
> kernel, ALL SVC instructions were modified to a BAL to a specific
> address in the first 4K of storage, where a "vector table" rerouted the
> call to a specific CP "subroutine". All those interrupts and PSW swaps
> took FOREVER on the 360/67, whereas a BAL to low storage SEEMED to fly
> almost instantaneously. The change also seemed to be beneficial when we
> switched to 370/168 platforms as well. The CMS kernel used a HVC (in
> actual fact, a DIAGNOSE) to request services from the CP kernel,
> including I/O services. We also modified MVT to run in a virtual machine
> using DIAGNOSE, rather than SIO/TIO/HIO, for I/O services. Made MVT run
> MUCH FASTER in the virtual machine and freed us from all the related
> emulation of these I/O instructions. One thing I miss: Grant wrote a
> program, called IMAGE, that created a complete image of the CP kernel,
> which would load in record time when bringing up the system. I wish I
> had a copy of that program now, because of its rather unique processing
> of the RLD data from the object code. I've never quite understood how
> RLD data is processed by either the linkage editor or the loader. :-(

re:
http://www.garlic.com/~lynn/2008s.html#51 Computer History Museum
http://www.garlic.com/~lynn/2008s.html#52 Computer History Museum
http://www.garlic.com/~lynn/2008s.html#54 Computer History Museum

as an undergraduate ... before joining the science center ...
I first looked at the standard SVC linkage routine (for all kernel calls) and cut the pathlength by about 75%. I then looked at the most frequently called subroutines ... and changed them to BALRs ... leaving the rest as SVC ... since they no longer represented a significant portion of CP overhead. While SVC/LPSW was expensive relative to BALR ... the actual time spent in the original SVC linkage & return was much, much larger than the SVC/LPSW instructions themselves ... most of the benefit came from reducing the logic. The BALR conversion not only replaced the SVC/LPSW instructions but also eliminated the rest of the linkage/return logic for high-use routines. When that was done, the remaining SVC/LPSW (and associated linkage/return overhead) was a trivial percentage of overall time spent in the kernel.

The remaining big overhead wasn't so much the SIO instruction ... but the channel program simulation overhead done in "CCWTRANS". CMS turned out to do very stylized disk channel programs. I created a fastpath channel program emulation operation for CMS disk I/O (that was also synchronous ... avoiding all the virtual machine gorp for entering wait state, asynchronous interrupts, etc). This got severely criticized by the people at the science center (mostly bob adair) because it violated the 360 principles of operation. However, it did significantly reduce cp67 kernel overhead for operating CMS virtual machines. This was then redone using the "DIAGNOSE" instruction ... since the 360 principles of operation defines the "DIAGNOSE" instruction operation as model-dependent. The facade was that there was a 360 "virtual machine" machine model which had its own definition for DIAGNOSE instruction operation.

Standard CP67 saved a core image of the loaded kernel to disk (routine SAVECP) and had a very fast loader sequence that brought that image back into memory on IPL and then transferred to CP67 startup routine CPINIT.
One of the people at the science center modified CP67 kernel failure processing to write an image dump to a disk area and then reload the saved kernel image from disk ... basically automagic failure/restart ... this is mentioned in one of the referenced stories at the MULTICS websites ... one of the people who supported the CP67 system at MIT (and later worked on MULTICS) had modified TTY/ASCII terminal line processing in a way that could cause the system to crash ... and one day CP67 crashed and automagically (fast) restarted 27 times in a single day (which helped instigate some MULTICS rewrite, because MULTICS was taking an hour elapsed time to restart).

The cp67 kernel was undergoing a fair amount of evolution with new functions being added. On a 768k real storage machine ... every little bit hurt. So I did a little sleight of hand and created a virtual address space that mapped the cp67 kernel image ... flagged the standard portion as fixed ... but created an infrastructure that allowed other portions to be paged in & out. This required enhancing the SVC linkage infrastructure to recognize portions of the kernel that could be pageable (and do a page fetch operation before doing the linkage).

The standard CP67 kernel was built up of "card decks" which had the BPS loader slapped on the front and "IPL'ed" (either on the real machine or in a virtual machine). Once the BPS loader had all the routines resolved in real storage ... it would transfer to SAVECP ... which wrote the core image to disk (for later IPL). It turns out that the BPS loader also passed (in registers) a pointer to the resolved (RLD) symbol table. I then changed SAVECP to move the BPS (RLD) symbol table to the end of the (pageable) kernel image ... so that it was also saved to disk (as part of the pageable kernel area). I ran into a major problem ... the BPS loader only supported up to 256 external symbols. As part of reorg'ing parts of the kernel to make it pageable ...
i split modules into 4k-byte "chunks" ... creating a lot of new external symbols. This initially overflowed the BPS loader's 256 external symbol limit ... and so I had to resort to all sorts of hacks to keep the number of external symbols within the 256 limit. Much later at the science center ... I found a source copy of the BPS loader in an old card cabinet that was in storage ... I could then modify the BPS loader to extend the external symbol table maximum.

for additional drift ... in the initial work to convert MVT into VS2 ... some virtual address tables and page fault processing were hacked into the side of MVT ... and a copy of CCWTRANS was borrowed from CP67 (i.e. VS2 has the same issue with translating application channel programs passed by EXCP ... as CP67/VM370 has with translating virtual machine channel programs). Past posts with references to CCWTRANS:
http://www.garlic.com/~lynn/2008g.html#45 authoritative IEFBR14 reference
http://www.garlic.com/~lynn/2008i.html#68 EXCP access methos
http://www.garlic.com/~lynn/2008i.html#69 EXCP access methos
http://www.garlic.com/~lynn/2008m.html#7 Future architectures
http://www.garlic.com/~lynn/2008o.html#50 Old XDS Sigma stuff
http://www.garlic.com/~lynn/2008q.html#31 TOPS-10

The thing missing from the automagic fast restart ... was the growing number of "service virtual machines" that had to be brought up manually ... i.e. the performance monitor DUSETIMR machine, the VNET networking machine, and a growing number of others. These "service virtual machines" are analogous to the current genre of "virtual appliances" found in the latest incarnation of virtual machine technology.

As part of the performance work on cp67 and then moving to vm370 ... I also did a lot of benchmarking work. One of the things that I wanted to do was automate the benchmarking process ... lots of past posts with references
http://www.garlic.com/~lynn/submain.html#benchmark

For this, I created the "AUTOLOG" command ...
where a virtual machine could automagically logon other virtual machines ... including passing an initial startup command to the virtual machine being logged on. Then DMKCPI (the renamed CPINIT for vm370) was modified to do a special-case execution of the AUTOLOG command for a specific virtual machine (which would then handle all the other AUTOLOGs). As mentioned other places, as part of the final sequence for the release of my (vm370) resource manager ... i ran a series of 2000 (automated) benchmarks that took 3 months elapsed time (as part of final calibration and verification). However, the AUTOLOG command also got a lot of use as part of automating the other parts of system bringup (in addition to just getting the bare bones kernel operational). A few past posts mentioning the AUTOLOG command:
http://www.garlic.com/~lynn/2002q.html#28 Origin of XAUTOLOG (x-post)
http://www.garlic.com/~lynn/2005.html#59 8086 memory space
http://www.garlic.com/~lynn/2006g.html#34 The Pankian Metaphor
http://www.garlic.com/~lynn/2007d.html#23 How many 36-bit Unix ports in the old days?
http://www.garlic.com/~lynn/2007n.html#10 The top 10 dead (or dying) computer skills
http://www.garlic.com/~lynn/2007r.html#68 High order bit in 31/24 bit address
http://www.garlic.com/~lynn/2007s.html#41 Age of IBM VM
http://www.garlic.com/~lynn/2008m.html#42 APL

As mentioned in previous references ... one of the things I did after joining the science center ... was a page mapped filesystem for CMS. The diagnose I/O API was specifically oriented toward drastically reducing the pathlength overhead associated with CMS I/O. However, there were still a large number of performance issues related to simulating a "real address I/O" paradigm in a virtual address environment. The page map changes retained the high level CMS filesystem paradigm while remapping the underlying implementation to a page mapped infrastructure. Misc.
past posts mentioning the page mapped infrastructure
http://www.garlic.com/~lynn/submain.html#mmap

There were some benchmark comparisons with the same CMS and a mixed-mode, moderately filesystem intensive workload ... one using the underlying traditional CMS filesystem ... and the same CMS, workload, and CMS filesystem ... but underlying page mapped ... where the page mapped flavor had three times the throughput of the traditional non-page mapped flavor.

--
40+yrs virtualization experience (since Jan68), online at home since Mar70

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
----------------------------------------------------------------------

