On 2009-12-15, Frans Pop wrote:
>
> OK. My s390 knowledge is very limited. My understanding was that minidisks
> were not supported at all (as there's a longstanding BR open to add
> support for them in the installer).
>
OK, now I think I understand where the confusion lies. I'd start a new thread here, but since the subject line already says "minidisk support", and since that's exactly what we're discussing now, I'll just continue with the current thread. If you want to split this off into another thread, be my guest.

I assume that you are referring to bug report number 447755, which I opened. (That reminds me, I opened it under my old e-mail address. I've got to get the e-mail address updated.)

Please forgive me if I insult your intelligence or give too much information. That is not my intent. I have a tendency to do that at times, but I don't do it on purpose. I do respect you. I just don't know what you do know and what you don't know. So I'll just explain the whole thing; please politely ignore what you already know.

We'll start with the definition of DASD. DASD is an acronym which stands for Direct Access Storage Device. It's a general term for any storage device in which the records can be easily accessed in random order, such as a disk. This is in contrast with a sequential access storage device, in which the records must be accessed in sequential order, such as a magnetic tape. Historically, DASD was not necessarily a disk device. In the early days of mainframes, there were magnetic drum devices as well, and they were also classified as DASD. But those devices fell by the wayside long ago. In today's environment, DASD and disk are practically, though not technically, synonymous.

Mainframe DASD comes in two basic types: FBA (Fixed Block Architecture) and CKD (Count Key Data). FBA DASD is similar to the type of disk devices used in the world of PCs. The physical blocks on disk are all of a fixed size: 512 bytes. Sometimes you will see FBA DASD described as an FB-512 device. In theory, an FBA device can use other block sizes, but to the best of my knowledge every FBA device ever made for mainframe use has a physical block size of 512.

CKD DASD is different.
With CKD DASD, the physical blocks on disk can be of all different sizes, from a theoretical minimum of 1 to a theoretical maximum of 65535 (hex ffff). In order to keep track of things, there is a special little block in front of every main block called a count block which contains the size (length) of the following main block, or data block. In addition, some types of blocks, such as directory blocks for partitioned data sets, also contain keys. The key is typically a sort key that is significant to the type of data being stored. In the case of a partitioned data set directory block, it is the key (member name) of the highest-sorting member (in the EBCDIC collating sequence) of all the members described in that directory block. Thus, for a keyed block, there are actually three blocks on the disk: a count block, a key block, and a data block. Most blocks do not have keys; they have only a count block and a data block. But in the general case, a block on this type of DASD device may have keys. Hence the name count-key-data or CKD DASD.

In Linux, the FBA driver (the dasd_fba_mod kernel module) supports FBA devices and the ECKD driver (the dasd_eckd_mod kernel module) supports CKD devices. The E in ECKD stands for Extended. This refers to some extra channel commands supported by the control unit which allow some high-performance options, such as reading a whole track at a time. But the underlying data format is still CKD. When people speak of ECKD DASD, what they mean is CKD-format DASD which has a control unit which supports the extra ECKD channel commands.

Different IBM DASD devices are identified by a four-digit device-type number. For example, 3370, 9336, 9332, and 9335 are four different device types of FBA DASD. 9345, 3390, 3380, and 3350 are four different device types of CKD DASD.
These different device types differ from each other in things like track capacity, number of tracks per cylinder, average seek time, average rotational delay, channel speed, etc. In addition to the main four-digit number, there is often a suffix to distinguish different models. These different models most often differ from each other only in the number of cylinders they possess. For example, a 3390-3, the most popular model of 3390, has 3339 cylinders, numbered from 0-3338. The 3390-9 has 10017 cylinders, numbered 0-10016.

IBM has two main historical operating systems: VSE and MVS. VSE added support for FBA DASD, but MVS never did. MVS can only use CKD DASD. And since MVS is IBM's most popular (and most lucrative) operating system, CKD DASD is far more popular with mainframe customers than FBA DASD is.

Of course, these days, hardly anybody uses "real" mainframe DASD anymore, be it FBA or CKD. Most mainframe customers use some kind of RAID box which is emulating traditional mainframe DASD under the covers. To the mainframe, it looks like traditional mainframe DASD. But the physical implementation on the back end is some type of RAID implementation which uses PC hard disks. So even though the actual physical disks on which the data is stored may use fixed-length 512-byte blocks, the software within the RAID box has to make it look like CKD DASD to the mainframe. Otherwise, you can't run MVS.

In many RAID implementations, a physical hard disk will contain only whole emulated DASD volumes. It may be mirrored somewhere else, but the point is you won't see part of a logical volume on one physical disk and the rest of that logical volume on another physical disk. Since the size of PC hard disks is rarely a multiple of the size of mainframe logical volumes, this leaves some unused space left over. In order to get the maximum space utilization, some vendors offer non-standard sized disks to the customer when they configure the RAID box.
They might have, let's say, ten 3390-3 standard-sized volumes of 3339 cylinders each and one partial volume of whatever is left. Maybe it's only, say, 1597 cylinders long. Different vendors have different names for these things, but one vendor calls it a "hyper-volume". It has all the characteristics of a 3390-3, but it's only, in this example, 1597 cylinders long instead of the standard 3339 cylinders long. Some customers don't want these odd-ball sized volumes, so the vendor doesn't configure them and the leftover space is wasted.

I believe that the Linux FBA driver and the Linux ECKD driver will support these short volumes, or hyper-volumes. Somehow they can obtain the number of cylinders, perhaps through the RDC (Read Device Characteristics) or RCD (Read Configuration Data) channel commands. I don't know how they do it, but somehow they can find out the number of cylinders. And when Linux is running in an LPAR, or in basic mode on older S390 models, they can use either a standard-sized volume or a hyper-volume.

OK, so far so good. All of this is background. So what's a minidisk?

A minidisk is a construct of the z/VM operating system. z/VM is another IBM operating system, like VSE and MVS. The "z" comes from z-series or system-z, the hardware it runs on (64-bit mainframes). The VM stands for Virtual Machines. You can think of it as "VMware for the mainframe". z/VM creates virtual images of a mainframe, called virtual machines. Each virtual machine has a userid associated with it, similar to a username under Linux. (Unlike Linux, however, z/VM does not allow multiple simultaneous logins under the same name.) The component of z/VM which creates and manages these virtual machines is called CP, which stands for the Control Program.

So how does an operating system such as Linux, which is running in a virtual machine under z/VM, access its DASD? Well, there are two basic ways. One way is by using DEDICATED DASD.
A whole DASD volume (a regular volume or a hyper-volume) can be dedicated to a particular virtual machine, either through the DEDICATE statement in the CP directory entry for the virtual machine or dynamically via the CP ATTACH command. In this mode, as the name implies, only that virtual machine has any access to the DASD volume. Other virtual machines cannot touch it.

The other way is by the use of minidisks. A minidisk is a contiguous range of cylinders on a DASD volume that CP knows by name. Its name is the owning virtual machine combined with the virtual address. For example, if there is a virtual machine defined to CP called DEBIAN1, and there is an MDISK statement in the CP directory entry of DEBIAN1 which defines virtual device number 500, then the name of the minidisk is DEBIAN1 500. Minidisks can (potentially) be shared by multiple virtual machines. Here is an example MDISK statement:

    MDISK 0201 3390 1751 0075 VMSY05 MR HARRY LARRY MARY

This minidisk definition, present in the directory entry for virtual machine DEBIAN1, defines virtual device number 0201 (or simply 201; the leading zero is not significant) as a minidisk. (Device numbers, whether they are real or virtual, are implicitly hexadecimal numbers.) The definition states that the underlying device type is a 3390, which is a CKD device. It states that the starting cylinder number of the minidisk is 1751 (a decimal number). It states that the size of the minidisk is 75 cylinders (a decimal number). And it states that the volume serial number of the real DASD volume on which this minidisk resides is VMSY05. The MR is the access mode used by DEBIAN1 to link to this minidisk at logon time. HARRY, LARRY, and MARY are minidisk link passwords (as opposed to virtual machine logon passwords). They allow READ, WRITE, and MULT access, respectively. If the device type of the real DASD volume does not match the device type specified in the MDISK statement, then the MDISK statement is in error.
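As a rough illustration only (this is not part of CP or any z/VM tooling), the fields of the example MDISK statement can be pulled apart with a few lines of Python. The names Minidisk, parse_mdisk, and real_cylinder are hypothetical, invented for this sketch, and the parser assumes the simple numeric form shown in the example rather than everything the directory syntax allows.

```python
from dataclasses import dataclass


@dataclass
class Minidisk:
    """Hypothetical holder for the fields of one MDISK statement."""
    owner: str        # owning virtual machine, e.g. DEBIAN1
    vdev: int         # virtual device number (written in hexadecimal)
    devtype: str      # underlying device type, e.g. 3390
    start_cyl: int    # starting cylinder on the real volume (decimal)
    cyl_count: int    # size of the minidisk in cylinders (decimal)
    volser: str       # volume serial number of the real DASD volume
    mode: str         # access mode used at logon, e.g. MR
    passwords: tuple  # READ, WRITE, and MULT link passwords, in that order

    @property
    def name(self) -> str:
        # The minidisk name is the owner plus the virtual device number;
        # leading zeros in the device number are not significant.
        return f"{self.owner} {self.vdev:X}"

    @property
    def last_cyl(self) -> int:
        # Last real cylinder occupied by the minidisk.
        return self.start_cyl + self.cyl_count - 1

    def real_cylinder(self, virtual_cyl: int) -> int:
        # Map a minidisk-relative cylinder to the real cylinder behind it.
        if not 0 <= virtual_cyl < self.cyl_count:
            raise ValueError("cylinder is outside the minidisk")
        return self.start_cyl + virtual_cyl


def parse_mdisk(owner: str, statement: str) -> Minidisk:
    """Parse the simple numeric form of an MDISK statement (assumption)."""
    fields = statement.split()
    if len(fields) < 7 or fields[0].upper() != "MDISK":
        raise ValueError("not an MDISK statement in the expected form")
    _, vdev, devtype, start, count, volser, mode, *passwords = fields
    return Minidisk(
        owner=owner,
        vdev=int(vdev, 16),    # device numbers are implicitly hexadecimal
        devtype=devtype,
        start_cyl=int(start),  # cylinder numbers are decimal
        cyl_count=int(count),
        volser=volser,
        mode=mode,
        passwords=tuple(passwords),
    )


disk = parse_mdisk(
    "DEBIAN1", "MDISK 0201 3390 1751 0075 VMSY05 MR HARRY LARRY MARY")
print(disk.name, "occupies real cylinders",
      disk.start_cyl, "through", disk.last_cyl, "of", disk.volser)
```

For the example statement this reports real cylinders 1751 through 1825 of VMSY05, which is just the starting cylinder plus the 75-cylinder size.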
A special case of a minidisk is when the minidisk overlaps the entire real DASD volume. That is, the starting cylinder is zero and the number of cylinders is equal to the number of cylinders of the real DASD volume (or is equal to the special keyword END). This is called a full-pack minidisk. By creating a full-pack minidisk you can share real DASD volumes between virtual machines. Minidisks, including full-pack minidisks, can be created on regular or hyper-volumes.

Do the Linux DASD drivers support minidisks in a virtual machine under z/VM? Yes, they do. A minidisk is similar to a hyper-volume in that it has the characteristics of the underlying device, but may (and usually does) have a non-standard number of cylinders. But instead of the shortened disk being emulated in hardware by a RAID box, it is emulated in software by CP. CP steps in and alters the channel program to change the cylinder numbers in channel commands so that when the "real" device sees the channel program, it goes after the right data. It's all done by smoke and mirrors inside CP. The operating system in the virtual machine thinks it's going after cylinder 0 but it's really going after cylinder 1751 (in this example).

Now that we're talking about virtual machines under z/VM, the DIAG driver (kernel module dasd_diag_mod) comes into play. The DIAG driver works *only* in a virtual machine under z/VM. It uses a special instruction (DIAGNOSE code X'250') which is meaningful *only* in a virtual machine. But the DIAG driver will work with either a minidisk or a dedicated DASD device.

OK, so if all three drivers support minidisks, then what is Debian bug report 447755 all about? The issue here is the *format* of the minidisk. A DASD device, be it a dedicated device or a minidisk, can have one of four formats under Linux for s390: cdl, ldl, CMS non-reserved, and CMS reserved. The FBA driver supports two of the four formats: CMS non-reserved and CMS reserved.
The DIAG driver supports three of the four formats: ldl, CMS non-reserved, and CMS reserved. The ECKD driver supports all four formats.

As with other operating systems and other platforms, preparing a disk for use under Linux for s390 involves three basic steps: low-level formatting, partitioning, and high-level formatting. How you do that depends on which disk format you want to end up with.

----------
cdl format:
  Low-level formatting:  dasdfmt -d cdl (this is the default format for dasdfmt)
  Partitioning:          fdasd (up to three partitions can be created)
  High-level formatting: mke2fs or mkswap

ldl format:
  Low-level formatting:  dasdfmt -d ldl
  Partitioning:          none (a single partition is implied)
  High-level formatting: mke2fs or mkswap

CMS non-reserved:
  Low-level formatting:  CMS FORMAT command
  Partitioning:          none (a single partition is implied)
  High-level formatting: mke2fs or mkswap

CMS reserved:
  Low-level formatting:  CMS FORMAT command
  Partitioning:          CMS RESERVE command (a single partition is created)
  High-level formatting: mke2fs or mkswap
----------

The issue in 447755 is that the Debian installer only supports cdl format. And since this is the only format that the DIAG driver *doesn't* support, the DIAG driver cannot be used, even after installation, without migrating the data to other minidisks after the installation. (It also means that you can't install to FBA DASD, since cdl format is not valid for FBA DASD. Since I don't have any FBA DASD, that does not affect me. But it might affect others.)

My preferred format, for reasons explained in the document linked below, is the CMS reserved format. I have published instructions for migrating the data to CMS reserved minidisks after installation here:

http://www.wowway.com/~zlinuxman/diag250.htm

This document also explains why the CMS reserved minidisk is my preferred format for Linux virtual machines under z/VM. There is no problem with Debian *running* on CMS reserved minidisks. It works great.
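To make the driver and format combinations described earlier easier to check at a glance, here is a small lookup-table sketch in Python. It only restates what this message says about the three drivers; the names SUPPORTED_FORMATS and drivers_for are made up for the illustration and are not taken from the kernel or the installer.

```python
# Formats each DASD driver supports, as described in this message.
SUPPORTED_FORMATS = {
    "dasd_fba_mod": {"CMS non-reserved", "CMS reserved"},
    "dasd_diag_mod": {"ldl", "CMS non-reserved", "CMS reserved"},
    "dasd_eckd_mod": {"cdl", "ldl", "CMS non-reserved", "CMS reserved"},
}


def drivers_for(disk_format: str) -> list:
    """Return the drivers (hypothetical helper) that can use a format."""
    return sorted(
        driver
        for driver, formats in SUPPORTED_FORMATS.items()
        if disk_format in formats
    )


# The installer produces cdl only, which leaves just the ECKD driver.
print("cdl:", drivers_for("cdl"))
# CMS reserved is usable by all three drivers.
print("CMS reserved:", drivers_for("CMS reserved"))
```

Looking up cdl returns only dasd_eckd_mod, which is the heart of the complaint in 447755: a disk laid out by the installer cannot be used with the DIAG or FBA drivers.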
The problem is that I can't *install* to CMS reserved minidisks. I can with Suse, though. ;-) And once I have the data on CMS reserved minidisks, I can use the DIAG driver. It works great, except for the bug that we are discussing, which prevents it from working with minidisks that are linked read-only. To get that to work I need to apply a patch to the kernel source code and compile a custom kernel. And that bug affects all distributions, not just Debian. It's not a Debian-installer-specific problem.

There is also another issue mentioned in 447755, and that is a problem with mounting devices in the wrong order. In hindsight, I should have created two separate bug reports: one for the lack of support for CMS minidisks and one for the mount order issue.

I apologize for the long reply, but again, I don't know what you know and what you don't know. I hope I haven't put you to sleep. And I hope that I have resolved the confusion.

On 2009-12-15, Frans Pop wrote:
> Eh, what? Why would I have to do that? I have no special status
> here.

As an official Debian developer, you carry more weight than I do. I can appeal to the kernel team, of course, but if you say I'm full of excrement, they'll believe you more than they will me.