Excuse my ignorance, but what is "FBAF"?

On Tue, Mar 12, 2013 at 5:31 AM, Pavelka, Tomas <[email protected]> wrote:
> We have been trying to format all minidisks from Linux only, and this
> turned out to be problematic. I am looking for a solution that would let
> us stay in Linux without having to involve CMS format for every new
> minidisk. Let me first describe the problem:
>
> When there is a record on a dasd that has an incorrect cylinder in the
> count area, this leads to "record not found" errors when the dasd is
> brought online. Since the dasd needs to be online before the problem is
> fixed (by formatting), the only way around it that I can see is to
> preformat in CMS.
>
> If new minidisks are regularly formatted and destroyed, it is possible to
> run into a situation where part of the disk has the correct format and
> part has the wrong cylinder number in the count area.
>
> Here is a way to reproduce:
>
> 1) Create a minidisk and format it with CDL, e.g.
>
>    MDISK FBAF 3390 4819 1000 VMBL2H WR
>
> 2) Delete it and create a minidisk starting at the next cylinder, but
>    half the size of the first one:
>
>    MDISK FBAF 3390 4820 500 VMBL2H WR
>
> 3) Format it with CDL
>
> 4) Delete the disk and create a new one, spanning the first disk except
>    for the first cylinder:
>
>    MDISK FBAF 3390 4820 999 VMBL2H WR
>
>    This will create a disk that has the first half correct, but the rest
>    of the disk has the cylinders off by one in the count area.
>
> 5) Link it from Linux, and put it online
>
> When the disk is put online, a large number of "record not found" errors
> appear in the syslog. On some of our real devices, the errors appear in
> less than a second and the device can be formatted. On other real
> devices, the errors appear over the course of several minutes (the
> highest I have observed was about 25 minutes). While the errors appear,
> the device is not usable and cannot be put offline.
>
> Why I think this is a problem (beyond cluttered syslog):
>
> - The device cannot be put offline until the errors stop appearing.
>   Sometimes dasdfmt with --force stops this, but only as long as the
>   device is present in /dev, which is not always the case.
>
> - While the errors appear, there is contention on the real device where
>   the minidisk is located. Any other Linuxes running from the real device
>   become next to unusable.
>
> There is a fix in the newer kernels that deals with a similar problem:
>
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3bc9fef9cc1e4047c3a3c51d84cc1c5d2ef03cea
>
> I have tested it, and it seems that the initial check is made on the
> first few cylinders only. If the count errors are further towards the end
> of the disk, the problem is still present.
>
> Here is an example of the "record not found" error:
>
> Mar 12 05:52:10 kernelts kernel: dasd-eckd 0.0.fbaf: The specified record was not found
> Mar 12 05:52:10 kernelts kernel: dasd(eckd): I/O status report for device 0.0.fbaf:
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): in req: 000000001ba4dcf0 CC:00 FC:04 AC:00 SC:17 DS:02 CS:20 fcxs:01 schxs:02 RC:0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): device 0.0.fbaf: Failing TCW: 000000001ba4de40
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->length 64
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->flags d1
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->dcw_offset 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->count 4096
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): residual 4068
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_time 81
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.def_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.queue_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_busy_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_act_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 0- 7: 00 08 00 00 45 e6 3e 00
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 8-15: 00 00 00 00 00 00 00 04
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 16-23: e5 11 6a 27 85 00 0f 00
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 03 e6 0e
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP
> Mar 12 05:52:10 kernelts kernel: Buffer I/O error on device dasdd, logical block 179819
>
> Let me know if I need to supply any more information. Also, can anyone
> think of a reason why on some real devices the errors appear in seconds
> and on others it takes such a long time?
>
> Thanks,
> Tomas
>
> Tomas Pavelka
> CA Technologies
> Sr Software Engineer
> Tel: +420226207796
> [email protected]
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or
> visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
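
For what it's worth, the off-by-one in the reproduction steps quoted above can be illustrated with a toy model. This is only a sketch under an assumption taken from the description: that formatting writes each cylinder's number, relative to the start of the minidisk, into the count area, and that stale values survive on real cylinders that are not reformatted. The names format_minidisk and mismatched_cylinders are made up for illustration, and nothing here touches a real dasd.

```python
# Toy model (assumption, not the dasd driver's logic): each real cylinder
# remembers the cylinder number last written into its count area.

def format_minidisk(count_area, start, size):
    """Write minidisk-relative cylinder numbers into the count area."""
    for rel in range(size):
        count_area[start + rel] = rel


def mismatched_cylinders(count_area, start, size):
    """Return the relative cylinders whose stored number is not the expected one."""
    return [rel for rel in range(size) if count_area.get(start + rel) != rel]


count_area = {}                                     # real cylinder -> stored number
format_minidisk(count_area, 4819, 1000)             # step 1: MDISK ... 4819 1000
format_minidisk(count_area, 4820, 500)              # steps 2-3: MDISK ... 4820 500
bad = mismatched_cylinders(count_area, 4820, 999)   # step 4: MDISK ... 4820 999

print(len(bad), bad[0], bad[-1])    # 499 500 998
print(count_area[4820 + bad[0]])    # 501, one more than the expected 500
```

In this model the 499 affected cylinders are exactly the ones beyond the 500 that were reformatted in step 3, which matches the "first half correct, rest off by one" observation.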
