Excuse my ignorance, but what is "FBAF"?

On Tue, Mar 12, 2013 at 5:31 AM, Pavelka, Tomas <[email protected]> wrote:

> We have been trying to format all minidisks from Linux only and this
> turned out to be problematic. I am looking for a solution that would let us
> stay in Linux without having to involve CMS format for every new minidisk.
> Let me first describe the problem:
> When a record on a dasd has an incorrect cylinder number in its count
> area, this leads to "record not found" errors when the dasd is brought
> online. Since the dasd needs to be online before the problem can be fixed
> (by formatting), the only way around this that I can see is to preformat
> in CMS.
> If new minidisks are regularly formatted and destroyed, it is possible to
> run into a situation where part of the disk has the correct format and part
> has the wrong cylinder number in the count area.
>
> Here is a way to reproduce:
>
>
> 1) Create a minidisk and format it with CDL, e.g.
>
> MDISK FBAF 3390 4819 1000 VMBL2H WR
>
> 2) Delete it and create a minidisk starting at the next cylinder, but half
> the size of the first one:
>
> MDISK FBAF 3390 4820 500 VMBL2H WR
>
> 3) Format it with CDL
>
> 4) Delete the disk and create a new one, spanning the first disk except
> for the first cylinder:
>
> MDISK FBAF 3390 4820 999 VMBL2H WR
>
> This will create a disk that has the first half correct, but the rest of
> the disk has the cylinders off by one in the count area.
>
> 5) Link it from Linux, and put it online
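The steps above can be sketched as a small simulation. This is only a model, not a description of how CP or dasdfmt work internally: it assumes that formatting writes the minidisk-relative cylinder number (real cylinder minus the MDISK starting cylinder) into the count area of every cylinder in the extent, which matches the off-by-one described in step 4. The function names are made up for illustration.

```python
# Model: count_area maps a real cylinder to the cylinder number that the
# most recent format wrote into its count area (assumed to be relative to
# the start of the minidisk that was formatted).

def format_minidisk(count_area, start, size):
    """Simulate formatting a minidisk covering real cylinders start..start+size-1."""
    for real_cyl in range(start, start + size):
        count_area[real_cyl] = real_cyl - start


def mismatched_cylinders(count_area, start, size):
    """Return the relative cylinders whose stored number is not the expected one."""
    return [
        real_cyl - start
        for real_cyl in range(start, start + size)
        if count_area.get(real_cyl) != real_cyl - start
    ]


count_area = {}
format_minidisk(count_area, 4819, 1000)            # step 1
format_minidisk(count_area, 4820, 500)             # steps 2 and 3
bad = mismatched_cylinders(count_area, 4820, 999)  # step 4, before any format

# Number of stale cylinders, then the first and last affected relative cylinder.
print(len(bad), bad[0], bad[-1])  # prints "499 500 998"
```

Under this model the first 500 cylinders of the new disk are consistent, and the remaining 499 carry a cylinder number that is one too high.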
>
>
>
> When the disk is put online, a large number of "record not found" errors
> appear in the syslog. On some of our real devices, the errors appear in
> less than a second and the device can be formatted. On other real devices,
> the errors appear in the course of several minutes (highest I have observed
> was about 25 minutes). While the errors appear, the device is not usable
> and cannot be put offline.
>
>
>
> Why I think this is a problem (beyond cluttered syslog):
>
> - The device cannot be put offline until the errors stop appearing.
> Sometimes dasdfmt with --force stops this, but only as long as the device
> is present in /dev which is not always the case.
>
> - While the errors appear, there is contention on the real device where
> the minidisk is located. Any other Linux guests running from the real
> device become next to unusable.
>
>
>
> There is a fix in the newer kernels that deals with a similar problem:
>
>
> http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=3bc9fef9cc1e4047c3a3c51d84cc1c5d2ef03cea
>
>
>
> I have tested it, and it seems that the initial check is made on the first
> few cylinders only; if the count errors are further towards the end of the
> disk, the problem is still present.
>
> Here is an example of the "record not found" error:
>
> Mar 12 05:52:10 kernelts kernel: dasd-eckd 0.0.fbaf: The specified record was not found
> Mar 12 05:52:10 kernelts kernel: dasd(eckd): I/O status report for device 0.0.fbaf:
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): in req: 000000001ba4dcf0 CC:00 FC:04 AC:00 SC:17 DS:02 CS:20 fcxs:01 schxs:02 RC:0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): device 0.0.fbaf: Failing TCW: 000000001ba4de40
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->length 64
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->flags d1
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->dcw_offset 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->count 4096
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): residual 4068
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_time 81
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.def_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.queue_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_busy_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): tsb->tsa.iostat.dev_act_time 0
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex)  0- 7: 00 08 00 00 45 e6 3e 00
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex)  8-15: 00 00 00 00 00 00 00 04
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 16-23: e5 11 6a 27 85 00 0f 00
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): Sense(hex) 24-31: 00 00 40 e2 00 03 e6 0e
> Mar 12 05:52:10 kernelts kernel: <3>dasd(eckd): 24 Byte: 0 MSG 0, no MSGb to SYSOP
> Mar 12 05:52:10 kernelts kernel: Buffer I/O error on device dasdd, logical block 179819
>
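As a rough cross-check, the logical block in the last log line can be mapped back to a cylinder. The sketch below assumes the disk uses 4096-byte blocks (suggested by tsb->count 4096) and the usual 3390 geometry of 15 tracks per cylinder with 12 such blocks per track, and that the block number counts from the start of the whole device. The helper name is hypothetical.

```python
BLOCKS_PER_TRACK = 12      # 4096-byte records that fit on one 3390 track
TRACKS_PER_CYLINDER = 15   # 3390 geometry


def block_to_chr(block):
    """Convert a zero-based logical block number to (cylinder, head, record)."""
    track, index_on_track = divmod(block, BLOCKS_PER_TRACK)
    cylinder, head = divmod(track, TRACKS_PER_CYLINDER)
    # Data records on a track are numbered from 1; record 0 is reserved.
    return cylinder, head, index_on_track + 1


print(block_to_chr(179819))  # prints "(998, 14, 12)"
```

With these assumptions, block 179819 is the very last block of a 999-cylinder disk (999 * 15 * 12 = 179820 blocks), which falls in the second half of the disk where the count areas are expected to be off by one.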
> Let me know if I need to supply any more information. Also, can anyone
> think of a reason why on some real devices the errors appear in seconds and
> on others it takes such a long time?
>
> Thanks,
> Tomas
>
> Tomas Pavelka
> CA Technologies
> Sr Software Engineer
> Tel:  +420226207796
> [email protected]
>
>
> ----------------------------------------------------------------------
> For LINUX-390 subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO LINUX-390 or
> visit
> http://www.marist.edu/htbin/wlvindex?LINUX-390
> ----------------------------------------------------------------------
> For more information on Linux on System z, visit
> http://wiki.linuxvm.org/
>

