I think the newfs/mkfs error "mkfs: close failed on write disk: I/O error" is more interesting than testing mkfile.

newfs / mkfs -F ufs uses async I/O, so your iSCSI driver should be receiving new I/O requests while the current I/O request is still busy.

The script that constructs the boot_archive also runs two shell functions in parallel: one constructs the 32-bit platform/i86pc/boot_archive file, the other the 64-bit platform/i86pc/amd64/boot_archive. So, in addition to using newfs on lofi devices, building the two archives in parallel could also submit more than one concurrent I/O request to the iSCSI device.

Maybe this is causing data corruption in your iSCSI driver?

When you boot from your local IDE disk, mount the iSCSI volume containing the installed Solaris root filesystem on /mnt, and run

    /mnt/boot/solaris/bin/create_ramdisk -R /mnt

does that produce two files, /mnt/platform/i86pc/boot_archive and /mnt/platform/i86pc/amd64/boot_archive, that can both be uncompressed with gunzip?

Or another test: mount your iSCSI volume on /mnt, create two 200 MB files and compress them in parallel:

    mount ... /mnt
    dd if=/dev/urandom bs=1024k count=200 of=/mnt/file1
    dd if=/dev/urandom bs=1024k count=200 of=/mnt/file2
    gzip -v9 /mnt/file1 &
    gzip -v9 /mnt/file2
    wait
    gunzip -t /mnt/file1.gz
    gunzip -t /mnt/file2.gz

Som wrote:
> Javen,
>   So i tried the steps ive written below, the archive size was 97M, so i
> typed mkfile 97 m file1, the command worked just fine, i captured the
> network trace throughtout this operation, all i saw was lots of writes
> (64K each), and each of them seemed to be completing sucesfully (atleast
> from protocol point of view, both iSCSI and SCSI reporting good status)
>
> Any ideas, pls advise?
>
> Thanks
> Som
>
> --- Somnath kotur <[EMAIL PROTECTED]> wrote:
>
> > Javen,
> >   So you actually think that the size of the archive is definitely
> > incorrect?
> >
> > Here is what i am proposing to do, let me know if i have understood you
> > right ..
> >
> > - I will boot Solaris off my local IDE disk
> > - Mount my iSCSI target disk on a volume say /mnt
> > - Use my local disk's boot archive to uncompress and use as a lofi device?
> > - Get the size of the lofi device and change directory to /mnt
> > - mkfile $size (obtained in above step) <file1>
> >
> > My DMA logic is based on the one SCSI HBA driver that came with the
> > opensolaris source, which is the isp driver (esp the multiple cookies /DMA
> > windows handling part) ... that is my reference, should be OK ?
> >
> > BTW, i believe diskiomizer and vdbench are SUN internal tools (atleast
> > what i saw on first glance), is there any way to obtain them on
> > opensolaris ?
> >
> > Thanks
> > Som
> >
> > --- Javen Wu <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Som,
> > >
> > > The experiment I suggested is to prove that the boot-archive has
> > > already corrupt before you reboot the machine and problem exists in
> > > your write handler.
> > >
> > > IIRC, boot-archive use mkfile(1M) to create a file and use the file as
> > > a lofi device. Since the size of boot-archive is not correct, I suspect
> > > the command mkfile failed during create archive.
> > >
> > > You can uncompress a correct archive from CD and check the size of the
> > > lofi device before compress. Then I think you can try mkfile(1M)
> > > command like "mkfile $size file1" on your iscsi target device with your
> > > initiator driver so that you can simplify the scenario and debug/trace
> > > the problem during write operation.
> > >
> > > Is it possible you handle (DMA) incorrectly when the buffer size is big
> > > enough? IE. (multiple cookies or have to bind partially and split
> > > window to transfer).
> > >
> > > Generally, we don't use analyze function of format to verify our driver.
> > > We use diskomizer or vdbench which are really good test utilities to
> > > test a HBA driver.
> > >
> > > Cheers
> > > Javen
> > >
> > > Somnath kotur wrote:
> > >
> > > >Javen/Juergen,
> > > >  Thank you for the tip, yes i have tried the same below ... mounted
> > > >it on /a, and checked the archive by typing the command below:
> > > >
> > > ># gunzip < boot_archive > /tmp/bootarchive.img
> > > >
> > > >resulted in another error saying
> > > >'gunzip:stdin: invalid compressed data - format violated'
> > > >
> > > >So it did not really help, tho the size of the bootarchive seemed to
> > > >reduce from 38M to 20M
> > > >
> > > >I then used the 'analyze' option in the 'format' utility of solaris.
> > > >This in turn has an option called 'verify' that writes the entire disk
> > > >and verifies the contents in many passes
> > > >
> > > >Ran tests for few hrs and they all passed!!
> > > >
> > > >The only other problem i could think of was that whenever i attempted
> > > >to create a filesystem on my LUN using:
> > > >
> > > >newfs -f /dev/rdsk/c1t1d0s2
> > > >
> > > >i get an error saying:
> > > >
> > > > 'mkfs: close failed on write disk: I/O error .'
> > > >
> > > > Although i get this error im always able to succesfully mount and
> > > >read /write files from the volume, so i decided to ignore this
> > > >
> > > >However when i did an 'fsck' on the above special file i did seem to
> > > >get lot of inode errors, and fsck is taking time repairing all of them.
> > > >
> > > >I got a lot of errors even after doing step 1 below, identifying my
> > > >BOOT volume and then doing fsck on it
> > > >
> > > >Is there any other utility or some option that you can suggest to
> > > >identify any write errors ?
> > > >
> > > >Thanks
> > > >Som
> > > >
> > > >--- Javen Wu <[EMAIL PROTECTED]> wrote:
> > > >
> > > >>FYI
> > > >>
> > > >>>Date: Mon, 03 Mar 2008 02:38:00 +0000
> > > >>>
> > > >>From: Javen Wu <[EMAIL PROTECTED]>
> > > >>To: Somnath kotur <[EMAIL PROTECTED]>
> > > >>CC: [EMAIL PROTECTED]
> > > >>Subject: Re: Fwd: iSCSI LUN Boot
> > > >>
> > > >>Hi Som,
> > > >>
> > > >>My point of view, the problem could not caused by Synchronize_cache
> > > >>failure. Because synchronize_cache is not a mandatory command in SCSI
> > > >>spec, that means even without the command, the system can work fine.
> > > >>
> > > >>My guess there is some error on your handler for WRITE. In another
> > > >>words, I guess something wrong during your driver write buffer out.
> > > >>
> > > >>Could u do a experiment as below:
> > > >>1. boot the machine with Solaris CD and your ITU disk
> > > >>2. using `bootadm update-archive -R $ROOT` to update the boot-archive
> > > >>   on your iSCSI target. Here $ROOT should be your mount point of
> > > >>   your alternative root.
> > > >>3. before reboot, could you verify the new boot-archive under your
> > > >>   $ROOT/platform/i86pc/ corrupt or not.
> > > >>4. if the image correct, please reboot the machine from your iSCSI
> > > >>   target again.
> > > >>
> > > >>If #3 is failed, that means your write has some problem. So please do
> > > >>test write and verify with your driver.
> > > >>
> > > >>Cheers
> > >
=== message truncated ===

_______________________________________________
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss