I'm new to this field, but since I have to deal with this problem in the
real world, I want to contribute my experience and some questions to this
discussion.
Fact #1:
Reset wars do happen. Booting a Linux system in a multi-initiator
environment often causes an infinite bus-reset loop, even with only one
Linux system on the bus (the others being NT and Sun). Sometimes it ends
with all hosts crashing, including the NT and Sun machines. Almost every
time, a while after the Linux boot, all hosts lose their view of the
network.
Fact #2:
With only two Linux systems on the same FC network, the infinite bus
reset usually does not start.
Fact #3:
This behavior does not seem to be related to the low-level driver: we
have tested two QLogic HBA drivers and one Emulex HBA driver.
Now some questions:
Which level is responsible for sending the bus reset? Is it the middle
level or the low level? If it is the middle level, can the low-level
driver filter it out?
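As far as I can tell from the 2.4 headers, the answer hinges on the hooks
a low-level driver registers in its Scsi_Host_Template: the mid level
runs its error handler when commands time out, escalating from abort to
device reset to bus reset to host reset, and each step is carried out by
calling back into the low-level driver. A minimal sketch (the "decline"
idea in it is hypothetical, only to show where a driver could refuse to
put a reset on the wire):

#include "scsi.h"   /* Scsi_Cmnd, SUCCESS, FAILED (2.4 tree) */
#include "hosts.h"  /* Scsi_Host_Template */

/* Called first by the mid-level error handler: try to abort a single
 * command on the HBA. Returning FAILED makes the mid level escalate
 * to the next, more drastic step. */
static int my_eh_abort(Scsi_Cmnd *cmd)
{
        return FAILED;
}

/* This is where a bus reset would actually hit the wire. A driver on
 * a shared bus could, hypothetically, apply policy here (rate-limiting,
 * for instance) before touching the hardware. */
static int my_eh_bus_reset(Scsi_Cmnd *cmd)
{
        return SUCCESS;
}

static Scsi_Host_Template my_template = {
        .name                 = "example",
        .eh_abort_handler     = my_eh_abort,
        .eh_bus_reset_handler = my_eh_bus_reset,
        /* eh_device_reset_handler and eh_host_reset_handler are the
         * intermediate and final escalation steps, omitted here */
};

So the reset seems to originate in the mid level, but the low-level
driver is the one that executes it, which suggests filtering could be
done there.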
Which level should deal with hot-swapping and adding new devices? Until
this discussion I was sure that this was the low-level driver's job.
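On adding devices: the mid level can at least be told about a new or
removed disk by hand through /proc/scsi/scsi in 2.2 and 2.4. A user-space
sketch (the host/channel/id/lun numbers are example values):

#include <stdio.h>

int main(void)
{
        /* Ask the mid level to attach a disk at host 0, channel 0,
         * id 5, lun 0. Use "scsi remove-single-device" with the same
         * tuple before pulling a disk out. */
        FILE *f = fopen("/proc/scsi/scsi", "w");
        if (!f) {
                perror("/proc/scsi/scsi");
                return 1;
        }
        fputs("scsi add-single-device 0 0 5 0", f);
        fclose(f);
        return 0;
}

Whether this is safe while other initiators are active on the same bus is
exactly what I am unsure about.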
And if a bus reset does happen, shouldn't it be invisible to I/O
operations from the high-level driver's point of view, since retries
should handle it anyway?
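The mid level does track per-command retries, so I would expect a stray
reset to be absorbed. Roughly (the field names come from 2.4's Scsi_Cmnd,
but the logic is my paraphrase, not the actual mid-level code):

/* paraphrase of the mid level's retry decision */
if (++cmd->retries < cmd->allowed) {
        /* requeue the command once the reset settles */
} else {
        /* give up and complete the command with an error,
         * which is what the high level finally sees */
}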
Thank you.
Sergey Vichik.
StoreAge
----- Original Message -----
From: David Teigland <[EMAIL PROTECTED]>
To: Mark Veteikis <[EMAIL PROTECTED]>; Kurt Garloff <[EMAIL PROTECTED]>; Chris
Meadors <[EMAIL PROTECTED]>; Martin Peschke
<[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, August 11, 2000 6:16 PM
Subject: Re: shared SCSI buses
> On Fri, Aug 11, 2000 at 09:35:19AM -0500, Mark Veteikis wrote:
> > >
> > > I'm interested in combining two or more active hosts with multiple
> > > devices on a single parallel SCSI bus. I've successfully done this,
> > > but don't know the extent of problems which could arise when hosts
> > > or disks are added or removed (crashed) on the in-use bus.
> > >
> > > A) How likely is it that the scsi driver(s) will see errors when
> > > nodes and drives come and go and are there specific cases which are
> > > bad?
> > >
> > > B) What are the possibilities of a node surviving if it sees scsi
> > > errors?
> > >
> > > C) How much work would it take to make all these odd cases reliable?
> > >
> > > I'm interested in the status on both 2.2 and 2.4. Thanks.
> >
> > Have you looked at Fibre Channel? Linux has support. Or are your target
> > devices/HBAs locked into SCSI?
>
> Thanks to all for the input. I should have provided some more
> background information. I work on the GFS project and we primarily use
> Fibre Channel. I know SCA parallel SCSI drives are the way to go, but
> it still sounds like a touchy issue. I've seen my share of scsi
> mid-layer errors which lock up the machine, so I wanted to try and get
> a clearer picture of things.
>
> - Hot-swapping SCA disks on the bus should be relatively reliable if
>   it's done with care. It sounds like if any transfer is happening
>   during a swap you're in serious danger of crashing everything. The
>   scsi drivers can be prompted to add or remove devices. I wonder if
>   multiple hosts put a wrench in things here.
>
> - The other important issue is hosts which crash at any time,
>   including during a transfer. It sounds like the drivers on other
>   machines will currently start a reset-war, but the drivers could be
>   improved to avoid this and hopefully keep using the devices as they
>   were.
>
> - A similar problem for devices which crash abruptly.
>
> - How about adding machines to the bus and then booting them up?
>
> By the look of things here, it is not reasonable to use GFS with
> multiple hosts on a shared SCSI bus if you're interested in HA. If any
> machine or disk crashes, all your devices are probably in trouble.
> Stopping all machines' I/O (and maybe unmounting everyone) to add or
> remove storage would also be prohibitive.
>
> Thanks.
>
> --
> Dave Teigland <[EMAIL PROTECTED]>
>
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]