I have not experienced the reset wars described in this thread.  We have 
been testing Linux on a shared parallel SCSI bus for the past six months 
and have been satisfied enough with its stability to release our product.  
We have been careful about what we support, though.  We have found the 
aic7xxx driver with the 2944 UW and 2944 UW2 controllers to be reliable in 
a small two-node cluster.  We have also done some testing with the LSI 896 
chip using the sym53c8xx driver, but have not yet had an opportunity to 
really run it through its paces and declare it ready for prime time.

We have run into problems where SCSI reservations, or rather reservation 
conflicts, are treated as error conditions that result in bus resets.  
That negates the whole reason for having reservations in the first place.  
We are driving fixes for this into the Linux SCSI subsystem: Red Hat has 
implemented all of our changes in the 2.2.16 kernel they distribute, and 
we are working to get these changes into the mainline kernel as well as 
other distributions.
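
To give a rough idea of the fix (this is a sketch of the approach, not 
the actual patch, and decide_disposition() is a made-up name for the 
mid-layer routine that decides what to do with a completed command):

    #include "scsi.h"  /* Scsi_Cmnd, status_byte(), RESERVATION_CONFLICT */

    #define CMD_FINISH   0    /* complete the command, pass result up */
    #define CMD_RECOVER  1    /* enter abort/reset error recovery     */

    static int decide_disposition(Scsi_Cmnd *SCpnt)
    {
            /*
             * RESERVATION CONFLICT means another initiator holds a
             * reservation on the device.  In a cluster that is expected
             * behaviour, not a transport failure, so the I/O should be
             * failed back to the caller.  Escalating to a bus reset
             * here is exactly what clears the reservation and starts
             * the reset war.
             */
            if (status_byte(SCpnt->result) == RESERVATION_CONFLICT)
                    return CMD_FINISH;

            /* ... all other result codes handled as before ... */
            return CMD_RECOVER;
    }

The key point is that the conflict is completed back to the caller as a 
failed I/O instead of being fed into the recovery path that issues the 
reset.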

I have also been able to reliably set up a six-node Fibre Channel cluster 
with the QLogic 2100 and 2200 FC adapters and the qla2x00 driver.

So, while I strongly agree that there are plenty of areas where the SCSI 
subsystem could use improvement in error recovery and reliability, what 
we have now in the 2.2 kernel is reliable enough to use in a small 
cluster.


Eddie Williams
SteelEye Technology
> 
> 
> I'm new to this field, but since I have to face this problem in the real
> world, I want to contribute my experience and some questions to this
> discussion.
> 
> Fact #1:
> Reset wars do happen.  Booting a Linux system in a multi-initiator
> environment often causes an infinite bus-reset loop, even with only one
> Linux system present (the others being NT and Sun).  Sometimes this ends
> with a crash of all hosts, including the NT and Sun ones.  Almost every
> time, some period of time after the Linux boot, all hosts lose their
> view of the network.
> 
> Fact #2:
> With only two Linux systems on the same FC network, the infinite bus
> resets usually do not start.
> 
> Fact #3:
> This behavior does not seem to be related to the low-level driver: we
> have tested two QLogic HBA drivers and one Emulex HBA driver.
> 
> Now some questions:
> Which level is responsible for sending the bus reset, the mid level or
> the low level?  If it is the mid level, can the low-level driver filter
> it out?
> 
> Which level should deal with hot-swapping and adding new devices?
> Until this discussion I was sure that this was the low-level driver's
> job.
> 
> If a bus reset does happen, shouldn't it be invisible to I/O operations
> from the high-level driver's point of view, since retries should handle
> it anyway?
> 
> 
> Thank you.
> Sergey Vichik.
> StoreAge
> 
> 
> 
> 
> ----- Original Message -----
> From: David Teigland <[EMAIL PROTECTED]>
> To: Mark Veteikis <[EMAIL PROTECTED]>; Kurt Garloff <[EMAIL PROTECTED]>; Chris
> Meadors <[EMAIL PROTECTED]>; Martin Peschke
> <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> Sent: Friday, August 11, 2000 6:16 PM
> Subject: Re: shared SCSI buses
> 
> 
> > On Fri, Aug 11, 2000 at 09:35:19AM -0500, Mark Veteikis wrote:
> > > >
> > > >
> > > > I'm interested in combining two or more active hosts with multiple
> > > > devices on a single parallel SCSI bus.  I've successfully done
> > > > this, but don't know the extent of problems which could arise when
> > > > hosts or disks are added or removed (crashed) on the in-use bus.
> > > >
> > > >  A) How likely is it that the SCSI driver(s) will see errors when
> > > >  nodes and drives come and go, and are there specific cases which
> > > >  are bad?
> > > >
> > > >  B) What are the chances of a node surviving if it sees SCSI
> > > >  errors?
> > > >
> > > >  C) How much work would it take to make all these odd cases reliable?
> > > >
> > > > I'm interested in the status on both 2.2 and 2.4.  Thanks.
> > >
> > > Have you looked at Fibre Channel? Linux has support. Or are your target
> > > devices/HBAs locked into SCSI?
> >
> > Thanks to all for the input.  I should have provided some more
> > background information.  I work on the GFS project and we primarily
> > use Fibre Channel.  I know SCA parallel SCSI drives are the way to go,
> > but it still sounds like a touchy issue.  I've seen my share of SCSI
> > mid-layer errors which lock up the machine, so I wanted to try to get
> > a clearer picture of things.
> >
> > - Hot-swapping SCA disks on the bus should be relatively reliable if
> >   it's done with care.  It sounds like if any transfer is happening
> >   during a swap, you're in serious danger of crashing everything.  The
> >   SCSI drivers can be prompted to add or remove devices (see the
> >   sketch after this list).  I wonder if multiple hosts put a wrench in
> >   things here.
> >
> > - The other important issue is hosts which crash at any time,
> >   including during a transfer.  It sounds like the drivers on other
> >   machines will currently start a reset war, but the drivers could be
> >   improved to avoid this and hopefully keep using the devices as they
> >   were.
> >
> > - A similar problem exists for devices which crash abruptly.
> >
> > - How about adding machines to the bus and then booting them up?
> >
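> > For reference, on 2.2 kernels the mid layer can be told to scan in or
> > drop a single device through /proc/scsi/scsi.  A sketch of driving
> > that interface from user space (the host/channel/id/lun values here
> > are made up):
> >
> >     /* Equivalent to the usual shell commands:
> >      *   echo "scsi add-single-device 0 0 2 0"    >/proc/scsi/scsi
> >      *   echo "scsi remove-single-device 0 0 2 0" >/proc/scsi/scsi
> >      */
> >     #include <stdio.h>
> >
> >     int main(void)
> >     {
> >         FILE *p = fopen("/proc/scsi/scsi", "w");
> >
> >         if (p == NULL) {
> >             perror("/proc/scsi/scsi");
> >             return 1;
> >         }
> >         /* ask the mid layer to attach host 0, channel 0, id 2, lun 0 */
> >         fprintf(p, "scsi add-single-device 0 0 2 0\n");
> >         fclose(p);
> >         return 0;
> >     }
> >
> > Whether that is safe while other initiators are actively using the
> > bus is exactly the question.
> >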
> > By the look of things here, it is not reasonable to use GFS with
> > multiple hosts on a shared SCSI bus if you're interested in HA.  If
> > any machine or disk crashes, all your devices are probably in trouble.
> > Stopping all machines' I/O (and maybe unmounting on every node) to add
> > or remove storage would also be prohibitive.
> >
> > Thanks.
> >
> > --
> > Dave Teigland  <[EMAIL PROTECTED]>
> >