Hi Mike (and all),

Well, much has happened in the Linux world in the past 6 months!

As such, with more credibility and viability behind it, I have had a
much easier time promoting and securing substantial projects for Linux
database server applications.  We have (at the moment) selected a DPT
14-bay RAID5 rack-mount unit and a PM3334UDW controller for one project
in particular.  The EATA_DMA driver seems to do very well, but recently
we started getting some very pesky problems.

We are actually using 2 DPT controllers in the system (which, BTW, is an
ASUS "Gigabyte" dual P2-400MHz 100 MHz motherboard with the Intel 440BX
chipset and 256MB PC-100 RAM).  We have the differential PM3334UDW with
64MB of ECC cache RAM.  It is attached by an external cable to the RAID
cabinet.  In the RAID cabinet we have 8 18gb Seagate 7200 rpm ultrawide
SCSI drives (provided by DPT) in a single RAID5 configuration (around
121gb).  The entire RAID5 array is also formatted with an EXT2 fs.  It
seems to work reasonably well.  The other controller is a PM2144UW with
2 4.5gb Seagate drives.  These comprise the Linux OS install.  The
RAID5 array is for Oracle8 for Linux only.  Oh yeah, the Linux is RedHat
5.1 with all updates applied (this is a 2.0.35 kernel).  And, the
PM2144UW is scsi0 and the PM3334UDW is scsi1.

The problems we are having now appeared suddenly but have been
persistent since that moment.  We have had a number of kernel panics
that seem to surround the EATA_DMA driver "locking slots" and then
freezing.  It appears to be related to the PM2144UW card and disks, as
at the times of the panics the only activity was on the scsi0 bus.
Likewise, we seem to possibly have a hardware problem on the RAID unit,
as the alarm was sounding and it showed the SCSI ID0 drive with a
problem.  Since the handle for DPT Storage Manager is not there, we had
to shut down and boot DOS to get in and look at the drives.  It showed
a drive fail on the ID0 disk.  We had been seeing the following errors
too:

Oct 31 13:31:07 imageweb kernel: Returning: SCSI_ABORT_BUSY

These were on the scsi1 (PM3334UDW) channel.  The array was in degraded
mode because of whatever triggered the alarm.  What is really weird
about this is that we started DIAGs and the ID0, ID2, and ID3 disks
showed failures!  On ID2 we got a message in the Storage Manager event
log about a "Bus Parity Error" (YIPES!).  And ID3 had a "parameter
mismatch".  So now, the whole array was FAILED.  Well, we powered it
all off and on and then did a scan on the RAID array as a whole, and it
checked out.  Issued a rebuild and it rebuilt.  ID3 still was out to
lunch, but we could sort of run.  We still got the SCSI_ABORT_BUSY
messages, so I assumed that maybe this was due to the degraded array
performance and timing with the PM3334UDW (but I am not sure).

I went back this past week and ran a full DIAG again on the ID0 disk and
it passed with flying colors.  I "zapped" the ID3 disk and it seemed to
rebuild this time.  I wanted to run DIAGs on the other disks, but not
being able to run Storage Manager from Linux is a terrible situation for
us as we live 2 hours from this medical facility.  This is one of the
reasons for this message.

Quoted herein is a message from you, Mike, from last August.  I am
inquiring as to the status of getting it so that we can run Storage
Manager.  I expressed the same to DPT in VERY clear terms the other day
as well.  With the advent of all this new validation of Linux, this has
become an ever more crucial issue!  I realize that this addition to the
EATA_DMA driver takes time and that (as with any Open Source-like
project) these types of things often have to be worked in around work
and family (believe me - I KNOW), etc.  Of course, this is one of the
down sides to Open Source in general.  But where are we actually on
getting to the point of being able to have a native Storage Manager for
Linux such that I can continue to hope to use DPT solutions?  Being that
it is Open Source, might there be a way (with the help of others if need
be) to plot and follow a course of action to that end?  I would be
willing to work on it but I must say I am far from the level of driver
programmer that you appear to be, Mike.  Is there something we could do
as a group?  I expressed to DPT that we take Linux as a viable platform
VERY seriously, and that this project in particular was worth quite a
bit of money as well (look at our hardware config - it AIN'T cheap!).
I just felt I must ask.

The other reason for this post was to see if anyone might have a clue as
to why the hardware is behaving this way and what might be causing it.
What is really being stated by the driver with the SCSI_ABORT_BUSY
error on the PM3334UDW channel?  What are the circumstances surrounding
a "locking slot" message from the EATA_DMA driver just prior to and as
part of the kernel panic?  Are there any possible problems with having
BOTH a PM3334UDW and a PM2144UW controller in the same box?  Would
running WINE to allow one to possibly run the Win 3.11 or Win95 Storage
Manager be of any use?  What is the current and future status of the
EATA_DMA driver and related tools?  What other solutions for very large
disk arrays like this one are there that are relatively well proven?  I
am at my wits' end.  I will say that this did seem to work fine for 8
weeks prior to this, so I am convinced something in hardware has gone
bad.  But it is very frustrating (as I am 2 hours away from the unit)
to not be able to run Storage Manager in native Linux.  And I am likely
not the only one who is feeling this.  I think the work Mike has done
to date is outstanding!  And for the most part, prior to these
problems, it has been a no-brainer.  But where I am now is not good and
is endangering the desired outcome of the project, and I have to do
something soon.  So I make my case to all of you and I thank you for
listening.

Mark

-----------------------

Here is Mike's post from 8/97:

> Re: RAID-Controller
> 
> Author:    Michael Neuffer
> Email:     [EMAIL PROTECTED]
> Date:      1997/08/06
> Forums:    comp.os.linux.setup, comp.os.linux.misc
>   ------------------------------------------------------------------------
> 
> Michel Bardiaux ([EMAIL PROTECTED]) wrote:
> : Darla Baker wrote:
> : > I am setting up a Compaq Proliant 2500 server at work and I am using a
> : > DPT SmartRAID controller (I don't remember the specific model off hand)
> : > set up to control 4 4 Gig hard drives in 2 RAID-1 pairs.  It is working
> : > great with Linux!  In fact, when I ran the DPT setup program, what did I
> : > find under the OS section but a little penguin sitting on top of the
> : > word "LINUX"!  It was great.  I said to my boss, "Linux has arrived!"
> :
> : Could you *please* post the DPT exact model, and the name of the DPT
> : setup program and its version? We have SmartCacheIV controllers
> : (specifically PM2044UW) but without the cache/RAID option. I wanted to
> : use the DPT "Storage Manager" (version 1) (aka DPTMGR) to run
> : diagnostics, and found no Linux support. I mailed [EMAIL PROTECTED] and
> : they answered that there was no Linux version *yet* but it was being
> : done. So, either they do not know their own software (possible, since
> : they did not mention the SmartRAID Linux support), or it is a different
> : soft for the SmartRAID.
> 
> No, the code is indeed under development. I _hope_ to have the time to
> finish a preliminary version of the driver side interface over the next
> weekend. With that it should be possible to run the SCO StorageManager
> with iBCS support. After that the native Storage Manager port will be
> on my schedule.
> 
> I'll put some info and packages on my EATA homepages at
> http://www.uni-mainz.de/~neuffer/scsi/dpt
> 
> The mailing list will also soon be up again and have a new home.
> 
> : We are considering adding the cache/RAID option to transform a 4x9GB
> : farm into a RAID-5 plus 1 hot-spare maybe. (But if there is truly no
> : Linux support for the SmartCache and there is for the SmartRAID, it
> : might be better to purchase SmartRAID and keep the other boards for
> : Wintel PCs).
> 
> If you want to run RAID-5, you should definitely use a SmartRAID
> controller otherwise the performance will suffer.
> 
> Mike
>
