6.1 and ATA problems

2006-06-29 Thread Jaime Bozza
Hello,

It seems that I'm having problems with 6.1 (6-STABLE) and ATA.  This
particular system is an ASUS P3V4X, which is a VIA (Apollo Pro) chipset.
I installed 6.0 (From CD) on the system just fine, then I built a
gmirror using ad0 and ad2.  I then proceeded to update to 6-STABLE.

Once I rebooted, ad2 just disappeared.  I tried all sorts of things.
If I removed all devices except ad2, the system would boot the kernel
and then when it went to mount root it wouldn't find any drives.  If I
move ad2 to ad1 (slave on ata0), the drive shows up fine.  I currently
have my two mirrored drives on ad0/ad1 and I put a cdrom drive on ata1.
ata1 is detected just fine but nothing is probed on that channel.
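
The device names above follow FreeBSD's static ATA numbering, where the adN unit number encodes the channel and the master/slave position. The sketch below illustrates that mapping, assuming static numbering (the ATA_STATIC_ID kernel option) is in effect. The function name ad_location is made up for this example.

```python
def ad_location(unit: int) -> str:
    """Map an adN unit number to channel and position (static numbering assumed)."""
    channel, position = divmod(unit, 2)
    role = "master" if position == 0 else "slave"
    return f"ata{channel}-{role}"


# ad0/ad1 sit on ata0, ad2/ad3 on ata1.
for unit in range(4):
    print(f"ad{unit} -> {ad_location(unit)}")
```

Under this scheme, moving the disk from ad2 to ad1 means moving it from the master position on ata1 to the slave position on ata0.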

Here's my dmesg:

FreeBSD 6.1-STABLE #0: Thu Jun 29 13:37:27 CDT 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/NEPTUNE
Timecounter i8254 frequency 1193182 Hz quality 0
CPU: Intel Pentium III (752.83-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x683  Stepping = 3
 
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 1073725440 (1023 MB)
avail memory = 1045999616 (997 MB)
kbd1 at kbdmux0
acpi0: ASUS P3V_4X on motherboard
acpi0: Power Button (fixed)
Timecounter ACPI-fast frequency 3579545 Hz quality 1000
acpi_timer0: 24-bit timer at 3.579545MHz port 0xe408-0xe40b on acpi0
cpu0: ACPI CPU on acpi0
acpi_throttle0: ACPI CPU Throttling on cpu0
acpi_button0: Power Button on acpi0
pcib0: ACPI Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: ACPI PCI bus on pcib0
agp0: VIA 82C691 (Apollo Pro) host to PCI bridge mem
0xe400-0xe7ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 4.0 on pci0
isa0: ISA bus on isab0
atapci0: VIA 82C596B UDMA66 controller port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xd800-0xd80f at device 4.1 on pci0
ata0: ATA channel 0 on atapci0
ata1: ATA channel 1 on atapci0
pci0: serial bus, USB at device 4.2 (no driver attached)
pci0: bridge, HOST-PCI at device 4.3 (no driver attached)
rl0: Accton MPX 5030/5038 10/100BaseTX port 0xd000-0xd0ff mem
0xe080-0xe08000ff at device 9.0 on pci0
miibus0: MII bus on rl0
rlphy0: RealTek internal media interface on miibus0
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:e0:29:54:ae:da
pci0: multimedia, audio at device 11.0 (no driver attached)
fdc0: floppy drive controller port 0x3f2-0x3f5,0x3f7 irq 6 drq 2 on
acpi0
fdc0: [FAST]
fd0: 1440-KB 3.5 drive on fdc0 drive 0
ppc0: ECP parallel printer port port 0x378-0x37f,0x778-0x77b irq 7 drq
3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: Parallel port bus on ppc0
lpt0: Printer on ppbus0
lpt0: Interrupt-driven port
ppi0: Parallel I/O on ppbus0
sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A
sio1: 16550A-compatible COM port port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: Keyboard controller (i8042) port 0x60,0x64 irq 1 on acpi0
atkbd0: AT Keyboard irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: ISA Option ROM at iomem 0xc-0xc7fff on isa0
sc0: System console at flags 0x100 on isa0
sc0: VGA 16 virtual consoles, flags=0x300
vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on
isa0
Timecounter TSC frequency 752826072 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding
disabled, default to deny, logging disabled
ad0: 9787MB WDC AC310200R 17.01J17 at ata0-master UDMA66
ad1: 9787MB WDC WD102AA 80.10A80 at ata0-slave UDMA66
GEOM_MIRROR: Device gm0 created (id=291540166).
GEOM_MIRROR: Device gm0: provider ad0 detected.
GEOM_MIRROR: Device gm0: provider ad1 detected.
GEOM_MIRROR: Device gm0: provider ad0 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0: rebuilding provider ad1.
Trying to mount root from ufs:/dev/mirror/gm0a
Accounting enabled


Basically, no matter what device I put on ata1, it disappears under 6.1.
When booting under 6.0, everything shows up fine.

I'd be happy to forward additional information as needed.  Please let me
know.


Thanks,


Jaime Bozza

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


RE: 6-STABLE oddity

2006-11-13 Thread Jaime Bozza
 Download, burn to CD and run http://www.memtest86.com/
 
 Usually problems of this sort are faulty RAM.
 I had a buddy getting odd errors on copying files that happened at
 random.
 Turned out to be bad RAM too.

I recently had this same problem with a recent 6-STABLE and thought the
same thing.  Ran memtest for over 48 hours and never came up with any
errors.  I would cvsup source and run an md5 check to compare with
another known good system and seemed to always have 1-2 files bad.  It
seemed to always be just 1 bit off.
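
The checksum comparison described above can be sketched in Python. This is only an illustration: the message does not say which md5 tool or procedure was used, and the function names md5_manifest and differing_files are invented for the example.

```python
import hashlib
from pathlib import Path


def md5_manifest(root: str) -> dict[str, str]:
    """Return {relative path: MD5 hex digest} for every regular file under root."""
    base = Path(root)
    manifest = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as handle:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(base).as_posix()] = digest.hexdigest()
    return manifest


def differing_files(suspect: dict[str, str], reference: dict[str, str]) -> list[str]:
    """List paths whose digest differs or that exist on only one side."""
    return sorted(
        name
        for name in suspect.keys() | reference.keys()
        if suspect.get(name) != reference.get(name)
    )
```

One would build a manifest of the source tree on the suspect machine and another on the known good machine, then pass both to differing_files. A short, changing list of mismatches after each fresh checkout would match the symptom described here.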

Tried swapping cables, cards (SCSI), etc.  The system was running
gmirror on two 18G SCSI drives using an Adaptec controller.  If I
disabled the 2nd drive, I didn't have a single problem after a ton of
testing.

Turned out that I hadn't formatted the 2nd drive using the Adaptec
tools.  The drives had been out of service for about 3 years.  Once I
went through a format/verify I wasn't able to duplicate the problem no
matter what I tried.

So, RAM is definitely the easiest thing to test but keep in mind that
there are other areas that may also cause an issue.


Jaime Bozza



RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-08 Thread Jaime Bozza
 They kick ass is what they are like. :)
 
 I had a 3U box with a 12 port controller sitting next to my desk for a
 few weeks and my only goal was to confuse/break the 3Ware controller.

 No amount of power plug pulling, pulling multiple drives, quickly
 re-arranging drives could confuse the controller.  Made the SCSI stuff
 we use look like absolute neurotic junk.

Forgot that note - Areca handles drive location changes the same way.
I assume this is handled by metadata on the drive.

 -They added a moving part (2-wire fan, no tach) to a mission-critical
 part.  That seems real stupid.  After the bearings die in 2-3 years,
 what happens to your card?  Does it melt or just start acting weird?
 If the engineers didn't consider that, what other failure modes did
 their limited creativity miss? :)

Strange.  Our 1160 has a fan, but a plain heatsink (no fan) was also
included in the box.  My 1261ML was heatsink only.  I believe someone asked
for the feature.  Both controllers monitor the fan and would notify you
if the fan died.  You can turn the option on or off (off by default)
if you need to.

 -Availability.  None of our normal dealers could get them.

Availability seems to be a bit better now, but I can't answer for your
dealers.

 -Not many people seemed to be using them, so less feedback available
 and the whole package (hardware/firmware/driver) has less exposure than
 3Ware.

While the 9xxx series seems to be great (it uses a different driver), the
twe cards caused me so much grief that I was afraid to try them out.
We had a bunch of corruption issues (in RAID 5) with our 75xx
controllers that I was never able to fix.  RAID 0 or 1 seems to work
just fine for them.

 -3Ware answered pre-sales questions, Areca didn't.

Perhaps they've changed?   I spent a good hour on the phone with a tech
before we purchased our first controller.  This was last year sometime.

 Performance and feature-wise the Areca and 3Ware seemed pretty close,
 so we went with 3Ware.

Everyone has their reasons - I liked the RAID 6 feature, plus the OOB
management of Areca, plus my history with 3ware wasn't good. :(


Jaime Bozza
Qlinks Media Group



RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-08 Thread Jaime Bozza
 would be to set aside one port that
you don't use, and create a passthrough for that port to run your tests.
Running additional tests and leaving an array in a degraded state is
(again, my opinion) asking for trouble.  

Hope that helps!


Jaime Bozza
Qlinks Media Group



RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-08 Thread Jaime Bozza
 For what it's worth, 3Ware's latest PCI-E cards (9650 series) now
 support RAID 6.  The updated twa driver that supports them hasn't yet
 been merged into FreeBSD (see kern/106488 which I filed 2 months ago)
 but you can download either the source or the binary for it from 3Ware
 that works just fine.  The updated 3dm2 for it did make it into the
 Ports tree.

I noticed that when I was making my reply from earlier.  At some point
in time I may test out the newer (twa-based) cards.  It seems that 3ware
is actually interested in supporting those.  They never managed to get
the twe driver out of what they called beta, so that's where my
experience ended.

Interesting thing - When I installed FreeBSD on the new system (pre-6.2
version discs) it didn't see the new Areca card.  I started to get
worried but quickly found out that the driver had been updated shortly
before 6.2 was released. Quick update and all was well.  So the kernel
driver *is* being updated.

 Driver annoyances aside, my 9650SE is considerably faster than my 9500S
 (both have batteries, both have the drive's write cache off), especially
 on writes, and they are both much faster than my Adaptec 2410SA (which
 has no battery option and thus needed write caching disabled).

I haven't benchmarked the current system, but it's noticeably faster
than my other (1160 PCI-X) system.  That may be because I was using
WD4000YR drives in the first system, which do not support the faster
SATA2 speeds.  The drives in the current system do, and the card detects
it.  The newer card also has an updated processor - Intel IOP341 instead
of an IOP33x series processor - which I would guess makes a big
difference, at least according to synthetic benchmarks going around the
net. :)

 I've never tried Areca.  I would probably like them, from the sound of
 things.  I'm sticking with 3Ware for SATA systems for now though... but
 hey, personal preference and all.  For my next SCSI/SAS system I may
 have to do some serious evaluation of what's new out there...

Regardless of what you choose, there are a few decent options for
FreeBSD now.  Shows that people out there are actually starting to care
about FreeBSD for some of these controllers.

Unfortunately, I didn't have the same solid results with the mirroring
(RAID1) support for onboard SATA (Supermicro X7DVL-E).  I mirrored two
SATA2 drives (OS) and tried pulling one drive.  Placing it back in
didn't automatically start a rebuild, but I could force it.  Pulling it,
restarting, and then removing the drive from the RAID1 array (via the
BIOS) would then cause FreeBSD to panic at boot (or drive insertion)
every time.  I don't know if it was because there was still
metadata on the drive, so the ataraid driver thought something
different, but it bothered me a little.  I had to disable ATA RAID in
the BIOS completely (and remove the ataraid device from the kernel) to
get FreeBSD to boot or allow the drive back into the system at all.
Since the motherboard RAID is just software RAID, I switched over to
gmirror but kept AHCI on so I'd still have hotplug support.  After that
I wasn't able to kill the system.


Jaime Bozza
Qlinks Media Group





RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-09 Thread Jaime Bozza
 Hardware RAID1 buys you nothing in performance and reliability
 for a prolonged headache with drivers, bios insanity and
 monitoring+control tools.


Intel does seem to have a few hardware-based RAID controllers here:
http://www.intel.com/products/server/raid/

I don't see any driver or support for them in FreeBSD though.


Jaime Bozza
Qlinks Media Group


RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-12 Thread Jaime Bozza
 Jaime, can you expand a bit about what sort of motherboard you
 installed the 1261ML in?  I've yet to find any mainstream motherboards
 which have a PCIe x8 slot.  Most have x1, some have x4, and many
 have x16.  I've seen one Supermicro board which has a x8 slot but
 is only wired for x4 (has 4 lanes).

I'm using a Supermicro X7DVL-E board, which has 1 PCI-e x8 and 1 PCI-e
x4 (in an x8 slot).

 Based on what I've read, you can install a x8 card in a x16 slot as
 long as the x16 slot is wired (physically) for 8 or more lanes.

You probably could, but I'm not sure that would work correctly in a
non-server board.  I think any server board that has a x16 slot would be
fine since most servers aren't really designed for using high-end
graphics.

 My concern is that these x16 slots on motherboards are being primarily
 used for video cards, thus I ultimately have no idea what the
 manufacturers are testing them with.  Most manufacturer documentation
 I've seen says for use with graphics applications.  I'll add that
 I've only seen x1 and x16 cards until now -- the Areca cards are the
 first card I've seen using x8.

Looking at Supermicro's site, it seems that pretty much all of the 5000V
and 5000P based motherboards have at least one x8 slot.  The 5000X based
boards have an x16 slot, but those are workstation boards which would
likely use that slot for a graphics card.

Look at Supermicro's site under Xeon 5300/5100/5000 motherboards.  You
should be able to find a board that would work just fine for you.


Jaime Bozza
Qlinks Media Group



RE: What is a good choice of sata-ii raid controller for freebsd?

2007-02-13 Thread Jaime Bozza
  Interesting - Someone else mentioned the same thing.  The amr(4)
  manpage doesn't seem to be updated to mention the latest cards
  though.  I did notice the driver hasn't been really updated in a
  while either.  Wouldn't this cause a problem with identifying the
  newer cards?
 
 
 The authoritative source is the source itself:
grep amr_device_ids /usr/src/sys/dev/amr/amr_pci.c


True - But without having a card to check the ids, it doesn't help all that 
much.  After a bunch of downloading WinXP drivers to look up vendor ids, it 
seems that the FreeBSD driver does not support any of the PCI Express boards 
(or Server boards) at this point in time.


Jaime Bozza
Qlinks Media Group

RE: 4.x can't read 5.x dump?

2004-12-02 Thread Jaime Bozza
 There's no theoretical reason why the formats used by dump and
 restore shouldn't be forward and backward compatible, allowing
 an older restore (to an older filesystem type) to pick out the
 parts of the dump which make sense to it while ignoring parts
 which it doesn't understand.
 
 But they aren't, so it can't, so you're out of luck.

This is a pretty interesting issue that I hadn't been aware of.  I've
regularly restored dumps from a Solaris 8 machine to my FreeBSD 4.x
machines.  (We had data volumes on some old Solaris machines that I
replaced with FreeBSD.)  I guess FreeBSD and Solaris volumes are similar
enough that the restore just worked.


Jaime Bozza



RE: well-supported SATA RAID card?

2006-03-13 Thread Jaime Bozza
Anyone care to comment on Areca's ARC-11xx PCI-X cards? I'm
thinking about getting an 1130 (12-port version).

We just installed an ARC-1160 so I'll try to answer as many of your
questions as I can.

*Is the arcmsr driver in FreeBSD stable?

I've had no issues.  

*Any issues with arrays larger than 2TB?

I think Areca does things a little differently than some of the other
cards (someone correct me if I'm wrong on this.) Basically you set up a
RAID set, which is just a set of disks.  Then you set up volumes in that
RAID set.  The volumes are where you define RAID level and Ch/Id/Lun for
access under an OS.

The cards can handle volumes > 2TB without a problem and support both
ways (Windows and LBA) of handling the volumes.   We currently have a
3.6TB volume (11 400GB drives under RAID 6) available under FreeBSD
6-STABLE with no real problems other than the known gotchas.  Here's
the page on FreeBSD's website about that:

http://www.freebsd.org/projects/bigdisk/index.html


*Rebuild times?

Can't give you an exact figure since it's been a while since I tested the
original rebuild, but we've migrated the RAID set (and volume) twice
since getting the system and the migrations happened within hours.  I
was able to expand the RAID Set (adding drives) and expand the
corresponding volume set to fill the drives all while the system was
running without a hitch.

*Command Line management software?

Haven't played with the CLI much yet, but it seems to handle every
command you would need to send to the card.

*Is the company BSD friendly, no binary blob object in the driver?

Latest driver was built right into the kernel.  Updates on Areca's
website are in source form.

*Competent tech support?

I've only used their support when I originally got an 8 port card.  They
were very helpful in answering my questions and helped me realize I
needed the 16-port card to do what I wanted.

*What does the ethernet port on the ARC-1130 do?

Out of Band management (telnet and HTTP) directly to the card.  The 1130
and above all have the Ethernet port and will also do email notifications
directly without OS intervention.  That eliminates the need to run a
daemon under FreeBSD.  We've found the HTTP management daemon under
FreeBSD to have some problems (occasional core dumps), but once we
started using the Ethernet port we didn't need to worry about that.

I'm primarily interested in this card because it can
do RAID level 6 and based on the benchmarks I've seen 
it's a top performer.

Everything has been smooth with the Areca and we are using RAID 6
without any issues.  I haven't done major performance tests myself so I
can't give you any hard numbers, but we've been very pleased with the
system.  

The problems I had were some initial corruption on our large volumes at
the beginning due to a crash (my fault) with a softupdates volume.  When
I tried to fsck the partition, it told me I needed over 2.5GB of RAM to
do so.  Researching the problem pointed to filesystem corruption that
would be fixed easily, so I just reworked our partitions so that I had
multiple smaller partitions and removed softupdates for now.  I've
crashed the system a few times since then and fsck worked just fine.

Any other questions you have, feel free to ask.


Jaime Bozza



RE: well-supported SATA RAID card?

2006-03-16 Thread Jaime Bozza
*Rebuild times?

Can't give you an exact since it's been a while since I tested the
original rebuild, but we've migrated the RAID set (and volume) twice
since getting the system and the migrations happened within hours.  I
was able to expand the RAID Set (adding drives) and expand the
corresponding volume set to fill the drives all while the system was
running without a hitch.

So you increased the size of a file-system on-the-fly?

Not a file-system but a volume.  I'm partitioning the volume into 800GB
chunks for this particular situation.  We just did it for the last time,
so I have some numbers.

Previous Configuration:
  11 WD4000YR 400GB drives
  RAID 6
  3600GB volume
  4 800GB partitions (using gpt)
  Remaining 400GB unused

Added: 5 WD4000YR 400GB drives
Time to Expand RAID set: 12 hours
Time to Expand Volume: 56 minutes

New Volume:
  RAID 6
  5600GB
  7 800GB partitions
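
As a quick check of the figures above, RAID 6 usable capacity is the drive count minus two parity drives' worth of space, times the drive size. This sketch uses decimal gigabytes as the message does. The helper name raid6_usable_gb is illustrative only.

```python
def raid6_usable_gb(drive_count: int, drive_gb: int) -> int:
    """Usable capacity of a RAID 6 set: two drives' worth of space holds parity."""
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_gb


before = raid6_usable_gb(11, 400)  # previous configuration
after = raid6_usable_gb(16, 400)   # after adding 5 drives
print(before, after, after // 800)  # prints "3600 5600 7"
```

The old 3600GB volume held four 800GB partitions with 400GB left over. The new 5600GB volume divides evenly into seven.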

During the RAID Set Expansion, the Areca fills out the Volume from the
11 drives to the 16 drives, so it's a lot of writing.  It basically
rewrote all 3600GB of existing data, which accounts for the 12 hours.
Expanding the Volume initializes the extra space and once it's done
FreeBSD sees the new larger volume.  Areca doesn't touch the first
part of the volume when expanding it, so existing data isn't destroyed.
Of course, if you modified a volume set to make it smaller, you're
mostly out of luck.

I didn't have to reboot during any of this process.  The most I had to
do was unmount the 4 existing partitions so that I had write access to the
volume (gpt doesn't allow write access when partitions are mounted),
then run gpt recover to recover the secondary partition table at the end
of the volume.

After that, it was just a simple matter of adding the 3 new partitions
and mounting them.

The above Time to Expand Volume was actually spent generating RAID 6
parity data for the additional 2 terabytes, so that should give a good
idea of the speed of the XOR engine.  This was at the maximum of 80%
utilization for the background process.  I suspect it would have been a
little quicker if I had restarted and used the BIOS menu to expand (since
it would have been a foreground process), but it's nice to be able to
keep the system in use while the processes were running.
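
For a rough sense of scale, the two timings quoted earlier can be converted to average data rates. This is back-of-the-envelope arithmetic only. It assumes decimal units (1GB = 1000MB) and takes the "additional 2 terabytes" as exactly 2000GB. The helper name is illustrative.

```python
def throughput_mb_per_s(gigabytes: float, seconds: float) -> float:
    """Average rate in decimal MB/s for `gigabytes` processed in `seconds`."""
    return gigabytes * 1000 / seconds


# RAID set expansion: about 3600GB of existing data rewritten in 12 hours.
raid_set_expansion = throughput_mb_per_s(3600, 12 * 3600)
# Volume expansion: about 2000GB of new space initialized in 56 minutes.
volume_expansion = throughput_mb_per_s(2000, 56 * 60)
print(f"{raid_set_expansion:.0f} MB/s, {volume_expansion:.0f} MB/s")
```

By this estimate the volume initialization ran at several times the rate of the RAID set expansion. The expansion has to read and rewrite data that already exists, so a lower rate is plausible.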

Jaime Bozza



DVD Burners/USB

2006-05-05 Thread Jaime Bozza
Hello,

I recently replaced a DVD Automated Burner unit with a newer model.  The
older model used firewire for the drive interface and dvd+rw-tools
worked fine.

The new model uses USB 2.0 ports (1 for each drive.)  Unfortunately,
dvd+rw-tools no longer seems to work.  Running
dvd+rw-mediainfo returns a GET CONFIGURATION error.  cdrecord seems to
work fine for burning CDs, but I don't currently have a license for the
DVD code.

The system is running FreeBSD 6.0 Stable from around February.

I was just curious if anyone was aware of some sort of issue with the
usb/umass interface that would not allow dvd+rw-mediainfo to access the
drive correctly when the firewire/sbp interface would.


Thanks,

Jaime Bozza



ZFS Boot/7.2-STABLE

2009-09-10 Thread Jaime Bozza
After much work and looking at all the different configurations, I have a
7.2-STABLE (amd64) system running on ZFS only, with no partitions.
Unfortunately, this required a couple of things.

First - The patch here: 
http://lists.freebsd.org/pipermail/freebsd-stable/2009-June/050518.html

(just the patch to nfs.c - the other one is already in the tree)

Next, I re-enabled ZFS support in /sys/boot/i386/Makefile

Finally, I rebuilt and installed libstand, then rebuilt and installed the boot 
loader.   Once I had those pieces, my ZFS-only system booted fine.  

I've read the problems that caused the ZFS support to be backed out.  Could ZFS 
Boot (loader) support be something optional that isn't a default?   This would 
allow those who want to use ZFS Boot the ability without needing to move to 8.0 
(which still isn't out officially).

I'd have no problems rebuilding world to enable ZFS Boot on a new system, but 
currently I'd have to remember to re-patch libstand and the loader Makefile 
each time I updated, or move to 8.0, which I'm not quite ready to do.   

At the very least, can someone MFC the libstand patch?  The link above is 
basically the diff between 7.x and 8.x for the nfs.c file.

Jaime Bozza



Possible scheduler (SCHED_ULE) bug?

2009-10-23 Thread Jaime Bozza
I believe I've found a problem with the ULE scheduler.  At least, I've
established that there is a problem, but I'm not sure where to go from here.
The system locks all processes, but doesn't panic, so I have no output to give.

I was able to duplicate this on three different machines and solved it by
switching the scheduler to 4BSD.

Here's the environment:

FreeBSD 7.2 i386, installed from bootonly ISO, Custom install, minimal, no 
other changes other than setting timezone, changing root password, and turning 
on sshd (allowing root and password connection).

Running portsnap (fetch, then extract) to get latest ports tree.

From ports, make installs of lang/php5 and www/lighttpd, using defaults for 
all ports installed.

Modified lighttpd.conf for PHP (attached diff), created a short script called 
uploadfile.php (attached).  File was installed at 
/usr/local/www/data/uploadfile.php

Start lighttpd (lighttpd_enable=YES in rc.conf, /usr/local/etc/rc.d/lighttpd 
start), connect and run script.

As long as I upload a file less than 64K, everything works fine.  If I try to 
upload something larger than 64K, system no longer responds.   Console prompt 
at login will allow me to enter username/password, but nothing happens after 
that.  Console prompt logged in will allow me to type a single line, but if I 
press enter, nothing after that.

No errors get written anywhere - console, logs, etc.

I'm at a loss as to what to do next.  Can anyone give me ideas of what else I can
do?







--- lighttpd.conf.sample	2009-10-23 09:37:50.000000000 -0500
+++ lighttpd.conf	2009-10-23 10:02:00.000000000 -0500
@@ -20,7 +20,7 @@
 #   "mod_auth",
 #   "mod_status",
 #   "mod_setenv",
-#   "mod_fastcgi",
+    "mod_fastcgi",
 #   "mod_proxy",
 #   "mod_simple_vhost",
 #   "mod_evhost",
@@ -39,7 +39,7 @@
 server.document-root= "/usr/local/www/data/"
 
 ## where to send error-messages to
-server.errorlog = "/var/log/lighttpd.error.log"
+server.errorlog = "/tmp/lighttpd.error.log"
 
 # files to check for if .../ is requested
 index-file.names= ( "index.php", "index.html",
@@ -115,7 +115,7 @@
 # server.tag = "lighttpd"
 
 #### accesslog module
-accesslog.filename  = "/var/log/lighttpd.access.log"
+accesslog.filename  = "/tmp/lighttpd.access.log"
 
 ## deny access the file-extensions
 #
@@ -324,3 +324,20 @@
 # Enable IPV6 and IPV4 together
 server.use-ipv6 = "enable"
 $SERVER["socket"] == "0.0.0.0:80" { }
+
+  fastcgi.server = ( ".php" => ((
+    "bin-path" => "/usr/local/bin/php-cgi",
+    "socket" => "/tmp/hermes.php.socket",
+    "min-procs" => 1,
+    "max-procs" => 1,
+    "bin-environment" => (
+      "PHP_FCGI_CHILDREN" => "4",
+      "PHP_FCGI_MAX_REQUESTS" => "50",
+      "PHPRC" => "/data/sites/support/conf/"
+      ),
+    "bin-copy-environment" => (
+      "PATH", "SHELL", "USER", "TZ"
+      ),
+    "broken-scriptfilename" => "enable",
+  )))
+
\attachment: uploadfile.php

RE: Possible scheduler (SCHED_ULE) bug?

2009-10-23 Thread Jaime Bozza
 Try adding this or changing these items in lighttpd.conf:
 
 ## FreeBSD!
 server.event-handler  = "freebsd-kqueue"
 server.network-backend= "writev"

Scott,

Lighttpd was already using freebsd-kqueue, but I added the writev 
network-backend and the problem went away.   With this additional information I 
was able to track down kern/138999, which seems to be the exact issue I'm 
having.

The additional information I have (over the PR) is that:
1) Files over 64K cause the problem, not just larger files
2) switching over to SCHED_4BSD eliminates the problem - system no longer 
locks.  
3) 7.2 amd64 doesn't have the problem - Tested a similar configuration and was 
not able to duplicate on amd64 at all.

I'm CC'ing the original submitter of the PR to give him an update to see if he 
had any additional luck.

Jaime 







RE: Possible scheduler (SCHED_ULE) bug?

2009-10-24 Thread Jaime Bozza
  The additional information I have (over the PR) is that:
  1) Files over 64K cause the problem, not just larger files
 I thought it was over 1 MB or so. But maybe I'm wrong. ISTR that I
 couldn't trigger it with some images of around 70K.

I discovered it originally with a 72K file.  After some tests, I found a 63K 
file worked and a 65K file didn't.  When I get back into the office, I can test 
the actual boundary (65535, 65536, 65537, etc), but 64K seems pretty logical.
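
A boundary test like the one proposed above needs files of exact sizes. The sketch below generates them. The file names and the function write_test_files are made up for illustration, and zero-filled content is an assumption, since the original tests used real files.

```python
from pathlib import Path

BOUNDARY = 64 * 1024  # 65536 bytes, the suspected threshold


def write_test_files(directory: str, sizes=(BOUNDARY - 1, BOUNDARY, BOUNDARY + 1)) -> list[Path]:
    """Create one zero-filled file per requested size and return the paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for size in sizes:
        path = target / f"upload_{size}.bin"
        path.write_bytes(b"\0" * size)
        created.append(path)
    return created
```

Uploading each generated file in turn through the test script would show whether the lockup starts at 65535, 65536 or 65537 bytes.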

  2) switching over to SCHED_4BSD eliminates the problem - system no
 longer locks.
 I will have to test this. This is indeed interesting...
 
  3) 7.2 amd64 doesn't have the problem - Tested a similar
 configuration and was not able to duplicate on amd64 at all.
 I can replicate this problem on FreeBSD 7.2/amd64 reliably.

I haven't tried larger files - Maybe the boundary is different on amd64?   
Doing some quick tests right now, I was able to upload a 100MB file without a 
problem, but this is an AMD64 system with SMP, plus the filesystem is all ZFS, 
so there are too many things different.  I'll have to setup a system that 
closely mirrors the rest of my tests (UFS, ULE, no SMP, etc) before I can say 
I'm not having a problem there.

Jaime



RE: Possible scheduler (SCHED_ULE) bug?

2009-10-26 Thread Jaime Bozza
From: Jacob Myers [mailto:ja...@whotookspaz.org]
 Arnaud Houdelette wrote:
  I had the same issue using 7.1 amd64, with ZFS, no SMP.
  Not really sure what the size boundary is. I can't really test either,
  as the machine is remote.
  But I confirm that each attempted upload of certain relatively 'big'
  files (around 1MB) with wordpress hung the system before I switched
  from sendfile to writev.
 
  I might do some tests on amd64 7.2 with no SMP if it can be of any use?
 
  Arnaud
 
 I can confirm it happens without SMP on 7.2 and amd64. If you can give
 it a try though, well, the more information the better. Any boundary
 information, even approximate (well, mostly testing if 64K is the
 boundary or if 1 MB or so is) would probably be good, too.

I haven't tested the specific boundaries yet, but I will do that shortly.

I *was* able to get a crash dump on the i386 system - Will post the details 
shortly.

My amd64 system is a test system with ZFS, so I couldn't get a crash dump.   
Trying to work around that.

On both systems, I used a 72K file (73,688 bytes) to test.  Both systems would 
lock up, and then a few seconds later kdb would come up.   It wasn't an 
immediate thing, at least not on the i386 system.  I wasn't able to watch the 
amd64 system since it's too far away to time.

Jaime



RE: Possible scheduler (SCHED_ULE) bug?

2009-10-26 Thread Jaime Bozza


Sincerely,

Jaime Bozza
MindSites Group, LLC


From: Dylan Cochran [mailto:heliocent...@gmail.com]
> Superficially, this seems identical to a deadlock I reported for
> 7.1-RC1. Would you mind compiling a kernel with these options:
>
> snip
> KDB: stack backtrace:
> db_trace_self_wrapper(c0b55b52,e66e0ae0,c07615e9,c0b50617,8ca93,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c0b50617,8ca93,0,c41a7690,2,...) at kdb_backtrace+0x29
> hardclock(0,c07ff29d,0,0,4,...) at hardclock+0x1f9
> lapic_handle_timer(e66e0b08) at lapic_handle_timer+0x9c
> Xtimerint() at Xtimerint+0x1f
> --- interrupt, eip = 0xc07ff29d, esp = 0xe66e0b48, ebp = 0xe66e0c34 ---
> kern_sendfile(c41a7690,e66e0cfc,0,0,0,...) at kern_sendfile+0x90d
> do_sendfile(e66e0d2c,c0aba265,c41a7690,e66e0cfc,20,...) at do_sendfile+0xb1
> sendfile(c41a7690,e66e0cfc,20,16,e66e0d2c,...) at sendfile+0x13
> syscall(e66e0d38) at syscall+0x335
> Xint0x80_syscall() at Xint0x80_syscall+0x20
> --- syscall (393, FreeBSD ELF32, sendfile), eip = 0x282cb0cb, esp = 0xbfbfc7cc, ebp = 0xbfbfe848 ---
> KDB: enter: watchdog timeout
>
> You can type 'reboot' to reboot the machine (in my case, panic would
> not work, so a useful dump wasn't in the cards)

Different offset on mine, but of course I'm using a different kernel.  
kern_sendfile+0x6ad
do_sendfile+0xb1
sendfile+0x13

Luckily, I was able to get a panic, so I have all the files necessary to debug.
Here's the backtrace:

(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc07f2c57 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07f2f62 in panic (fmt=Variable fmt is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0497e47 in db_panic (addr=Could not find the frame base for db_panic.
) at /usr/src/sys/ddb/db_command.c:446
#4  0xc04985bc in db_command (last_cmdp=0xc0ca9154, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
#5  0xc04986ca in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
#6  0xc049a17d in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc081fdf6 in kdb_trap (type=3, code=0, tf=0xc72e2a5c) at /usr/src/sys/kern/subr_kdb.c:524
#8  0xc0b01b9b in trap (frame=0xc72e2a5c) at /usr/src/sys/i386/i386/trap.c:692
#9  0xc0ae58fb in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#10 0xc081ff7a in kdb_enter_why (why=0xc0b677b2 "watchdog", msg=0xc0b7ef1d "watchdog timeout") at cpufunc.h:60
#11 0xc07b0cad in hardclock (usermode=0, pc=3229966301) at /usr/src/sys/kern/kern_clock.c:640
#12 0xc0aedf1c in lapic_handle_timer (frame=0xc72e2afc) at /usr/src/sys/i386/i386/local_apic.c:785
#13 0xc0ae5edf in Xtimerint () at apic_vector.s:108
#14 0xc0855fdd in kern_sendfile (td=0xc771db40, uap=0xc72e2cfc, hdr_uio=0x0, trl_uio=0x0, compat=0) at atomic.h:160
#15 0xc0856d31 in do_sendfile (td=0xc771db40, uap=0xc72e2cfc, compat=0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#16 0xc0856dd3 in sendfile (td=0xc771db40, uap=0xc72e2cfc) at /usr/src/sys/kern/uipc_syscalls.c:1746
#17 0xc0b01365 in syscall (frame=0xc72e2d38) at /usr/src/sys/i386/i386/trap.c:1094
#18 0xc0ae5960 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
#19 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
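
As a cross-check on the numbers in this backtrace, a small sketch of the address arithmetic: the decimal pc passed to hardclock (frame #11) should be the same address as frame #14, and subtracting the 0x6ad offset from it should give the start of kern_sendfile. The helper name symbol_base is hypothetical; the constants are copied from the trace above.

```python
def symbol_base(address, offset):
    """Return a symbol's start address, given address == base + offset."""
    return address - offset


# Values copied from the backtrace above.
PC_DECIMAL = 3229966301     # pc argument of hardclock (frame #11)
FRAME14_ADDR = 0xC0855FDD   # address shown for kern_sendfile (frame #14)
OFFSET = 0x6AD              # from "kern_sendfile+0x6ad"

if __name__ == "__main__":
    print(hex(PC_DECIMAL))                         # 0xc0855fdd
    print(PC_DECIMAL == FRAME14_ADDR)              # True
    print(hex(symbol_base(FRAME14_ADDR, OFFSET)))  # 0xc0855930
```

So the timer interrupt that fired the watchdog landed at the kern_sendfile address shown in frame #14.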

This is all a bit new to me (debugging, etc), so let me know if you need 
anything else!

Jaime



RE: Possible scheduler (SCHED_ULE) bug?

2009-10-26 Thread Jaime Bozza
From: Kostik Belousov [mailto:kostik...@gmail.com]
> Can you look up the source line for kern_sendfile+0x90d in your
> kernel? Do kgdb kernel.debug, then execute list *(kern_sendfile+0x90d).

In my case, it was kern_sendfile+0x6ad (rebuilt with RELENG_7 this weekend).

Here's the output:

(kgdb) list *(kern_sendfile+0x6ad)
0xc0855fdd is in kern_sendfile (atomic.h:160).
155     static __inline int
156     atomic_cmpset_int(volatile u_int *dst, u_int exp, u_int src)
157     {
158             u_char res;
159
160             __asm __volatile(
161             "       " MPLOCKED "            "
162             "       cmpxchgl %2,%1 ;        "
163             "       sete    %0 ;            "
164             "1:                             "

Not much to go on there.  I posted a backtrace in a previous email, but the 
relevant sections (I think) are:

#14 0xc0855fdd in kern_sendfile (td=0xc771db40, uap=0xc72e2cfc, hdr_uio=0x0, trl_uio=0x0, compat=0) at atomic.h:160
#15 0xc0856d31 in do_sendfile (td=0xc771db40, uap=0xc72e2cfc, compat=0) at /usr/src/sys/kern/uipc_syscalls.c:1775
#16 0xc0856dd3 in sendfile (td=0xc771db40, uap=0xc72e2cfc) at /usr/src/sys/kern/uipc_syscalls.c:1746
#17 0xc0b01365 in syscall (frame=0xc72e2d38) at /usr/src/sys/i386/i386/trap.c:1094
#18 0xc0ae5960 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
#19 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

I'm still going to test the specific boundary, but if there's more information 
I can give, let me know!

Jaime




RE: Possible scheduler (SCHED_ULE) bug?

2009-10-26 Thread Jaime Bozza
From: Arnaud Houdelette [mailto:arnaud.houdele...@tzim.net]
> > I haven't tried larger files - Maybe the boundary is different on amd64?
> > Doing some quick tests right now, I was able to upload a 100MB file
> > without a problem, but this is an AMD64 system with SMP, plus the
> > filesystem is all ZFS, so there are too many things different.  I'll
> > have to set up a system that closely mirrors the rest of my tests (UFS,
> > ULE, no SMP, etc.) before I can say I'm not having a problem there.
> >
> > Jaime
>
> I had the same issue using 7.1 amd64, with ZFS, no SMP.
> Not really sure what the size boundary is. I can't really test either,
> as the machine is remote.
> But I confirm that each attempted upload of certain relatively 'big'
> files (around 1MB) with WordPress hung the system before I switched
> from sendfile to writev.
>
> I might do some tests on amd64 7.2 with no SMP if it can be of any use?
>
> Arnaud

I was able to duplicate the problem on 7.2-STABLE amd64 with no SMP - the
problem didn't seem to happen with SMP on.  I wasn't able to get a crash dump,
but the crash looked similar.
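
For comparison with the workaround Arnaud describes (moving away from sendfile), here is a rough sketch of sending a file over a socket without any sendfile(2) call. It uses a plain read-and-sendall loop rather than writev, and the function name and chunk size are assumptions; it is not the change that was actually made on his server.

```python
def send_file_plain(sock, path, chunk_size=64 * 1024):
    """Send a file over a connected socket using read() and sendall().

    Returns the number of bytes sent. No sendfile(2) call is involved,
    so the kern_sendfile path seen in the backtraces is never entered.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:      # end of file
                break
            sock.sendall(chunk)
            total += len(chunk)
    return total
```

This copies the data through userspace, so it is slower than sendfile, but it can serve as a temporary workaround while the lockup is being investigated.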

Jaime



RE: ATA Atapi 4.6 Release

2002-06-18 Thread Jaime Bozza

While I haven't had a specific problem with FreeBSD (Most of our servers
don't even have CDROM drives), I thought I would share something that
may be relevant.  Recently, we purchased two systems containing Benq
(formerly Acer) 56X CDROM drives.  While installing Windows 2000 on both
systems, the installation died while trying to read the driver.cab file.
For those who don't know, the file is about 50MB and by far the largest
file that gets copied over during the W2K installation process.

Neither system would retry the file at all and I had to restart the
install.  Using the same hardware but switching out the CDROM to an
older (32X) drive, the installation worked fine.  I attempted the
installation process about 3-4 times with the older drive and it worked
perfectly every time.  With the 56X drive, I wouldn't *always* have a
problem, but it would frequently die at the driver.cab file copy.

Once in Windows, with the drive switched to DMA mode, I've attempted copying
the file again, as well as files from other CDs that have big files, and cannot
duplicate the problem.  In fact, the drives have been working fine ever since.
If you'd like, I could switch the drive back out to PIO mode and see if
I have timeout errors again.

Could it be possible that these drives need some sort of workaround when
in PIO mode?


Jaime Bozza


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Bruce A. Mah
Sent: Monday, June 17, 2002 11:33 PM
To: John Prince
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: ATA Atapi 4.6 Release 


If memory serves me right, John Prince wrote:
> Thanks..
> and you are right..

This worked?  OK.

I'll work on an errata entry for this.  Here's my understanding, mostly
based on dougb's message plus a little experimenting:

The problem is that some users are having difficulty reading CD-ROMs on
certain ATA CD-ROM drives.  This has been seen on the AOpen 48x, 52x,
and 56x CD-ROM drives (are there any non-AOpen drives that have this
problem?).  The error message one sees is:

acd0: READ_BIG command timeout - resetting
ata1: resetting devices .. done

(In some cases, this is followed by a kernel panic.)

We need first-time CD-ROM users (with affected hardware) to interrupt
their boot process (at the "Hit [Enter] to boot immediately..." step),
and then do:

ok set hw.ata.ata_dma=1
ok set hw.ata.atapi_dma=1
ok boot

(This gets a working system for a CD-ROM install.)

Do the install like normal.

After installation is complete, we need to apply the workaround for
subsequent reboots.  Reboot after finishing the installation and then
add these lines to /boot/loader.conf or /boot/loader.conf.local (create
as necessary):

hw.ata.ata_dma=1
hw.ata.atapi_dma=1

Then reboot one more time.
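
A minimal sketch of the post-install step, for anyone who would rather script it than edit the file by hand: it appends the two tunables to a loader.conf-style file only if they are not already set, creating the file if needed. The function name and the dictionary constant are assumptions for illustration; a tunable that is already present with a different value is left alone.

```python
def ensure_loader_settings(path, settings):
    """Append any missing name=value lines to a loader.conf-style file.

    Existing lines are left untouched. Returns the list of lines added.
    """
    try:
        with open(path) as f:
            existing = f.read()
    except FileNotFoundError:
        existing = ""  # "create as necessary"

    # Names that already have an uncommented assignment in the file.
    present = {line.split("=", 1)[0].strip()
               for line in existing.splitlines()
               if "=" in line and not line.lstrip().startswith("#")}

    added = ["%s=%s" % (name, value)
             for name, value in settings.items() if name not in present]
    if added:
        with open(path, "a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")
            f.write("\n".join(added) + "\n")
    return added


# The two tunables from the workaround described above.
ATA_DMA_WORKAROUND = {"hw.ata.ata_dma": "1", "hw.ata.atapi_dma": "1"}
```

Calling ensure_loader_settings("/boot/loader.conf.local", ATA_DMA_WORKAROUND) as root would apply the workaround; running it a second time adds nothing.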

Is that roughly right?  I'm mostly interested in getting an errata entry
containing the correct workaround out quickly, rather than getting the
list of CDROM drives exactly right.  We can fine-tune said list later.

Bruce.





To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-stable in the body of the message



RE: Possible scheduler (SCHED_ULE) bug?

2009-12-16 Thread Jaime Bozza
 From: Jacob Myers [mailto:ja...@whotookspaz.org]
> Jaime Bozza wrote:
> > From: Arnaud Houdelette [mailto:arnaud.houdele...@tzim.net]
> > > > I haven't tried larger files - Maybe the boundary is different on
> > > > amd64?  Doing some quick tests right now, I was able to upload a
> > > > 100MB file without a problem, but this is an AMD64 system with SMP,
> > > > plus the filesystem is all ZFS, so there are too many things
> > > > different.  I'll have to set up a system that closely mirrors the
> > > > rest of my tests (UFS, ULE, no SMP, etc.) before I can say I'm not
> > > > having a problem there.
> > > >
> > > > Jaime
> > >
> > > I had the same issue using 7.1 amd64, with ZFS, no SMP.
> > > Not really sure what the size boundary is. I can't really test either,
> > > as the machine is remote.
> > > But I confirm that each attempted upload of certain relatively 'big'
> > > files (around 1MB) with WordPress hung the system before I switched
> > > from sendfile to writev.
> > >
> > > I might do some tests on amd64 7.2 with no SMP if it can be of any use?
> > >
> > > Arnaud
> >
> > I was able to duplicate the problem on 7.2-STABLE amd64 with no SMP - the
> > problem didn't seem to happen with SMP on.  I wasn't able to get a crash
> > dump, but the crash looked similar.
> >
> > Jaime
>
> FWIW, there was a fix committed for this:
> http://svn.freebsd.org/viewvc/base?view=revision&revision=198853
> See if it helps.

Sorry for the delay in testing this - Everything seems to be working fine now.
I'm not able to force a lockup anymore under the same conditions.  Thanks for
the fix!

Jaime
