from:"Ulrich Spoerlein"

Re: amd(8) cores dump when load high

2008-12-25 Thread Ulrich Spoerlein

On Tue, 23.12.2008 at 00:44:53 +0800, Lin Jui-Nan Eric wrote:
> Dear listers,
>
> We currently found that amd frequently cores dump while loading is
> high (about 4~5) after we upgrade world & kernel from 7.0-RELEASE to
> 7.1-PRERELEASE.
>
> I have read -stable and svn log of 7-STABLE, but can not found a
> report or a solution. Did anyone have the same issue? Thank you very
> much.

Ever since I switched from file-based NSS to LDAP, amd(8) has been
crashing on me almost every day, especially if there's no LDAP server
available during boot (i.e. the laptop is not on the home network).

It looks like the error handling in NSS requests could be improved, but
I've yet to investigate the whole matter. Load plays no role in amd(8)
crashing (at least for me).

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

moused(8) ate my umass(4) devices, it's true!

2008-12-15 Thread Ulrich Spoerlein

Hey all,

I've observed a very weird behaviour with my USB stick for quite a while
now (probably 4 months; sadly, I don't have any dates handy). Anyway, I
have this weird Sun keyboard-to-USB adapter, which offers a ukbd(4) and
a ums(4) device to the system, although there is no mouse attached to the
Sun keyboard I'm using.

ukbd0: <vendor 0x0430 PS/2 KB & MS, class 0/0, rev 1.00/0.04, addr 3> on uhub4
kbd2 at ukbd0
ums0: <vendor 0x0430 PS/2 KB & MS, class 0/0, rev 1.00/0.04, addr 3> on uhub4
ums0: 3 buttons.

This worked fine on RELENG_7 till somewhere around summer. Now, whenever
there is a moused(8) listening on this fake ums(4) port, the umass(4)
device will get stuck somewhere in CAM-land. It probes fine:

Dec 14 10:24:49 roadrunner kernel: umass0: <Samsung YP-U2, class 0/0, rev 2.00/10.01, addr 6> on uhub4

but then only BBB bulk transfer timeout messages follow every so often.
The da0 device never shows up.

Dec 14 10:26:59 roadrunner kernel: umass0: BBB reset failed, TIMEOUT
Dec 14 10:27:04 roadrunner kernel: umass0: BBB bulk-in clear stall failed, IOERROR
Dec 14 10:27:04 roadrunner kernel: umass0: BBB bulk-out clear stall failed, IOERROR
Dec 14 10:28:09 roadrunner kernel: umass0: BBB reset failed, IOERROR
Dec 14 10:28:09 roadrunner kernel: umass0: BBB bulk-in clear stall failed, IOERROR
Dec 14 10:28:09 roadrunner kernel: umass0: BBB bulk-out clear stall failed, IOERROR
Dec 14 10:29:14 roadrunner kernel: umass0: BBB reset failed, IOERROR
Dec 14 10:29:14 roadrunner kernel: umass0: BBB bulk-in clear stall failed, IOERROR
Dec 14 10:29:14 roadrunner kernel: umass0: BBB bulk-out clear stall failed, IOERROR
Dec 14 10:30:19 roadrunner kernel: umass0: BBB reset failed, IOERROR
Dec 14 10:30:19 roadrunner kernel: umass0: BBB bulk-in clear stall failed, IOERROR
Dec 14 10:30:19 roadrunner kernel: umass0: BBB bulk-out clear stall failed, IOERROR
Dec 14 10:31:24 roadrunner kernel: umass0: BBB reset failed, IOERROR
Dec 14 10:31:24 roadrunner kernel: umass0: BBB bulk-in clear stall failed, IOERROR
Dec 14 10:31:24 roadrunner kernel: umass0: BBB bulk-out clear stall failed, IOERROR

I cannot unplug the USB stick (instant panic) and kldunloading umass is
also bad (instant panic). I have to reboot the system and remove the
device then.

Today, I figured out that it depends wholly on moused(8) running on that
unpopulated mouse port. If I kill the moused process (which starts
automatically when ums0 attaches) before plugging in the umass device,
everybody is happy. I'm glad I found this workaround, but the situation
sucks anyway.

Other than binary searching for the offending commit, what debugging could
I do? Would a ktrace of moused(8) be helpful when attaching umass? Is
it perhaps polling the port too often, waiting for a mouse to appear?

Also, can I somehow blacklist the mouse-port of this adapter? I do not
intend to use a 3 button Sun mouse, ever.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: LORs in RELENG_7

2008-11-21 Thread Ulrich Spoerlein

On Thu, 20.11.2008 at 17:56:07 -0500, Michael Proto wrote:
> On Thu, Nov 20, 2008 at 4:11 PM, Ulrich Spoerlein [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > I'm running my RELENG_7 kernel with WITNESS and there's always a LOR
> > when pf(4) is enabled:
> >
> > lock order reversal:
> >  1st 0xc09ca828 ifnet (ifnet) @ /usr/src/sys/net/if.c:849
> >  2nd 0xc45d604c pf task mtx (pf task mtx) @ /usr/src/sys/modules/pf/../../contrib/pf/net/pf_if.c:916
> > KDB: stack backtrace:
> > db_trace_self_wrapper(c08df797,fb671764,c0630e8e,c08e1c96,c45d604c,...) at db_trace_self_wrapper+0x26
> > kdb_backtrace(c08e1c96,c45d604c,c45d3b1c,c45d3b1c,c45d379e,...) at kdb_backtrace+0x29
> > witness_checkorder(c45d604c,9,c45d379e,394,c08e9058,...) at witness_checkorder+0x6de
> > _mtx_lock_flags(c45d604c,0,c45d379e,394,fb6717dc,...) at _mtx_lock_flags+0xbc
> > pfi_attach_group_event(0,c445,c08e9058,374,c44a920c,...) at pfi_attach_group_event+0x4e
> > if_addgroup(c441c000,c08f70d6,4,0,0,...) at if_addgroup+0x2c7
> > if_clone_createif(0,0,c08e9563,87,0,...) at if_clone_createif+0x81
> > if_clone_create(fb671943,4,0,44,180,...) at if_clone_create+0x8c
> > tunclone(0,c461e400,fb671943,4,fb67195c,...) at tunclone+0x17a
> > devfs_lookup(fb6719d0,fb6719d0,fb671b7c,c418de04,2,...) at devfs_lookup+0x50e
> > VOP_LOOKUP_APV(c0928f40,fb6719d0,c412f230,c08e77ef,2a9,...) at VOP_LOOKUP_APV+0xa5
> > lookup(fb671b7c,c08e77ef,c6,bf,c461e92c,...) at lookup+0x58e
> > namei(fb671b7c,c412f230,fb671a74,246,c0983774,...) at namei+0x34b
> > vn_open_cred(fb671b7c,fb671c78,ce8,c461e400,c4460558,...) at vn_open_cred+0x2c9
> > vn_open(fb671b7c,fb671c78,ce8,c4460558,c05e807d,...) at vn_open+0x33
> > kern_open(c412f230,80a0f18,0,3,808ecfa,...) at kern_open+0xe7
> > open(c412f230,fb671cfc,c,c08e28c3,c092c0b8,...) at open+0x30
> > syscall(fb671d38) at syscall+0x2b3
> > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > --- syscall (5, FreeBSD ELF32, open), eip = 0x2835a65b, esp = 0xbfbfeafc, ebp = 0xbfbfeb38 ---
>
> Are you using user or group rules in your pf.conf? IIRC there is still a
> known LOR in the socket layer with rules using the user or group filters.

No, I'm aware of the problems with pf(4) and user/group rules. This LOR
is in combination with rules on tun(4) devices, as you can see from the
backtrace. I wonder what tunclone() is doing in there, though.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: LORs in RELENG_7

2008-11-21 Thread Ulrich Spoerlein

if xpt_async() is calling into uma (as it obviously does).

sys/dev/firewire/sbp.c, lines 2202-2211:
	if (sdev->path) {
		SBP_LOCK(sdev->target->sbp);
		xpt_release_devq(sdev->path,
				 sdev->freeze, TRUE);
		sdev->freeze = 0;
		xpt_async(AC_LOST_DEVICE, sdev->path, NULL);
		xpt_free_path(sdev->path);
		sdev->path = NULL;
		SBP_UNLOCK(sdev->target->sbp);
	}


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

LORs in RELENG_7

2008-11-20 Thread Ulrich Spoerlein

+0x10
passcleanup(c42c6700,c08b50c7,c09eff00,4,c08db41b,...) at passcleanup+0x2e
camperiphfree(c42c6700,0,f93a96b0,c04568dd,c42c6700,...) at camperiphfree+0xbb
cam_periph_invalidate(c42c6700,c0983774,f93a96e4,c046a5ea,c42c6700,...) at cam_periph_invalidate+0x3e
cam_periph_async(c42c6700,100,c418a250,0,f93a96e0,...) at cam_periph_async+0x2d
passasync(c42c6700,100,c418a250,0,c42f8a00,...) at passasync+0xca
xpt_async_bcast(0,4,c08b53c5,11a5,c404d280,...) at xpt_async_bcast+0x32
xpt_async(100,c418a250,0,89b,0,...) at xpt_async+0x194
sbp_cam_detach_sdev(c402f4c8,0,c402f484,1,f93a982c,...) at sbp_cam_detach_sdev+0xa4
sbp_cam_detach_target(c14729a8,c14729a8,c08250c6,c44263f0,10,...) at sbp_cam_detach_target+0x5b
sbp_post_explore(c402f400,f93a9ce8,f93a9ce4,675,0,...) at sbp_post_explore+0xa2
fw_bus_probe_thread(c404f000,f93a9d38,c08d8d0f,31c,c402b570,...) at fw_bus_probe_thread+0x69b
fork_exit(c0513500,c404f000,f93a9d38) at fork_exit+0xb8
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xf93a9d70, ebp = 0 ---
(da1:sbp0:0:1:0): lost device
(da1:sbp0:0:1:0): removing device entry

I reckon these problems should appear in -STABLE ...

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: Any working ichsmb(4) platforms out there?

2008-09-13 Thread Ulrich Spoerlein

On Thu, 11.09.2008 at 15:14:52 +0100, Bruce M Simpson wrote:
> Does anyone have ichsmb(4) actually seeing SMBus devices?
> e.g. you run smbmsg -p on your FreeBSD-STABLE system and see something.
>
> I just tried it again on my IBM ThinkPad T43 and saw nothing, all I get is:
> ichsmb0: device timeout, status=0x41
> ...in dmesg.

No luck with an ICH5, here:

ichsmb0: <Intel 82801EB (ICH5) SMBus controller> port 0x2400-0x241f irq 17 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
ichsmb0: device timeout, status=0x41
ichsmb0: device timeout, status=0x41
ichsmb0: device timeout, status=0x41
ichsmb0: device timeout, status=0x41
...

# uname -rsm
FreeBSD 6.3-STABLE i386
# devinfo -v|grep smb
ichsmb0 pnpinfo vendor=0x8086 device=0x24d3 subvendor=0x1734 subdevice=0x101c class=0x0c0500 at slot=31 function=3 handle=\_SB_.PCI0.PM__
# kenv|grep smb
smbios.bios.reldate=11/25/2004
smbios.bios.vendor=FUJITSU SIEMENS // Phoenix Technologies Ltd.
smbios.bios.version=5.00 R2.14.1534.01  
smbios.chassis.maker=FUJITSU SIEMENS
smbios.chassis.serial=YBFC445826  
smbios.chassis.tag=
smbios.chassis.version=SCEE 
smbios.planar.maker=FUJITSU SIEMENS
smbios.planar.product=D1534
smbios.planar.serial=
smbios.planar.version=S26361-D1534
smbios.socket.enabled=1
smbios.socket.populated=1
smbios.system.maker=FUJITSU SIEMENS
smbios.system.product=SCENIC E
smbios.system.serial=YBFC445826  
smbios.system.uuid=93D4A7A3-705F-11D9-8688-00300577E7A0
smbios.system.version=


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)

2008-08-09 Thread Ulrich Spoerlein

Hi John,

I've now figured out the who; the why still eludes me.

So, after your MFC of ichss.c on June 27th the device now attaches on my
laptop. It didn't before, so it could cause no trouble.

With ichss loaded, the kernel will panic 1-3 minutes after powerd has
been started (if I kill powerd early enough, it seems pretty stable).

I'm now running a kernel from 2008-08-08 with
hint.ichss.0.disabled=1

Applying your patch to kern_cpu.c does not help though. I'll be happy to
try further patches to make ichss behave well, although I'll never use
it for this laptop, as EST is the only technique useful on this old
Pentium-M.

> > Will also disable p4tcc. This was not attaching during the RELENG_6
> > times but leads to ridiculous rates of 75 MHz.
>
> If p4tcc attaching is new, that might point to the culprit.  A good quick
> test would be to disable individual cpufreq drivers to find out which one
> causes the panic.

p4tcc attaching was new relative to RELENG_6, not relative to my
working 7.x kernel of 2008-06-13.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: Problem with /boot/loader [A new patch]

2008-08-09 Thread Ulrich Spoerlein

On Sat, 09.08.2008 at 17:22:01 +0800, Eugene Grosbein wrote:
> On Fri, Aug 08, 2008 at 12:49:28PM -0400, John Baldwin wrote:
>
> > My realization this morning is that software interrupts ('int X') in real
> > mode disable interrupts just like hardware interrupts do. Thus, my patch
> > changes BTX to disable interrupts for both cases 1) and 2) now. I think this
> > will fix the hangs. I'm still including the code to explicitly initialize the
> > eflags for user requests to a known-good value. It still has interrupts
> > enabled which means that case 3) should know always run with interrupts
> > enabled (which is the desired state), but the client can disable interrupts
> > in the eflags in the vm86 structure if desired.
> >
> > The updated patch (same URL, new patch) is at
> > http://www.FreeBSD.org/~jhb/patches/btx_hang.patch
>
> Sigh, it does not fix my problem described here:
>
> http://groups.google.ru/group/muc.lists.freebsd.stable/browse_thread/thread/538039f40b469e2a
>
> I've just updated my 7.0-STABLE to latest sources, applied your patch
> using cd /usr/src; patch -p6 < ~/btx_hang.patch, it has applied cleanly.
> Then I've rebuilt and reinstalled kernel and world and rebooted.
> My problem persists as it was.

I'm not sure which piece of code you are talking about here (boot0,
boot1, boot2, loader?). But if it's one of the former, you don't need to
installworld; instead, install new boot blocks using either fdisk -B or
bsdlabel -B (or both).
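
A minimal sketch of that suggestion, for illustration only. The disk and slice names (ad0, ad0s1) are assumptions, and the helper merely prints the two commands instead of running them, since both overwrite boot code and have to be run as root against the right device:

```sh
#!/bin/sh
# Print (rather than run) the commands that rewrite the boot blocks.
# boot_block_cmds is a hypothetical helper; the disk and slice names
# passed to it are examples only.
boot_block_cmds() {
    disk=$1
    slice=$2
    echo "fdisk -B $disk"       # rewrite the boot code in sector 0 (MBR)
    echo "bsdlabel -B $slice"   # rewrite boot1/boot2 at the start of the slice
}

boot_block_cmds ad0 ad0s1
```

/boot/loader, by contrast, is an ordinary file, so a rebuilt loader only needs to be copied into place.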

hth,
Ulrich Spoerlein
--
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

What is cryptosoft0?

2008-08-06 Thread Ulrich Spoerlein

Hi,

today I discovered the following dmesg line on my laptop:

cryptosoft0: software crypto on motherboard

and I've not seen this one before, so: what is cryptosoft and should I
care?

I could imagine it's a pseudo-device by crypto(9) so the API is the same
whether crypto hardware is installed or not.

Anyway, I think a manpage link/update would be in order:

% man -k cryptosoft
cryptosoft: nothing appropriate


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

cpufreq(4) panic on RELENG_7 (was: Re: Call for bfe(4) testers.)

2008-08-06 Thread Ulrich Spoerlein

On Mon, 04.08.2008 at 16:07:55 -0400, John Baldwin wrote:
> On Monday 04 August 2008 02:29:19 pm Ulrich Spoerlein wrote:
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x38
> > fault code              = supervisor read, page not present
> > instruction pointer     = 0x20:0xc058ec16
> > stack pointer           = 0x28:0xfb8b8ac8
> > frame pointer           = 0x28:0xfb8b8ac8
> > code segment            = base 0x0, limit 0xf, type 0x1b
> >                         = DPL 0, pres 1, def32 1, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 1176 (powerd)
> > db:0:kdb.enter.default>  show pcpu
> > cpuid        = 0
> > curthread    = 0xc4ec0aa0: pid 1176 powerd
> > curpcb       = 0xfb8b8d90
> > fpcurthread  = none
> > idlethread   = 0xc3f80cc0: pid 10 idle: cpu0
> > APIC ID      = 0
> > currentldt   = 0x50
> > db:0:kdb.enter.default>  bt
> > Tracing pid 1176 tid 100103 td 0xc4ec0aa0
> > device_is_attached(0,c87e6b40,fb8b8afc,0,101,...) at device_is_attached+0x6
> > cf_set_method(c420b600,c87e6b40,64,fb8b8ba4,c87e33b4,...) at cf_set_method+0x6a3
> > cpufreq_curr_sysctl(c420d840,c4207000,0,fb8b8ba4,fb8b8ba4,...) at cpufreq_curr_sysctl+0x232
> > sysctl_root(fb8b8ba4,4,1,c4ec0aa0,c4501d38,...) at sysctl_root+0x137
> > userland_sysctl(c4ec0aa0,fb8b8c14,4,0,0,...) at userland_sysctl+0x151
> > __sysctl(c4ec0aa0,fb8b8cfc,18,fb8b8ca0,46,...) at __sysctl+0xec
> > syscall(fb8b8d38) at syscall+0x345
> > Xint0x80_syscall() at Xint0x80_syscall+0x20
> > --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x28161bd3, esp = 0xbfbfe8cc, ebp = 0xbfbfe8f8 ---
> > db:0:kdb.enter.default>  capture off
> >
> > Seems like I caught RELENG_7 during a bad time. Will update again.
>
> What cpufreq drivers do you have loaded and attached?  This patch might work
> around the issue, but I suspect there is a bug in one of the cpufreq drivers.

Hi John,

sorry for the slow update, please bear with me.

This is on a first generation Pentium-M (Banias core) with EST (and also
p4tcc attached, as I just discovered):

CPU: Intel(R) Pentium(R) M processor 1.50GHz (1495.15-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6d6  Stepping = 6
  Features=0xafe9f9bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,TM,PBE>
  Features2=0x180<EST,TM2>
..
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0


dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.freq: 300
dev.cpu.0.freq_levels: 1500/-1 1312/-1 1200/-1 1050/-1 1000/-1 875/-1 800/-1 700/-1 600/-1 525/-1 450/-1 375/-1 300/-1 225/-1 150/-1 75/-1
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% 0.00% 0.00% 0.00%
dev.acpi_perf.0.%parent: cpu0
dev.est.0.%desc: Enhanced SpeedStep Frequency Control
dev.est.0.%driver: est
dev.est.0.%parent: cpu0
dev.est.0.freq_settings: 1500/-1 1200/-1 1000/-1 800/-1 600/-1
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0
dev.p4tcc.0.%desc: CPU Frequency Thermal Control
dev.p4tcc.0.%driver: p4tcc
dev.p4tcc.0.%parent: cpu0
dev.p4tcc.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1

A kernel from 2008-06-13 is the last known working one. I just had the
same crash with a kernel from sources at 2008-07-01 and am now
recompiling for 2008-06-24.

Your MFC of est.c rev 180044 might be the problem. I'll try a backout
once I have confirmed that the 2008-06-24 kernel is running stable.

> Index: kern_cpu.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/kern_cpu.c,v
> retrieving revision 1.27.2.2
> diff -u -r1.27.2.2 kern_cpu.c
> --- kern_cpu.c  9 May 2008 19:02:10 -0000       1.27.2.2
> +++ kern_cpu.c  4 Aug 2008 20:07:41 -0000
> @@ -329,6 +329,8 @@
>         /* Next, set any/all relative frequencies via their drivers. */
>         for (i = 0; i < level->rel_count; i++) {
>                 set = &level->rel_set[i];
> +               if (set->dev == NULL)
> +                       continue;
>                 if (!device_is_attached(set->dev)) {
>                         error = ENXIO;
>                         goto out;
>

Will try that one too, hopefully tomorrow.

Will also disable p4tcc. This was not attaching during the RELENG_6
times but leads to ridiculous rates of 75 MHz.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: ddb(4) scripts not working in RELENG_7?

2008-08-04 Thread Ulrich Spoerlein

Hi Robert,

On Sun, 03.08.2008 at 14:49:00 +0100, Robert Watson wrote:
> On Sun, 3 Aug 2008, Ulrich Spoerlein wrote:
> > I was testing a patch and getting a panic (page fault while in kernel mode)
> > in RELENG_7 running multiuser mode, but no scripts were automagically run,
> > although I configured ddb_enable=YES in rc.conf.
> >
> > It simply dropped me to the interactive ddb(4) prompt, nothing more. Do you
> > have any idea what I could be missing?
>
> I have been using DDB scripts on 7-STABLE without any problems, but I'm not
> sure I've tried it with a page fault, just regular panics.  Could you try
> entering the debugger via sysctl debug.kdb.panic=1, which forces a panic,
> and see if your scripts run then?  Perhaps there's some inconsistency in how
> we're entering the debugger.  If things still appear not to be happening, try
> setting up a kdb.enter.default script and see if that works?

Spot on! Entering via sysctl works as expected; the 'default' script
will also be executed after a page fault, but not the panic-script.

So either page faults should call the panic-script or some sort of
kdb.enter.pfault should be introduced? Either way, I see another manpage
update coming up :)
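
Until that is sorted out, a workaround could look roughly like the sketch below: register the body of the panic script under kdb.enter.default as well, so a page fault also triggers a textdump. The script body is copied from the kdb.enter.panic script quoted in this thread; the variable names are made up for the example, and whether one really wants a dump and reset on every debugger entry is a separate question.

```sh
#!/bin/sh
# Reuse the body of the existing kdb.enter.panic script for the default
# debugger entry, so that page faults also produce a textdump.
body='textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset'

# Line suitable for /etc/ddb.conf, which is loaded at boot when
# ddb_enable is set in rc.conf.
conf_line="script kdb.enter.default=$body"
echo "$conf_line"

# To install it on the running system instead, ddb(8) can be called
# directly (as root, on FreeBSD):
#   ddb script "kdb.enter.default=$body"
```

Note that kdb.enter.default is only used when no more specific script matches, so the panic and witness scripts keep working as before.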

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

ddb(4) scripts not working in RELENG_7?

2008-08-03 Thread Ulrich Spoerlein

Hi Robert,

I was testing a patch and getting a panic (page fault while in kernel
mode) in RELENG_7 running multiuser mode, but no scripts were
automagically run, although I configured ddb_enable=YES in rc.conf.

It simply dropped me to the interactive ddb(4) prompt, nothing more. Do
you have any idea what I could be missing?

Btw, you might wanna update the ddb(8) manpage's History section, as the
feature seems to first appear in 7.1 :)

% egrep 'ddb|dump' /etc/rc.conf
dumpdev=/dev/ad0s3
ddb_enable=YES
% sysctl debug.ddb.scripting.scripts
debug.ddb.scripting.scripts: lockinfo=show locks; show alllocks; show lockedvnods
kdb.enter.panic=textdump set; capture on; run lockinfo; show pcpu; bt; ps; alltrace; capture off; call doadump; reset
kdb.enter.witness=run lockinfo


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

RELENG_6 regression: ums0: X report 0x0002 not supported

2008-05-14 Thread Ulrich Spoerlein

Hi,

after updating an Intel S5000PAL system from 6.2 to 6.3, ums(4) is no
longer attaching correctly.

Here's a dmesg diff between 6.2 and 6.3:

 uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub3: 2 ports with 2 removable, self powered
 ehci0: <EHCI (generic) USB 2.0 controller> mem 0xe8d00000-0xe8d003ff irq 23 at device 29.7 on pci0
 ehci0: [GIANT-LOCKED]
 usb4: EHCI version 1.0
 usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
 usb4: <EHCI (generic) USB 2.0 controller> on ehci0
 usb4: USB revision 2.0
 uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub4: 8 ports with 8 removable, self powered
 ukbd0: Avocent Avocent Embedded DVC 1.0, rev 2.00/0.00, addr 2, iclass 3/1
 kbd2 at ukbd0
 ums0: Avocent Avocent Embedded DVC 1.0, rev 2.00/0.00, addr 2, iclass 3/1
-ums0: 3 buttons and Z dir.
-uhid0: Avocent Avocent Embedded DVC 1.0, rev 2.00/0.00, addr 2, iclass 3/1
-uhid0: could not read endpoint descriptor
-device_attach: uhid0 attach returned 6
+ums0: X report 0x0002 not supported
+device_attach: ums0 attach returned 6

Attached is the full 6.3 dmesg. Looks weird to me, anything I can try
on this hardware?

Uli


dmesg.boot
Description: Binary data

RELENG_6 regression: panic: vm_fault on nofault entry, addr: c8000000

2008-05-14 Thread Ulrich Spoerlein

Hi,

there's a regression going from 6.2 to 6.3, where the kernel panics
within vm_fault while booting. This problem has been discussed
before, but I'm seeing it reliably on a RELENG_6 checkout from the 5th of
May.

It affects multiple (but identical) systems; here's a verbose boot
leading to the panic. Please note that 6.2 was running fine on these
machines. They also boot normally if I disable ACPI (but this is not
really an option).

SMAP type=01 base= len=0009d800
SMAP type=02 base=0009d800 len=2800
SMAP type=02 base=000ce000 len=2000
SMAP type=02 base=000e4000 len=0001c000
SMAP type=01 base=0010 len=cfe6
SMAP type=03 base=cff6 len=9000
SMAP type=04 base=cff69000 len=00017000
SMAP type=02 base=cff8 len=0008
SMAP type=02 base=e000 len=1000
SMAP type=02 base=fec0 len=0001
SMAP type=02 base=fee0 len=1000
SMAP type=02 base=ff00 len=0100
SMAP type=01 base=0001 len=3000
786432K of memory above 4GB ignored
Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.3-20080505-SNAP #0: Mon May  5 11:42:32 UTC 2008
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
Preloaded elf kernel /boot/kernel/kernel at 0xc1051000.
Preloaded mfs_root /boot/mfsroot at 0xc10511e8.
Preloaded elf module /boot/modules/acpi.ko at 0xc105122c.
MP Configuration Table version 1.4 found at 0xc009dd71
Table 'FACP' at 0xcff68e48
Table 'APIC' at 0xcff68ebc
MADT: Found table at 0xcff68ebc
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
MADT: Found CPU APIC ID 4 ACPI ID 1: enabled
MADT: Found CPU APIC ID 2 ACPI ID 2: enabled
MADT: Found CPU APIC ID 6 ACPI ID 3: enabled
ACPI APIC Table: PTLTD  APIC  
Calibrating clock(s) ... i8254 clock: 1193204 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 3000122064 Hz
CPU: Intel(R) Xeon(TM) CPU 3.00GHz (3000.12-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf64  Stepping = 4
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe4bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,CNXT-ID,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
  Logical CPUs per core: 2
real memory  = 3489005568 (3327 MB)
Physical memory chunk(s):
0x1000 - 0x0009cfff, 638976 bytes (156 pages)
0x0010 - 0x003f, 3145728 bytes (768 pages)
0x01425000 - 0xcc488fff, 3406184448 bytes (831588 pages)
avail memory = 3405979648 (3248 MB)
bios32: Found BIOS32 Service Directory header at 0xc00f5960
bios32: Entry = 0xfd520 (c00fd520)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xfd520+0x247
pnpbios: Found PnP BIOS data at 0xc00f59e0
pnpbios: Entry = f:af28  Rev = 1.0
Other BIOS signatures found:
APIC: CPU 0 has ACPI ID 0
MADT: Found IO APIC ID 8, Interrupt 0 at 0xfec0
ioapic0: Routing external 8259A's - intpin 0
MADT: Found IO APIC ID 9, Interrupt 24 at 0xfec8
lapic0: Routing NMI - LINT1
lapic0: LINT1 trigger: edge
lapic0: LINT1 polarity: high
lapic4: Routing NMI - LINT1
lapic4: LINT1 trigger: edge
lapic4: LINT1 polarity: high
lapic2: Routing NMI - LINT1
lapic2: LINT1 trigger: edge
lapic2: LINT1 polarity: high
lapic6: Routing NMI - LINT1
lapic6: LINT1 trigger: edge
lapic6: LINT1 polarity: high
MADT: Interrupt override: source 0, irq 2
ioapic0: Routing IRQ 0 - intpin 2
MADT: Interrupt override: source 9, irq 9
ioapic0: intpin 9 trigger: level
ioapic0 Version 2.0 irqs 0-23 on motherboard
ioapic1 Version 2.0 irqs 24-47 on motherboard
cpu0 BSP:
 ID: 0x   VER: 0x00050014 LDR: 0xff00 DFR: 0x
  lint0: 0x00010700 lint1: 0x0400 TPR: 0x SVR: 0x01ff
  timer: 0x000100ef therm: 0x0200 err: 0x0001 pcm: 0x0001
ath_rate: version 1.2 SampleRate bit-rate selection algorithm
wlan: 802.11 Link Layer
null: null device, zero device
random: entropy source, Software, Yarrow
nfslock: pseudo-device
io: I/O
kbd: new array size 4
kbd1 at kbdmux0
mem: memory
Pentium Pro MTRR support enabled
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
rr232x: RocketRAID 232x controller driver v1.02 (May  5 2008 11:42:16)
hptrr: HPT RocketRAID controller driver v1.1 (May  5 2008 11:42:14)
npx0: INT 16 interface
acpi0: PTLTD   RSDT on motherboard
ioapic0: routing intpin 9 (ISA IRQ 9) to vector 48
acpi0: [MPSAFE]
pci_open(1):mode 1 addr port (0x0cf8) is 0x80008058

Re: $HOME changed from 6.2 to 6.3 and 7.0 ?!

2008-03-01 Thread Ulrich Spoerlein

On Fri, 29.02.2008 at 13:58:27 -0800, Jeremy Chadwick wrote:
> On Fri, Feb 29, 2008 at 10:07:23PM +0100, Ulrich Spoerlein wrote:
> > # $FreeBSD: src/etc/crontab,v 1.32 2002/11/22 16:13:39 tom Exp $
> > ...
> > HOME=/var/log
> >
> > If this has changed from before, I guess it would be due to a new shell
> > forking which always reset $HOME. Thus, it only worked before by sheer
> > luck :)
>
> The HOME=/var/log entry in /etc/crontab was set **14 years ago**, so I
> don't know what the OP is talking about.  Nothing has changed there.

Yes, I wasn't implying the problem was with a change to /etc/crontab. I
checked daily/999.local and it hasn't been touched in years, too.

A (very!) wild guess would be that it has to do with the *env() changes
done to the shell. Or were they not merged in time for 6.3-RELEASE?

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: $HOME changed from 6.2 to 6.3 and 7.0 ?!

2008-02-29 Thread Ulrich Spoerlein

On Fri, 29.02.2008 at 12:52:11 +0100, Thomas Krause wrote:
> Dear list,
>
> after upgrading from 6.2R to 6.3R my daily jobs, which are normaly
> executed from /etc/daily.local, are not longer started.
> The entry in daily.local is
> $HOME/bin/save-conf.sh
> 6.2R executed /root/bin/save-conf.sh
> 6.3R (and 7.0R) tries to start /var/log/bin/save-conf.bin
>
> Why? I cannot find such a homedir in /etc/passwd!

Wrong place to look, it is set via /etc/crontab:

% more /etc/crontab
# /etc/crontab - root's crontab for FreeBSD
#
# $FreeBSD: src/etc/crontab,v 1.32 2002/11/22 16:13:39 tom Exp $
#
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
HOME=/var/log
#
# ...
#
# Perform daily/weekly/monthly maintenance.
1	3	*	*	*	root	periodic daily
15	4	*	*	6	root	periodic weekly
30	5	1	*	*	root	periodic monthly


If this has changed from before, I guess it would be due to a new shell
forking which always reset $HOME. Thus, it only worked before by sheer
luck :)
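
To illustrate where the /var/log prefix comes from, here is a small sketch that simulates cron with env(1). It assumes cron simply puts the HOME assignment from /etc/crontab into the job's environment; the script path is the one from the original report, and the variable names are made up for the example.

```sh
#!/bin/sh
# Simulate what cron does with the HOME=/var/log line in /etc/crontab:
# the variable is placed in the job's environment and inherited by every
# child shell, including the one that runs the periodic(8) scripts.
expanded=$(env HOME=/var/log /bin/sh -c 'echo "$HOME/bin/save-conf.sh"')
echo "with crontab HOME: $expanded"

# An absolute path does not depend on the environment at all.
fixed=/root/bin/save-conf.sh
echo "absolute path:     $fixed"
```

Spelling out /root/bin/save-conf.sh in daily.local sidesteps the question of who sets or resets HOME.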

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: finstall alpha3

2008-02-08 Thread Ulrich Spoerlein

On Wed, 06.02.2008 at 11:48:22 +0100, Ivan Voras wrote:
> On the other hand, here's what it *can* do currently:
>
> - it's a live CD environment, completely like an already installed
> FreeBSD system, only running from a read-only media (e.g. it's usable as
> a FixIt system)

This is great, and I think it's the way to go. Since I had to repair my
system over the last few days with a 'FixIt' CD, I think this mode could
be improved even further. It would be nice if there were some additional
system repair tools available on this CD (license permitting, of
course). You'd just have to make sure to still install a clean FreeBSD.
This could be accomplished by using unionfs for the 'enhanced fixit
overlay' or something like that.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: Reconstruct disklabel for UFS and GELI volumes

2008-02-07 Thread Ulrich Spoerlein

On Feb 7, 2008 1:05 AM, Torfinn Ingolfsen [EMAIL PROTECTED] wrote:
> Ulrich Spoerlein [EMAIL PROTECTED] wrote:
> > There were three labels
> Actually, it is one label per slice, unless you are doing something
> unusual?

s/labels/partitions/ , that's what I meant.

  - ad0s4a: UFS, exact size unknown. Is it possible to infer this from
  the UFS partition size? I can mount this already, as I simply wrote an
  'a' label of maximum size to the disklabel
  - ad0s4b: GELI encrypted swap
  - ad0s4d: GELI encrypted ZVOL

 FWIW, I have had success with scan_ffs[1] as documented in this short
 article[2]. It will recover lost labels, or at least try to.
 If you are downloading binary packages from somewhere, be sure to
 double check that you get the one that fits your platform (i386 / amd64
 or whatever) and version.
 Take it slowly, and double check all steps before comitting anything.

I already made some progress. The GEOM classes place a label in the
last sector ("GEOM::GELI" in my case), so I could use this string to
scan the whole slice overnight. Sadly, the geli swap partition has no
such signature; only the geli ad0s4d partition has one. However, using
geli dump I can get the original partition size. I now only have to
adjust the offset/length in the bsdlabel and figure out the original
size for ad0s4a (which I guess was 512MB).
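
The offset arithmetic described above can be sketched as follows. This is
only a rough illustration: label_entry is a made-up helper name, the
numbers in the example call are invented, and it assumes that the GELI
metadata sits in the last sector of the provider and that bsdlabel offsets
are counted in sectors from the start of the slice.

```shell
#!/bin/sh
# Sketch only: label_entry is a hypothetical helper, not part of bsdlabel(8).
# Assumes the GELI metadata occupies the last sector of the provider and
# that bsdlabel offsets are counted in sectors from the start of the slice.
#
#   $1  sector (relative to slice start) where "GEOM::GELI" was found
#   $2  provider size in bytes, as reported by "geli dump"
#   $3  sector size in bytes (defaults to 512)
label_entry() {
    magic_sector=$1
    provsize_bytes=$2
    sector_size=${3:-512}
    # Partition length in sectors.
    size=$((provsize_bytes / sector_size))
    # The partition ends at the metadata sector, so count back from there.
    offset=$((magic_sector + 1 - size))
    echo "size=$size offset=$offset"
}

# Made-up numbers: metadata found in sector 2097151, provider of 512 MB.
label_entry 2097151 536870912
```

The resulting size and offset would then go into the matching partition
line when editing the label with bsdlabel -e.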

I should have this back running quickly, once I get home to the machine.

Thanks for all the suggestions so far. Since I can't install any
packages (I'm using the 6.2 fixit cd), how can I calculate the file
system size from the ffsinfo output?

Uli

Re: Reconstruct disklabel for UFS and GELI volumes

2008-02-07 Thread Ulrich Spoerlein

On Thu, 07.02.2008 at 20:49:10 +0100, Nikola Lečić wrote:
  Thanks for all the suggestions so far. Since I can't install any
  packages (I'm using the 6.2 fixit cd), how can I calculate the file
  system size from the ffsinfo output?
 
 I hate to be boring (since I already suggested the use of
 sysutils/testdisk), but it would be really helpful to give it a try.
 Please read this success story (5 mails):

I would, if installing packages were easier in fixit mode. Looks like I
need to take up my own live CD project again.

   
 http://lists.freebsd.org/pipermail/freebsd-questions/2007-December/164901.html
 
 Among others things, it contains some notes on how to use packages that
 are not included in rescue CD and how to recalculate your bsdlabel
 offsets and other parametres. And yes, with or without GELI your swap
 will appear just as a hole between normal partitions so its
 dimensions are probably the last thing you will reconstruct.

Why is there no metadata for the GELI swap? Is it because the 'label'
command is not used? (That would make sense to me.)

Anyway, I reconstructed my disklabel and everything is back to normal. A
nice exercise it was, though I'd rather have done it on non-critical
data :)

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Reconstruct disklabel for UFS and GELI volumes

2008-02-06 Thread Ulrich Spoerlein

Hi,

Somehow[TM] an installation of 4.11 to ad0s3 managed to wipe out my
existing disklabel for 7.0 on ad0s4. I now need to recover the
disklabel to get my system to boot!

There were three labels
- ad0s4a: UFS, exact size unknown. Is it possible to infer this from
the UFS partition size? I can mount this already, as I simply wrote an
'a' label of maximum size to the disklabel
- ad0s4b: GELI encrypted swap
- ad0s4d: GELI encrypted ZVOL

I only need to find out the start of ad0s4d. Is the consumer size of
a GELI device stored in the last 512 bytes of metadata? Or are there
some magic bytes in these 512 bytes so I could find out the exact end
of ad0s4b and thus the start of ad0s4d?

Any help or advice would be highly appreciated!

Thanks,
Uli

Re: problems with LC_ALL

2008-01-21 Thread Ulrich Spoerlein

On Sun, 20.01.2008 at 12:53:54 -0800, Javier Elizondo wrote:
 Hi,
 
 I am using darwing in a mac book pro, when I open
 terminal I get the following message that appears only
 in my account, I would like to get help in order to
 fix it.
 
 Last login: Sun Jan 20 14:32:18 on ttys001
 perl: warning: Setting locale failed.
 perl: warning: Please check that your locale settings:
   LC_ALL = (unset),
   LANG = UTF-8
 are supported and installed on your system.
 perl: warning: Falling back to the standard locale
 (C).
 
 I have tried but without success. The languaje is
 EN_US with iso and the keyboard is in spanish, but not
 problem with it.

Your LANG setting of UTF-8 is plain wrong. There is no UTF-8
language. Please check the output of locale and then set LANG to
something that can be found in the output of locale -a.

The keyboard is not affected by LANG, so if you want English error
messages and are using UTF-8, you should place the following in
your shell startup file:

export LANG=en_US.UTF-8
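
To check a LANG value against the installed locales before putting it in
the startup file, something along these lines might help. It is a minimal
sketch: locale_listed is a made-up helper name, and the match is exact, so
a system that spells the name differently in locale -a (for example utf8
instead of UTF-8) will report a miss.

```shell
#!/bin/sh
# Sketch only: locale_listed is a hypothetical helper name.
# It reads a list of locale names on stdin (one per line, as printed by
# "locale -a") and succeeds only if $1 matches one of them exactly.
locale_listed() {
    grep -Fqx -- "$1"
}

# Fall back to the C locale when LANG is unset.
want=${LANG:-C}
if command -v locale >/dev/null 2>&1 &&
   locale -a 2>/dev/null | locale_listed "$want"; then
    echo "LANG=$want is installed"
else
    echo "LANG=$want is not in 'locale -a'; pick a name from that list"
fi
```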

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: Backup solution suggestions [ggated]

2008-01-18 Thread Ulrich Spoerlein

On Jan 18, 2008 9:11 AM, Johan Ström [EMAIL PROTECTED] wrote:
 Your no,barely, bad hell no seems to fit pretty good.. I did some
 testing during the night with the above (non-production) setup.
 What I did was doing some rsyncing over the night:

 while true ; do
  echo `date` Clearing vmail >> logfile
  rm -rf vmail
  echo `date` Starting rsync >> logfile
  rsync -vr /usr/var/vmail . |tee -a logfile
  echo `date` Rsync finished >> logfile
 done

 I started this at ~02.0. The results? A freshly rebooted 6.2 (6.2-
 RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #0: Fri Jul 27 15:47:50 UTC 2007)
 box in the morning..
 [...]
 What I dont have is a coredump, judging from dmesg -a savecore wasnt
 even run.. running it now, 5 hours later, didnt find any cores.

 The other end (7.0 server) wasnt affected at all.

 Not realy sure what it had been doing, because looking at my
 bandwidth graphs from the switch, nothing was done at all.. It didnt
 even go through one iteration of rsync... ~7.5k files/directorys
 seems to have been transfered, then the log doesnt say more. But
 according to the BW graph, after ~03.00 no traffic was sent at all...

 Some known bug with 6.2?

There were some ggatec problems with TCP and/or sockets; I think they
have been mostly resolved post-6.2. If you want to pursue this further
(it *would* be a cool setup, no doubt) I'd suggest three things:
- Update to 6.3
- Leave GELI out of the loop for now (only do ggate, with random data perhaps)
- Build a kernel *without* options PREEMPTION

hth,
Uli

Re: Backup solution suggestions [ggated]

2008-01-17 Thread Ulrich Spoerlein

On Jan 17, 2008 1:31 AM, Johan Ström [EMAIL PROTECTED] wrote:
  Export the disk on the backup server with ggated. Bind it on the
  client
  with ggatec. Slap a GELI or GBDE encryption on top of it and then
  put a
  ZFS on top of it.
 
  You can mount/import this remote ZFS at will and do your zfs
  send/receive on your local box. Nothing ever leaves your box
  unencrypted.

 Now that is a cool solution! That actually sounds like something doable.
 I tried it out some at home between a 6.2 box (client) and 7.0 box
 (server), hosting the system in a ZFS sparse volume with a
 predefined size, exported that via ggated and connected ggatec on the
 client box. I then did some experimentation with just newfs, and it
 worked great!
 The only downside with this would be that the size is fixed. So I
 played around a bit with setting the volsize property in ZFS and it
 seemd to work just fine. zfs list reported the new, bigger, size.
 Restarted ggatec and did a growfs, and then remounted.. Yay bigger
 disk :)
 Then I went on do do some geli test, geli'ed /dev/ggate0 and
 newfs'ed, mounted and played around a bit. All fine.. Now came the
 problem, i unmounetd it, expanded the zfs volume a bit more,
 restarted ggatec and tried to attach it using geli again (note, I
 have no idea if this is supposed to work at all, I'm just testing.
 Havent read such things anywhere). Now I got Invalid argument.
 Im not realy sure about how GEOM works, but if I recall correct it
 uses the last sectors of the disk? If I moved X bytes of data from
 old end of disk to new end of disk, would that make GELI work? If I
 can get that to work, then this would be a kickass solution (all
 encryption stuff works great, I don't have to allocate all space
 immediatly, I can expand it later without destroying data and
 starting from scratch etc).

I'm pretty certain that GELI cannot handle variable sized disks. But
you could add GVIRSTOR into the mix. But I'd just allocate the
necessary space and be done with it. Adding yet another layer is
asking for trouble, imho.

 Some other questions, more related to ggated/c. Is this stable? Good
 working? how does it handle failure situations? Anyone using it for
 production systems?

From my personal experience (which is rather limited): No, barely, bad,
hell no.

There were/are some open PRs about ggate. I had troubles with
gmirror+ggate in that it would deadlock every other hour on SMP
systems (try removing option PREEMPTION if that bug hits you).

 Yes this is for backup only so minor glitches
 might be acceptable for me, but I'd rather know about those beforehand.

Give it a shot. If your systems stay up and stable, good. If the link
breaks from time to time, I think ZFS should be able to recover most
of it. Since it is your backup, I'd try to break it in interesting
ways first, to get a feel for how robust it is.

Uli

Re: Backup solution suggestions

2008-01-16 Thread Ulrich Spoerlein

On Wed, 16.01.2008 at 00:26:34 +0100, Johan Ström wrote:
 I create regular tarball (gziped maybee) with some files i want to backup, 
 Then i encrypt this file with ie gpg. Then i send of this file using some 
 unspecified network protocol to the storage server.
 Encrypted all the way, from my end to the remote disk..
 The downside is that it is a static file.. not a dynamic filesystem, 
 nothing I can mount and have easy access to individual files from. *Thats* 
 what I'm looking for.

Export the disk on the backup server with ggated. Bind it on the client
with ggatec. Slap a GELI or GBDE encryption on top of it and then put a
ZFS on top of it.

You can mount/import this remote ZFS at will and do your zfs
send/receive on your local box. Nothing ever leaves your box
unencrypted.
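
As a rough outline of the stacking described above, the helper below only
prints the commands for review instead of running them. It is a sketch
under assumptions: print_backup_plan is a made-up name, and the host,
device, and pool names in the example call are placeholders.

```shell
#!/bin/sh
# Sketch only: print_backup_plan is a hypothetical helper that prints the
# commands for review instead of running them. The host, device, and pool
# names passed in below are placeholders, not values from this thread.
print_backup_plan() {
    server=$1   # machine running ggated(8)
    disk=$2     # device exported on the server
    pool=$3     # name for the ZFS pool created on the client
    cat <<EOF
# on $server: allow the client in /etc/gg.exports, then start the daemon
ggated
# on the client: attach the remote disk as /dev/ggate0
ggatec create -o rw -u 0 $server $disk
# encrypt on the client, so only ciphertext crosses the network
geli init /dev/ggate0
geli attach /dev/ggate0
# build the pool on the decrypted provider
zpool create $pool /dev/ggate0.eli
EOF
}

print_backup_plan backuphost /dev/da0 backup
```

Because geli sits above ggate on the client, the backup server only ever
sees encrypted sectors.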

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: 7.0BETA4 desktop system also periodically freezes

2008-01-15 Thread Ulrich Spoerlein

On Wed, 09.01.2008 at 14:18:59 -0500, J.R. Oldroyd wrote:
 I believe this same problem may also be present on 7.0, at least on
 the BETA releases; BETA4 is the latest I have here.
 
 I have the same problem on two systems here: periodically the systems
 will stop dead (no mouse action, no ping responses from other systems,
 processes with windows on the screen also freeze); the hangs can be
 anything from a few seconds to several MINUTES; then it all comes back
 as if nothing happened except that keyboard input during the freeze
 is lost.  Most of my freezes are a few seconds long, some are in the
 15-60 second range, but (fortunately, rarely) I have seen some that
 lasted 10-15 MINUTES!
 [...]
 
 But the phone might ring, so I'll stop doing things, the system will
 become pretty-much idle, then I'll go to move the mouse and it might be
 frozen.  When it comes back, the small load peak shows, but top and ps
 show nothing unusual.

Sorry for the late reply, I'm behind with reading mails. But to let you
know that you are not the only person, I witnessed pretty much the same
thing. It happens *very* rarely but it does happen from time to time.

I'm running 7.0-PRERELEASE with SCHED_ULE on i386 with 1GB RAM and ZFS
on a GELI provider. The system is running Xorg7.3 with the radeon driver
(I've come to blame Xorg 7.3 for all my recent problems ...)

Anyway, it has *never* happened while I was playing MP3s, but after
leaving mplayer paused for a couple of minutes I would return to find
the system basically frozen. Sometimes it would unfreeze after a couple
of minutes and replay all keystrokes and mouse movements, but most of
the time I'm too impatient to wait. I press the power button and nothing
happens; a few minutes later it does a regular shutdown. Strange things,
indeed.

The only way I can think of to track this down would be to hook up remote
debugging via firewire and break to ddb, but I lack a second firewire
laptop :(


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: RELENG_7 2008/01/10 desktop system also periodically freezes

2008-01-15 Thread Ulrich Spoerlein

On Sun, 13.01.2008 at 17:25:24 -0500, J.R. Oldroyd wrote:
 David's suggestion re powerd may be relevant.  I'd noticed that the
 problem seems to happen when the system is idle.  I posted earlier that
 it seems like I can do all sorts of work without a problem then I stop for
 a phone call and when I resume it hangs.  I tend to notice a lot of hangs
 when typing an email.

Try running/looping some MP3 or WAV files. My system never, ever
froze during sound playback, only when idle. But since I'm running
multiple wmdocklets that update periodically, 'idle' is not really
accurate.


Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

sbp(4) write error wedging GEOM mirror

2007-12-28 Thread Ulrich Spoerlein

 shutting down, though

GEOM_MIRROR: Device gm2: rebuilding provider da1 stopped.
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...13 13 7 2 5 3 3 1 1 0 0 0 0 done
All buffers synced.
sbp0:1:0 request timeout(cmd orb:0x1195d28c) ... agent reset
fwohci0: txd err=1e ack type_err
sbp0:1:0 sbp_agent_reset_callback: resp=22
fwohci0: txd err=1e ack type_err
sbp_orb_pointer_callback: xfer-resp = 22
sbp0:1:0 request timeout(cmd orb:0x1195d3c4) ... target reset
fwohci0: txd err=1e ack type_err
sbp0:1:0 sbp_agent_reset_callback: resp=22
fwohci0: txd err=1e ack type_err
sbp_orb_pointer_callback: xfer-resp = 22
sbp0:1:0 request timeout(cmd orb:0x1195d634) ... reset start

 here I unplugged my laptop again 

Uptime: 2h57m3s
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc2, gen=8, CYCLEMASTER mode
firewire0: 3 nodes, maxhop = 2, cable IRM = 2 (me)
firewire0: bus manager 2 (me)
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc2, gen=8, CYCLEMASTER mode
firewire0: 3 nodes, maxhop = 2, cable IRM = 2 (me)
firewire0: bus manager 2 (me)
GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed.
GEOM_MIRROR: Device gm0 destroyed.
Powering system off using ACPI

Anything I can do to help debugging this Firewire issue?

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: sbp(4) write error wedging GEOM mirror

2007-12-28 Thread Ulrich Spoerlein

On Fri, 28.12.2007 at 13:54:37 +0100, Ulrich Spoerlein wrote:
 [Ramblings about sbp(4) wedging geom mirror]

Ok, it looks like sbp(4) is off the hook. I tried the rebuilding again,
this time attaching da0 via umass(4) instead of sbp(4), and while it also
eventually wedges, umass can recover from this situation on its own:

umass0: Prolific PL-3507C USB Storage Device, rev 2.00/0.01, addr 2
da0 at umass-sim0 bus 0 target 0 lun 0
da0: SAMSUNG SP2514N VF10 Fixed Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s2 (device gm1) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s2 to gm1 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Component da0s1 (device gm0) broken, skipping.
GEOM_MIRROR: Cannot add disk da0s1 to gm0 (error=22).
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: provider da0s1 is stale.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: provider da0s2 is stale.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1.
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=2, CYCLEMASTER mode
firewire0: 2 nodes, maxhop = 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: txd err=14 ack busy_X
fwohci0: BUS reset
fwohci0: node_id=0xc800ffc1, gen=3, CYCLEMASTER mode
firewire0: 2 nodes, maxhop = 1, cable IRM = 1 (me)
firewire0: bus manager 1 (me)
firewire0: New S400 device ID:0050770e013023f0
da1 at sbp0 bus 0 target 0 lun 0
da1: Prolific PL-3507C Drive 2804 Fixed Simplified Direct Access SCSI-4 device
da1: 50.000MB/s transfers
da1: 381554MB (781422768 512 byte sectors: 255H 63S/T 48641C)
GEOM_MIRROR: Device gm2: provider da1 detected.
GEOM_MIRROR: Device gm2: rebuilding provider da1.
GEOM_MIRROR: Device gm0: rebuilding provider da0s1 finished.
GEOM_MIRROR: Device gm0: provider da0s1 activated.
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOM_MIRROR: Device gm1: provider da0s2 detected.
GEOM_MIRROR: Device gm1: rebuilding provider da0s2.
(14:08:27) [EMAIL PROTECTED]: ~# gmirror status
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
GEOM_MIRROR: CannotGEOM_MIRROR: Synchronization request failed (error=5). 
da0s2[WRITE(offset=23111270 write metadata on da0s1 (device=gm0, error=5).
GEOM_MIRROR: Cannot update metada400, length=131072)]
GEOM_MIRROR: Device gm1: provider da0s2 disconnected.
GEOta on disk da0s1 (error=5).
M_MIRROR: Device gm1: rebuilding provider da0s2 stopped.
GEOM_MIRROR: Device gm0: provider da0s1 disconnected.
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
Expumass0: BBB reset failed, IOERROR
eumass0: BBB bulk-in clear stall failed, IOERROR
nsumass0: BBB bulk-out clear stall failed, IOERROR
ive timeout(9) function: 0xc09623a9(0xc32de800) 0.006188295 s
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
... (multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
(da0:umass-sim0:0:0:0): Synchronize cache failed, status == 0x4, scsi status == 0x0
umass0: BBB reset failed, IOERROR
umass0: BBB bulk-in clear stall failed, IOERROR
... (multiple pages)
umass0: BBB bulk-in clear stall failed, IOERROR
umass0: BBB bulk-out clear stall failed, IOERROR
      Name    Status  Components
mirror/gm2  DEGRADED  ad1
                      da1 (12%)
mirror/gm0  DEGRADED  ad0s1
mirror/gm1  DEGRADED  ad0s2
(14:14

Re: SMP on FreeBSD 6.x and 7.0: Worth doing?

2007-12-22 Thread Ulrich Spoerlein

On Fri, 21.12.2007 at 22:31:24 -0700, Brett Glass wrote:
 As has been reported in some other messages on this list, Linux is currently 
 blowing FreeBSD away. It's taking as much as 20% less time to get through 
 the benchmark, depending on exactly how the random shuffle came out. This is 
 with 4 GB RAM, the GENERIC FreeBSD SMP kernel (using SCHED_ULE), and aufs as 
 the storage schema for Squid.

Apples and oranges, I know, but if you're building a simple reverse
caching proxy, have you considered Varnish? It would be very interesting
to see how it compares to a) FreeBSD+Squid, b) Linux+Squid and c)
Linux+Varnish.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Threads stuck in sbwait

2007-12-04 Thread Ulrich Spoerlein

Hi all,

We are running the Jabber server Openfire on FreeBSD 6.1 and it
doesn't close its sockets, forcing us to periodically recycle the
java process. Here's some interesting output:

# ps alxHp 51002
  UID   PID  PPID CPU PRI NI    VSZ    RSS MWCHAN STAT  TT       TIME COMMAND
  314 51002     1   0  20  0 492556 104812 ksesig Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 accept Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 accept Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
  314 51002     1  17   4  0 492556 104812 sbwait Ss    ??   10:03.35 /usr/local/jdk1.5.0/bin/java -server -jar -Xmx256M -Dopenfire.lib.dir=
...
# lsof -p 51002 | grep CANT
java    51002 openfire    8u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   25u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   27u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   33u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   34u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   38u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   39u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   40u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   43u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   45u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   46u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   47u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   48u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
java    51002 openfire   49u  IPv4 0t0 TCP no PCB, CANTSENDMORE, CANTRCVMORE
...

A ktrace of the process shows *lots* of kse_release() calls, but I'm
not sure what exactly to look for.

What I would try next is to use libmap to make java use libthr instead
of libpthread (libkse). Can anyone here confirm whether there are known
problems with java and libthr under 6.x?
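
For illustration, a libmap.conf(5) stanza for such a remapping could be
generated roughly like this. It is a sketch, not a tested configuration:
libmap_stanza is a made-up helper, and the library version numbers are an
assumption about a 6.x system that should be checked against /usr/lib.

```shell
#!/bin/sh
# Sketch only: libmap_stanza is a hypothetical helper. The library names
# are an assumption about a 6.x system; check /usr/lib for the versions
# actually installed before using them.
libmap_stanza() {
    # Constrain the mapping to one executable, given as $1.
    printf '[%s]\n' "$1"
    # origin library, then the library to load in its place
    printf '%s\t%s\n' libpthread.so.2 libthr.so.2
    printf '%s\t%s\n' libpthread.so libthr.so
}

# Review the output, then append it to /etc/libmap.conf by hand.
libmap_stanza /usr/local/jdk1.5.0/bin/java
```

Restricting the mapping to the java binary leaves every other program on
the default threading library.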

Thanks,
Uli

Re: Threads stuck in sbwait

2007-12-04 Thread Ulrich Spoerlein

On Dec 4, 2007 1:01 PM, Ivan Voras [EMAIL PROTECTED] wrote:
  we are running the Jabber server Openfire on FreeBSD 6.1 and it
  doesn't close its sockets, forcing use to periodically recycle the
  java process. Here's some interesting output:

 Can you upgrade to FreeBSD 6.3? There were some fixes that might help
 you. Also, try using libthr instead of libpthread.

Not easily, no. But I'll try libthr upon next restart of the process.

Uli

Re: pam_group vs. multiple group lines

2007-08-22 Thread Ulrich Spoerlein

On 8/22/07, Chuck Swiger [EMAIL PROTECTED] wrote:
 On Aug 21, 2007, at 2:02 PM, Richard Foulkes wrote:
  Ok, so how are you supposed to control membership of the wheel
  group via ldap? Ok, you COULD remove the local wheel entry in /etc/
  group, but this would probably be a bad idea if the ldap server
  were unavailable.

 You've aptly summarized my thoughts on the matter-- I would not rely
 on LDAP to provide information about root or the wheel group.

That is exactly the gist of my question. Of course I know that a group
oneliner is the way to go. However, I saw people suggest splitting
groups into multiple lines, if the lines are too long or too many
groups per line (something to do with the /etc/group parser, I guess).

Anyway, I want the LDAP groups to *augment* system groups. Removing
wheel from /etc/group and relying on a complex network service is
not funny.

Besides, it *does* work for file permissions etc. so some basic system
calls *do* get this right.

Uli

Re: pam_group vs. multiple group lines

2007-08-22 Thread Ulrich Spoerlein

On Wed, 22.08.2007 at 10:28:40 +0200, Patrick M. Hausen wrote:
 On Wed, Aug 22, 2007 at 09:53:42AM +0200, Ulrich Spoerlein wrote:
  On 8/22/07, Chuck Swiger [EMAIL PROTECTED] wrote:
   On Aug 21, 2007, at 2:02 PM, Richard Foulkes wrote:
Ok, so how are you supposed to control membership of the wheel
group via ldap? Ok, you COULD remove the local wheel entry in /etc/
group, but this would probably be a bad idea if the ldap server
were unavailable.
  
   You've aptly summarized my thoughts on the matter-- I would not rely
   on LDAP to provide information about root or the wheel group.
  
  That is exactly the gist of my question. Of course I know that a group
  oneliner is the way to go. However, I saw people suggest splitting
  groups into multiple lines, if the lines are too long or too many
  groups per line (something to do with the /etc/group parser, I guess).
  
  Anyway, I want the LDAP groups to *augment* system groups. Removing
  wheel from /etc/group and relying on a complex network service 
  not funny.
 
 We do not use LDAP yet, but have been using NIS in our internal
 office network for years. If you use the magic + token to merge
 your NIS database with the static files for passwd and group
 information, then

I'm not using the compat setting, my nsswitch.conf contains

passwd: files ldap
group: files ldap

 _if_ the group entry in the static file does not contain any users
 _then_ the information from NIS is merged in
 
 So you can keep a wheel group around as the _primary_ group
 for root, toor, whatnot ... and all the additional members
 that have wheel as an auxiliary group come from NIS.
 
 Possibly this works for LDAP, too? IMHO at least it should ;-))

THANK YOU! It is indeed working for LDAP too. But it fails for sudo(8).
Luckily I could replace the %wheel directive with a few user id
directives.

It's still a shortcoming of some sort and I guess I'll file a PR if
no one else has any more information on the issue.

getent group now has the following wheel entries:
% getent group|grep wheel
wheel:*:0
wheel:*:0:us,root

As I said, su(1) is happy, sudo(8) not yet.

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: pam_group vs. multiple group lines

2007-08-22 Thread Ulrich Spoerlein

On Wed, 22.08.2007 at 13:47:43 -0500, Scot Hetzel wrote:
 Does the following work for you:
 
 passwd:  ldap [notfound=return] files
 group:   ldap [notfound=return] files
 
 This sets ldap as the authoritative source for users and groups,
 unless the ldap service is down, then it will use the files for the
 source (useful when ldap server is down).  This will require that you
 place all of the users/groups into the ldap server. (modified from the
 nis example in the nsswitch.conf(5) man page)

Thanks for your suggestion!

In the end, I did it the other way round, using:

passwd: files ldap
group: files [success=continue] ldap

This has the effect of merging the multiple group sources into one, as
can be seen here:
% getent group|grep wheel
wheel:*:0:root,us

I now have to play a little bit with bootup (no LDAP present) and what
happens when LDAP goes offline, etc.

Thanks again!

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

pam_group vs. multiple group lines

2007-08-21 Thread Ulrich Spoerlein

Hi,

I think I found a deficiency wrt. pam_group (which also hits sudo(8),
so this might be libc related instead).

I found this while trying to migrate groups into LDAP, but you don't
need LDAP to reproduce this, simply place the following in /etc/group

wheel:*:0:root
wheel:*:0:us

% getent group|grep wheel;id
wheel:*:0:root
wheel:*:0:us
uid=1001(us) gid=1000(us) groups=1000(us),0(wheel),80(www)

As you can see, getent(1) and id(1) work fine. File access also works
as expected, but su(1) does not (because of pam_group group=wheel in
pam.d/su):

% su -
su: Sorry

Combine the wheel entries back into one line and su(1) suddenly starts
working again. The same problem hits sudo(8) if you are using a %wheel
line. Since there is no pam.d/sudo on my system, I think the bug probably
lies in libc itself.

Is this expected behaviour? I'd classify it as a bug ...
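
A quick way to check whether a group source contains such split entries is
to count how often each group name occurs. A minimal sketch, assuming
group(5)-style input on stdin; dup_groups is a made-up name:

```shell
#!/bin/sh
# dup_groups: print every group name that appears on more than one line
# of group(5)-style input read from stdin.
# Hypothetical helper for illustration only.
dup_groups() {
    awk -F: '
    /^#/ || NF < 3 { next }          # skip comments and malformed lines
    { count[$1]++ }
    END {
        for (name in count)
            if (count[name] > 1) print name
    }' | sort
}

# Example with the two wheel lines from above plus an unrelated group.
printf 'wheel:*:0:root\nwheel:*:0:us\nwww:*:80:us\n' | dup_groups
```

Feeding it /etc/group or the output of "getent group" should list exactly
the groups that can trigger the behaviour described here.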

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.

Re: dumping large partition to USB drive fails

2007-07-01 Thread Ulrich Spoerlein

On Wed, 27.06.2007 at 08:12:06 +0200, Roland Smith wrote:
 Unfortunately I can't check the drives with smartctl; they produce an SCSI
 error. I'll try 'camcontrol defects', and see if that turns up anything.

Please try with atausb. Remove umass/da/scsi from your kernel and add
atausb. Might be worth a try.

Other than that, I wish FreeBSD could somehow translate those SMART
commands, so it would work with USB/Firewire enclosures of all sorts.

Cheers,
Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: OpenLDAP unix domain socket leak

2007-06-14 Thread Ulrich Spoerlein


On 6/14/07, Alexandre Biancalana [EMAIL PROTECTED] wrote:

I change nss_ldap.conf again to access OpenLDAP via unix domain socket.

Here is the connection counter before the change:

Wed Jun 13 22:35:55 BRT 2007
unix sockets:   99
tcp sockets:   12


Here is the connection counter right before change connection method back to
TCP socket:

Wed Jun 13 22:56:01 BRT 2007
unix sockets: 2902
tcp sockets:   13


Hi,

It looks like it is not actually a leak, per se. Letting slapd sit
there idly for a while, it starts to close the unix domain sockets.
However, if you constantly hit it with requests, it never recuperates.

misctest1# while :; do echo -n `date`   ; lsof 2>/dev/null | awk '
$1 ~ /imapd/{imapd+=1} $1 ~ /slapd/{slapd+=1} $3 ~ /postfix/{pf+=1}
END{print imapd, pf, slapd}'; sleep 15;done
Thu Jun 14 09:27:58 CEST 2007   1354 46 228
Thu Jun 14 09:28:13 CEST 2007   1354 341 516
Thu Jun 14 09:28:29 CEST 2007   1354 325 868
Thu Jun 14 09:28:45 CEST 2007   1308 337 1192
Thu Jun 14 09:29:01 CEST 2007   1308 323 1192
Thu Jun 14 09:29:17 CEST 2007   1308 337 1457
Thu Jun 14 09:29:33 CEST 2007   1308 323 1520
Thu Jun 14 09:29:49 CEST 2007   1262 321 1748
Thu Jun 14 09:30:04 CEST 2007   1262 329 1979
Thu Jun 14 09:30:20 CEST 2007   1262 333 2316
Thu Jun 14 09:30:37 CEST 2007   1262 333 2580
Thu Jun 14 09:30:53 CEST 2007   1262 335 3044
Thu Jun 14 09:31:09 CEST 2007   1262 393 3164
Thu Jun 14 09:31:25 CEST 2007   1262 393 2420
Thu Jun 14 09:31:41 CEST 2007   1262 395 2556
Thu Jun 14 09:31:57 CEST 2007   1262 393 2556
Thu Jun 14 09:32:13 CEST 2007   1262 393 2556
Thu Jun 14 09:32:29 CEST 2007   1262 391 2556
Thu Jun 14 09:32:45 CEST 2007   1262 391 2556
Thu Jun 14 09:33:01 CEST 2007   1262 391 888
Thu Jun 14 09:33:16 CEST 2007   1262 385 888
Thu Jun 14 09:33:32 CEST 2007   1262 94 228
Thu Jun 14 09:33:48 CEST 2007   1262 94 228

I think we really should take this up with the OpenLDAP guys.

Btw, why is lsof printing lines multiple times? I ran lsof through
sort and get almost every line four times:
slapd 94403 ldipr 515u unix 0xc8f68000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 515u unix 0xc8f68000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 515u unix 0xc8f68000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 515u unix 0xc8f68000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 516u unix 0xc8f13858 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 516u unix 0xc8f13858 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 516u unix 0xc8f13858 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 516u unix 0xc8f13858 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 517u unix 0xc8f35000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 517u unix 0xc8f35000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 517u unix 0xc8f35000 0t0 /var/run/openldap/ldapi
slapd 94403 ldipr 517u unix 0xc8f35000 0t0 /var/run/openldap/ldapi
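
Whatever causes the repeated lines, they inflate any count taken with a
plain "wc -l" or the awk counters above. A minimal sketch of counting only
distinct lines per command, assuming the command name is the first column;
unique_fds is a made-up name:

```shell
#!/bin/sh
# unique_fds: read lsof-style lines on stdin and print, per command name,
# how many distinct lines remain once exact duplicates are dropped.
# Hypothetical helper; it assumes the command name is the first column.
unique_fds() {
    sort -u | awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}

# Example: two descriptors, each listed four times as in the output above.
{
    for fd in 515u 516u; do
        for i in 1 2 3 4; do
            echo "slapd 94403 ldipr $fd unix 0xc8f68000 0t0 /var/run/openldap/ldapi"
        done
    done
} | unique_fds
```

Piping real lsof output through it (lsof 2>/dev/null | unique_fds) gives
per-command counts that are not multiplied by the duplication.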

Uli

Unix domain socket leak in 6-STABLE

2007-06-13 Thread Ulrich Spoerlein


Hi,

as you are aware, there is a unix domain socket leak in 6-STABLE,
which AFAIK is not yet fully fixed.

I wanted to ask about the status or some possible fixes, as I know a
way to reproduce the problem in a matter of minutes.

We are running Cyrus and Postfix with the user DB in OpenLDAP. When
using ldapi://%2fvar%2frun%2fopenldap%2fldapi/ as the connection URL for
both Postfix's user lookup and Cyrus's user lookup (via nss_ldap), slapd
quickly runs out of file descriptors, as it is not closing any unix
sockets (judging by the ever-increasing lsof output).

Using TCP sockets is just fine. If there are patches I could try,
don't hesitate to send them to me.

Cheers,
Uli

Re: Unix domain socket leak in 6-STABLE

2007-06-13 Thread Ulrich Spoerlein


On 6/13/07, Ivan Voras [EMAIL PROTECTED] wrote:

Can you perhaps isolate the bug / give more information on it? I'm
asking because I'm currently using an application with unix domain
sockets in production which handles lots of connects/disconnects per
second and it doesn't seem to show leakage.


Ok, I'm not exactly sure what I should do. First of all, there are two
LDAP consumers: postfix and cyrus-saslauthd. A fairly common setup, I
suppose.

If I bombard this setup with hundreds of mails, cyrus is at one point
unable to process the mails further, stating that:
Jun 13 18:27:22 misctest1 lmtpunix[47460]: IOERROR: opening
/data/cyrus/spool/user/ulrspoe/cyrus.cache: Too many open files

The error is misleading, though, as it is not cyrus that is out of
file descriptors, but rather OpenLDAP. Restarting slapd will make
cyrus work again.

I logged the lsof output during the mail bomb and the slapd-lines are
continually rising:

misctest1# while :; do echo -n `date`   ; lsof 2>/dev/null | awk '
$1 ~ /imapd/{imapd+=1} $1 ~ /slapd/{slapd+=1} $3 ~ /postfix/{pf+=1}
END{print imapd, pf, slapd}'; sleep 60;done
Wed Jun 13 18:21:55 CEST 2007   1378 71 272
Wed Jun 13 18:22:57 CEST 2007   1378 71 272
Wed Jun 13 18:23:58 CEST 2007   1378 216 316
Wed Jun 13 18:24:59 CEST 2007   1378 321 644
Wed Jun 13 18:26:01 CEST 2007   1378 333 1132
Wed Jun 13 18:27:02 CEST 2007   1378 329 1804
Wed Jun 13 18:28:04 CEST 2007   1378 417 2280

The third column never goes down significantly. I now have the setup
sitting at 2k open files for the slapd process and will wait until
tomorrow to see whether the count ever decreases again.
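
To make the trend easier to see, the change in the slapd column between
consecutive samples can be printed instead of the raw counts. A minimal
sketch, assuming log lines in the format shown above with the slapd count
as the last field; slapd_growth is a made-up name:

```shell
#!/bin/sh
# slapd_growth: read sample lines whose last field is the slapd line count
# and print the difference between each sample and the previous one.
# Hypothetical helper for illustration only.
slapd_growth() {
    awk 'NR > 1 { print $NF - prev } { prev = $NF }'
}

# Example with three of the samples logged above.
slapd_growth <<'EOF'
Wed Jun 13 18:23:58 CEST 2007   1378 216 316
Wed Jun 13 18:24:59 CEST 2007   1378 321 644
Wed Jun 13 18:26:01 CEST 2007   1378 333 1132
EOF
```

A run of positive numbers with no negative ones in between matches the
"never goes down" observation; a large negative value would show slapd
finally releasing descriptors.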

Changing from
ldapi://%2fvar%2frun%2fopenldap%2fldapi/ to ldap://127.0.0.1/ fixes
the problem. It might be a genuine problem in OpenLDAP, though. We are
using openldap-server-2.3.34_1

Cheers,
Uli

Re: Unix domain socket leak in 6-STABLE

2007-06-13 Thread Ulrich Spoerlein

Marc G. Fournier wrote:
 'k, just to ring in here ... I can definitely attest to there being a leak 
 here, as it was me that was originally burned by it ... in my case, I 
 eventually was able to isolate which VPS/jail was causing it and haven't run 
 it 
 since, but was never able to determine exactly what was causing it, since 
 there 
 wasn't really anything unusual running in that jail :(
 
 But ... based on the discussions that were had at the time, it was my 
 understanding that if all applications were shut down on the server (to the 
 bare minimal), eventually the kernel GC should clean up all residual sockets 
 ... when I did this (shut down all applications but the very bare minimum) 
 and 
 waited for 10+ minutes, socket usage never drop'd below about 4k sockets in 
 use, or something like that ...

Hi Marc,

was your leak a kernel leak or a user leak (if it actually makes a
difference)? I ask because I'm only hitting the problem within the slapd
process itself. Restart it and everything is good again. Other
applications are also not affected.

I think what's happening to me is that slapd keeps unix domain sockets
lingering too long. When blasting mails through the system, all those
tiny LDAP lookups then lead to slapd reaching its process limit.

I wonder though: maxfilesperproc is roughly 12k, but lsof only counts
about 2.5k lines of slapd output when the limit is hit. Is there a
better way to check how many fds/resources a certain process has open?

When using TCP sockets, the number of open files hardly changes.

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: Change in memory tracking in recent 6-STABLE?

2007-05-31 Thread Ulrich Spoerlein

Peter Jeremy wrote:
 On 2007-May-28 11:29:05 +0200, Ulrich Spoerlein [EMAIL PROTECTED] wrote:
 I'm using symon to monitor memory usage among several FreeBSD machines.
 After updating to a recent 6-STABLE, the amount of memory no longer adds
 up to the total physical memory. The inactive counter is way too
 small.
 
 As well as active, inactive and free, there is cache, wired
 and buffers.
 
 Check the following sysctls:
 vfs.bufspace (bytes)
 vm.stats.vm.v_active_count (pages)
 vm.stats.vm.v_inactive_count (pages)
 vm.stats.vm.v_wire_count (pages)
 vm.stats.vm.v_cache_count (pages)
 vm.stats.vm.v_free_count (pages)

Hi Peter,

Ok, adding up vm.stats gives me the total physical RAM (roughly). One
question would be: which counter is the buffer cache counted towards? Or
is it spread all over the place?

Back to symon, it uses the following code to grab its values. This
worked fine until some months ago. Now it is missing several MBytes. How
should I fix the code?

static int me_vm_mib[] = {CTL_VM, VM_TOTAL};
...
if (sysctl(me_vm_mib, 2, &me_vmtotal, &me_vmsize, NULL, 0) < 0) {
    warning("%s:%d: sysctl failed", __FILE__, __LINE__);
    bzero(&me_vmtotal, sizeof(me_vmtotal));
}

/* convert memory stats to Kbytes */
me_stats[0] = pagetob(me_vmtotal.t_arm);
me_stats[1] = pagetob(me_vmtotal.t_rm);
me_stats[2] = pagetob(me_vmtotal.t_free);

Why are these values not adding up to 256MB in my case?
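
For comparison with the symon numbers, the page counters Peter listed can
be added up directly from sysctl output. A minimal sketch, assuming
"name: value" lines on stdin; vm_total_mb is a made-up name, and
vfs.bufspace is left out because whether it overlaps with these counters
is exactly the open question:

```shell
#!/bin/sh
# vm_total_mb: read "name: value" lines (sysctl output) on stdin, add up the
# active/inactive/wire/cache/free page counters, and print the total in MB.
# The page size in bytes is passed as $1; 4096 is assumed when it is omitted.
# Hypothetical helper for illustration only.
vm_total_mb() {
    awk -F': *' -v pagesize="${1:-4096}" '
    $1 ~ /^vm\.stats\.vm\.v_(active|inactive|wire|cache|free)_count$/ {
        pages += $2
    }
    END { printf "%d\n", pages * pagesize / 1048576 }'
}

# Example with invented page counts that add up to 65536 pages.
vm_total_mb 4096 <<'EOF'
vm.stats.vm.v_active_count: 10000
vm.stats.vm.v_inactive_count: 20000
vm.stats.vm.v_wire_count: 15000
vm.stats.vm.v_cache_count: 5000
vm.stats.vm.v_free_count: 15536
EOF
```

On a FreeBSD box this would be fed with something like
sysctl vm.stats.vm | vm_total_mb "$(sysctl -n hw.pagesize)".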

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Change in memory tracking in recent 6-STABLE?

2007-05-28 Thread Ulrich Spoerlein

Hi there,

I'm using symon to monitor memory usage among several FreeBSD machines.
After updating to a recent 6-STABLE, the amount of memory no longer adds
up to the total physical memory. The inactive counter is way too
small.

Which recent changes could have caused this? Is it a bug in symon or in
FreeBSD?

An example of the difference can be found here:
http://coyote.dnsalias.net/memory.png

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: minimizing downtime on upgrades? (for example: mysql 4.1 - 5.0 or php)

2007-05-22 Thread Ulrich Spoerlein

Olivier Mueller wrote:
 Isn't there a better way?  How do you handle such cases? 

We go to extra lengths and allow only pkg installs on servers. That way
we are sure that no random library pollution takes place. It also makes
things more reproducible.

Sadly, packages are somewhat neglected and there is still no good
pkg_update tool.

 What I'm going to try is to prepare packages of the ports I have to
 upgrade on a dev/test server, and then install them with pkg_add: is
 that the right way ? 

A good way would be to test this very update with packages on a test
box. That is, install mysql4, produce your mysql5 packages somewhere
else (or use a chroot or jail). Then see if pkg-updating works for
mysql.

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: Known memory leak in 6-STABLE from April 1st?

2007-05-15 Thread Ulrich Spoerlein


On 5/14/07, Marc G. Fournier [EMAIL PROTECTED] wrote:

 Now after doing some heavy IMAP testing (cyrus reconstruct of big
 maildirs) the system froze to a complete halt. Stupid me already
 rebooted the machine, tomorrow I'll try to break into DDB when it
 happens again. I also started recording top(1) memory output and
 sysctl vm.zone output.

 The main question is: Were there any known memory leaks at the start
 of April? Any patches I should blindly try before spending several
 days on debugging this?

Hrmmm ... long shot here, but what does:

sysctl kern.ipc.numopensockets

show over that period of time ... just wondering if we are somehow related on
problems here, just different symptoms ...


Sorry no, nothing suspicious there. It bounces up and down; after
killing all amavis, cyrus and postfix processes it came down to about
80. Right now it's at 280 again, and free memory is so low that I
can no longer grep(1) a 600MB file or do other useful stuff.

This is the last vm.zone output, anything suspicious? What commands
should I run (in DDB?) to see where the memory is going?

Tue May 15 10:33:38 CEST 2007
vm.zone:
ITEM            SIZE     LIMIT     USED    FREE  REQUESTS

FFS2 dinode: 256,0,  83286,  10179,  1273390
FFS1 dinode: 128,0,  0,  0,0
FFS inode:   132,0,  83286,  10761,  1273390
Mountpoints: 664,0,  7, 11,8
SWAPMETA:276,   121576, 11, 17,   22
pfosfp:   28,0,  0,  0,0
pfospfen:108,0,  0,  0,0
pfiaddrpl:92,0,  0,  0,0
pfstatescrub: 28,0,  0,  0,0
pffrcent: 12,50141,  0,  0,0
pffrcache:48,10062,  0,  0,0
pffrag:   48,0,  0,  0,0
pffrent:  16, 5075,  0,  0,0
pfrkentry2:  156,0,  0,  0,0
pfrkentry:   156,0,  0,  0,0
pfrktable:  1240,0,  0,  0,0
pfpooladdrpl: 68,0,  0,  0,0
pfaltqpl:128,0,  0,  0,0
pfstatepl:   260,10005,  0,  0,0
pfrulepl:604,0,  0,  0,0
pfsrctrpl:   100,0,  0,  0,0
rtentry: 132,0, 37,108, 3848
ripcb:   180,25608,  0,110,   33
sackhole: 20,0,  0,676, 3672
tcpreass: 20, 1690,  0,676, 1365
hostcache:76,15400,  5,245,  317
syncache:100,15366,  0,195,27209
tcptw:48, 5148,  0,624,34253
tcpcb:   464,25600, 29,203,   212261
inpcb:   180,25608, 29,389,   212261
udpcb:   180,25608, 19,179,   493092
ipq:  32,  904,  0,  0,0
unpcb:   144,25623,213,   3405,   300681
socket:  356,25608,263,   5567,  1006076
KNOTE:68,0,  0,616,   724767
PIPE:408,0, 10,818,  1544936
DIRHASH:1024,0,861,247, 4143
NFSNODE: 460,0,  1, 23,7
NFSMOUNT:480,0,  1, 15,2
L VFS Cache: 291,0,402,326, 6124
S VFS Cache:  68,0,  87410,  16638,  1831436
NAMEI:  1024,0,  0,660, 105375628
VNODEPOLL:76,0,  0,250,8
VNODE:   272,0,  83341,  12405,  1273709
ata_composit:196,0,  0,  0,0
ata_request: 204,0,  0, 76,   34
g_bio:   132,0,  0,696, 16904466
ACL UMA zone:388,0,  0,  0,0
mbuf_jumbo_1:  16384,0,  0,  0,0
mbuf_jumbo_9:   9216,0,  0,  0,0
mbuf_jumbo_p:   4096,0,  0,  0,0
mbuf_cluster:   2048,25600,929,325,  1276052
mbuf:256,0,931,554, 141685396
mbuf_packet: 256,0,806,679, 104799211
VMSPACE: 296,0,127,328,  1617479
UPCALL:   44,0,  5,229,   10
KSEGRP:   88,0,386,174,  556
THREAD:  376,0,394,286,   292710
PROC:536,0,170,215,  1617545
Files:72,0,580,   4296, 40984983
4096:   4096,0,229,542,  3118296
2048:   2048,0,229,547,  4814216
1024:   1024,0,347,389,  9329214
512: 512,0,186,398,  1319246
256: 256,0,919,   3941,  5484210
128: 128,0,   2516,   6034, 28789911
64:   64,0,
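
One rough way to read a dump like this is to multiply each zone's item size
by its used plus free item counts and rank the zones by the result. A
minimal sketch, assuming the comma-separated layout shown above (SIZE,
LIMIT, USED, FREE, REQUESTS after the zone name); zone_usage is a made-up
name and the figure ignores per-zone overhead, so it is an estimate only:

```shell
#!/bin/sh
# zone_usage: read vm.zone-style lines ("NAME: size,limit,used,free,requests")
# on stdin and print "bytes name" per zone, largest first, where
# bytes = size * (used + free). Header, date and truncated lines are skipped.
# Hypothetical helper for illustration only.
zone_usage() {
    awk -F: 'NF == 2 {
        n = split($2, f, ",")
        if (n < 5) next
        bytes = (f[1] + 0) * ((f[3] + 0) + (f[4] + 0))
        printf "%d %s\n", bytes, $1
    }' | sort -rn
}

# Example with two zones from the dump above.
zone_usage <<'EOF'
VNODE:   272,0,  83341,  12405,  1273709
PIPE:408,0, 10,818,  1544936
EOF
```

Run against the live system this would be something like
sysctl vm.zone | zone_usage | head.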

Re: mfs and buildworlds on the SunFire x4600

2007-05-15 Thread Ulrich Spoerlein

Oliver Fromme wrote:
 Mars G. Miro wrote:
  now we know buildworld on mfs doesn't really matter on high-end machines,
 
 No, we knew that before.  I could have told you.  :-)
 
 That was the first thing I tested when I first had access
 to a machine with sufficient RAM, about 10 years ago.
 I put /usr/src on an MFS disk, ran buildworld, and was
 disappointed.

I'm not intimately familiar with the build process, but I reckon it
reads several small files several times (i.e. they are cached), runs a
CPU-bound process, then writes a few bigger files once (objects and
binaries).

Not a good MFS test scenario, indeed.

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: Socket leak (Was: Re: What triggers "No Buffer Space Available"?)

2007-05-15 Thread Ulrich Spoerlein

I'm slowly catching up on FreeBSD related mails and found this mail ...

Marc G. Fournier wrote:
kern.ipc.numopensockets: 7400
kern.ipc.maxsockets: 12328
   
ps looks like:
   
 
 stuff deleted
 
  2368  p2  Is+  Sat01PM   0:00.03 /bin/tcsh   root2112  0.0  0.1  5220
  2360  p3  Ss+  Sat01PM   0:00.04 /bin/tcsh   root   91221  0.0  0.1  5140
  2440  p4  Ss+  11:49PM   0:00.12 -tcsh (tcsh)
 
  I don't think those processes should consume 7400 sockets.
  Indeed, this really looks like a leak in the kernel.
 
 Robert has sent me a suggestion to try that I'm in the process of putting 
 together right now, involving backing out some work on uipc_usrreq.c ...

How did the backing out work for you?

Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Known memory leak in 6-STABLE from April 1st?

2007-05-14 Thread Ulrich Spoerlein


Hi all,

I observed something funny with our new cyrus/postfix/amavis
installations running on 6.2-STABLE checked out on April 1st (no, I'm
not joking).

They are running symon to grab performance data and I saw the memory
total becoming less and less. Now I know that adding up
free+active+inactive != total ram BUT *all* other FreeBSD machines we
are running show a more or less constant sum.

I uploaded two pictures showing the trend here (They are i386 machines
with 4GB RAM, FreeBSD reports 3.3GB as usable):

http://coyote.dnsalias.net/ms1-day.png
http://coyote.dnsalias.net/ms1-week.png

Now after doing some heavy IMAP testing (cyrus reconstruct of big
maildirs) the system froze to a complete halt. Stupid me already
rebooted the machine, tomorrow I'll try to break into DDB when it
happens again. I also started recording top(1) memory output and
sysctl vm.zone output.

The main question is: Were there any known memory leaks at the start
of April? Any patches I should blindly try before spending several
days on debugging this?

Thanks!
Uli

Re: FreeBSD vs Region Code DVDs

2007-05-04 Thread Ulrich Spoerlein


On 5/4/07, Scott Long [EMAIL PROTECTED] wrote:

 Why can I read and mount the DVD, but mplayer/xine
 are still unable to play the DVD? (It works fine on the internal, ATA
 attached, crappy NEC drive.)

No idea, sorry.  Do you have umass, atapicam, and ata-usb all involved
here?  If so, you've made the room a little crowded, and they are all
arguing with each other.  I know that ata-usb was inspired by the ata
author having problems with umass and not wanting to fix them there,
but I don't know exactly what was broken or what was fixed.


I only tested one subsystem at a time, and it is not that one
subsystem is broken per se, it is only in combination with this single
external Plextor drive. I had another external DVD drive (can't
remember the brand) a few months ago and this also was working just
fine.

I'll try to sum it all up:

Internal NEC drive, attached via ata(4): Can read all kinds of CD/DVD
Internal NEC drive, attached via atapicam(4): ditto
Unknown Brand external DVD, attached via umass(4): ditto

External Plextor, attached via umass(4): Can read CDs, DVD-Rs, unable
to do _anything_ with retail DVD(-Video)
External Plextor, attached via firewire/sbp(4): ditto
External Plextor, attached via atausb(4): Can read CDs, DVD-Rs, can
mount/read retail DVD(-Video), produces some errors, though. The CSS
decoder seems to fail, as I can't watch the video on the drive. I can
at least _access_ the bytes though, something not possible with
umass/sbp.

I don't know the code, but it looks like this Plextor and cd(4) don't
get along when DVD copy protection is involved. I also read in the
OpenBSD 4.1 release notes, that they made changes to their cd(4) to
work better with region protected DVDs. I didn't know that the OS was
involved in this, I thought this was a thing left to the drive
firmware or the DVD player software.

Anyway, how can I tell cd(4) to give me more error output? How can I
access the DVD at the bottom-most layer? Something like sending a Test
Unit Ready command? Or checking if the drive recognizes an inserted
medium?

Uli

Re: FreeBSD vs Region Code DVDs

2007-05-04 Thread Ulrich Spoerlein


On 5/4/07, Craig Boston [EMAIL PROTECTED] wrote:

This is a new drive, correct?  It's possible that the firmware has never
been told what region it's in, and is refusing to read any protected
discs from outside its region (which would be all of them).


I already tried Windows and Linux, to check if the drive would
actually work with retail DVDs at all. Windows told me the RC was set
to 2 (IIRC) and I *can* access the media (more or less) with
atausb(4), so I really think it is CAM's fault. But I'll give the
region-code program from Tijl a try this weekend.

Thanks for all the suggestions so far, let's see which one will get me
further on this quest :)

Uli

Re: FreeBSD vs Region Code DVDs

2007-05-04 Thread Ulrich Spoerlein

Tijl Coosemans wrote:
 On Thursday 03 May 2007 20:16:46 Ulrich Spoerlein wrote:
  I can not even read a single sector from such a DVD with the
  external drive, but it's working just fine with the internal one.
  It's really driving me nuts.
 
 Maybe you have to change the drive region code (RPC 2). I had to do
 this a couple years ago with a laptop's internal drive. Either that or
 you need to find a patched firmware to make the drive region free
 (RPC 1).

Sadly, your programs don't work. Neither for the internal drive, nor for
the external one. No matter which media I have inserted:

May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): Vendor Specific Command. CDB: a4 0 0 0 0 0 0 0 0 8 8 0
May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): SCSI Status: Check Condition
May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): ILLEGAL REQUEST asc:24,0
May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): Invalid field in CDB
May  4 21:09:43 roadrunner kernel: (cd0:umass-sim0:0:0:0): Unretryable error


May  4 21:10:22 roadrunner kernel: acd0: FAILURE - REPORT_KEY ILLEGAL REQUEST asc=0x24 ascq=0x00
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): Vendor Specific Command. CDB: a4 0 0 0 0 0 0 0 0 8 8 0
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): CAM Status: SCSI Status Error
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): SCSI Status: Check Condition
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): ILLEGAL REQUEST asc:24,0
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): Invalid field in CDB
May  4 21:10:22 roadrunner kernel: (cd1:ata1:0:0:0): Unretryable error

Trying to mount an ordinary data DVD results in:
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): READ TOC/PMA/ATIP {MMC Proposed}. CDB: 43 0 0 0 0 0 1 0 c 0
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): CAM Status: SCSI Status Error
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): SCSI Status: Check Condition
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): ILLEGAL REQUEST asc:24,0
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): Invalid field in CDB
May  4 21:13:04 roadrunner kernel: (cd0:umass-sim0:0:0:0): Unretryable error
May  4 21:13:04 roadrunner kernel: g_vfs_done():cd0[READ(offset=32768, length=2048)]error = 5


Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

FreeBSD vs Region Code DVDs

2007-05-03 Thread Ulrich Spoerlein

Hi all,

I'm having a hard time getting my external (USB, Firewire) Plextor
PX-755UF to read any retail DVDs at all. I can read any kind of CDs and
also DVD-Rs. But mastered DVDs are invisible to FreeBSD.

I can not even read a single sector from such a DVD with the external
drive, but it's working just fine with the internal one. It's really
driving me nuts.

umass0: PLEXTOR DVDR   PX-755A, class 0/0, rev 2.00/4.35, addr 126
umass0:  8070i (ATAPI) over Bulk-Only; quirks = 0x
umass0:1:0:-1: Attached to scbus1
cd0 at umass-sim0 bus 0 target 0 lun 0
cd0: PLEXTOR DVDR   PX-755A 1.06 Removable CD-ROM SCSI-0 device 
cd0: 40.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray 
closed
cd1 at ata1 bus 0 target 0 lun 0
cd1: _NEC DVD_RW ND-5500A 1.51 Removable CD-ROM SCSI-0 device 
cd1: 33.000MB/s transfers
cd1: Attempt to query device size failed: NOT READY, Medium not present

The NEC drive can read DVDs just fine (although it sucks). 

% recoverdisk /dev/cd1
        start    size           len state          done     remaining    % done
            0 1048576    7411912704     0             0    7411912704  0.000^C
(130)% recoverdisk /dev/cd0
recoverdisk: DIOCGMEDIASIZE failed: No such file or directory
(1)% dd if=/dev/cd0 bs=2048
0+0 records in
0+0 records out
0 bytes transferred in 0.93 secs (0 bytes/sec)

If I attach the device via Firewire, it's just the same. Perhaps it
requires some sort of quirk? Where should I start looking for debug
output? Which test should I run? Any help would be greatly appreciated.

Bye,
Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Re: FreeBSD vs Region Code DVDs

2007-05-03 Thread Ulrich Spoerlein

Sean C. Farley wrote:
 On Thu, 3 May 2007, Ulrich Spoerlein wrote:
 I had an issue with ripping some DVD's to my laptop before a trip I made
 (note:  no distribution occurred (for the lawyers :))).  I wanted to
 just use dd to do it, but dd would fail after a small amount of data was
 read.  If I first played a little of the DVD with mplayer, then dd would
 work afterwards.  It probably had something to do with mplayer
 whispering sweet nothings to the DVD player.

Wouldn't help in my case, as the disc cannot be accessed in any way.

But ...

atausb(4) to the rescue!

I recompiled my kernel with atausb(4) to rule out problems inside CAM,
lo and behold:

umass0: PLEXTOR DVDR   PX-755A, class 0/0, rev 2.00/4.35, addr 121
umass0:  8070i (ATAPI) over Bulk-Only; quirks = 0x
umass0:3:0:-1: Attached to scbus3
cd1 at umass-sim0 bus 0 target 0 lun 0
cd1: PLEXTOR DVDR   PX-755A 1.06 Removable CD-ROM SCSI-0 device 
cd1: 40.000MB/s transfers
cd1: cd present [3614880 x 2048 byte records]

(no glabel tasting, no reading from the device possible)

umass0: at uhub3 port 1 (addr 121) disconnected
(cd1:umass-sim0:0:0:0): lost device
(cd1:umass-sim0:0:0:0): removing device entry
umass0: detached
atausb0: PLEXTOR DVDR   PX-755A, class 0/0, rev 2.00/4.35, addr 121
atausb0: using ATAPI over Bulk-Only
ata2: USB lun 0 on atausb0
acd1: DEVICE_RESET unsupported
acd1: DVDR DVDR PX-755A/1.06 at ata2-master USB2
acd1: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 
cd1 at ata2 bus 0 target 0 lun 0
cd1: PLEXTOR DVDR   PX-755A 1.06 Removable CD-ROM SCSI-0 device 
cd1: 3.300MB/s transfers
cd1: Attempt to query device size failed: NOT READY, Logical unit is in process of becoming ready
GEOM_LABEL: Label for provider acd1 is iso9660/FIREFLY_DISC2.
acd1: FAILURE - READ_TOC ILLEGAL REQUEST asc=0x24 ascq=0x00 
acd1: FAILURE - READ_TOC ILLEGAL REQUEST asc=0x24 ascq=0x00 
acd1: FAILURE - READ_TOC ILLEGAL REQUEST asc=0x24 ascq=0x00 
acd1: FAILURE - READ_TOC ILLEGAL REQUEST asc=0x24 ascq=0x00 
acd1: FAILURE - READ_TOC ILLEGAL REQUEST asc=0x24 ascq=0x00 

Might these ILLEGAL REQUESTs give a clue to what is going wrong when
trying to access this device with cd(4)? Why is it only reporting/using
3.3MB/s transfers? Why can I read and mount the DVD, but mplayer/xine
are still unable to play the DVD? (It works fine on the internal, ATA
attached, crappy NEC drive.)

Perhaps Scott can share some SCSI wisdom on this matter. I really need
to use this drive via Firewire, ie. cd(4), so atausb(4) is no permanent
solution.

Ulrich Spoerlein

PS: why is iostat(1) not working for acd(4) devices?
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

make: parallel jobs broken when using -f -

2007-04-12 Thread Ulrich Spoerlein

Hi Hartmut,

there is an annoying bug in 6-STABLE make(1), where -f - seems to
serialize the target making.

Consider the following Makefile

all: a b c d

a b c d:
	@echo Makeing ${.TARGET}
	@sleep 4

And observe the following behaviour:
$ make -j4
Makeing a
Makeing b
Makeing c
Makeing d
  pause
$ make -j4 -f- < Makefile
Makeing b
Makeing d
  pause
Makeing a
  pause
Makeing c
  pause
$
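
For anyone who wants to reproduce this, here is a rough sh sketch that
writes the Makefile above and times both invocations. The helper names
(write_makefile, elapsed, make_stdin) are made up for illustration, and
the timing only has one-second resolution.

```shell
# Write the test Makefile from this report to the path given as $1.
write_makefile() {
    printf 'all: a b c d\n\na b c d:\n\t@echo Makeing ${.TARGET}\n\t@sleep 4\n' > "$1"
}

# Run a command quietly and print how many whole seconds it took.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Feed the Makefile named in $1 to make on standard input ("-f -").
make_stdin() {
    make -j4 -f - < "$1"
}
```

Usage would be along the lines of "write_makefile Makefile.par", then
"elapsed make -j4 -f Makefile.par" and "elapsed make_stdin Makefile.par".
Going by the output above, the second number should come out roughly
three times the first on an affected make(1).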

The make(1) on -CURRENT has this fixed already. Is there any chance of
this getting MFCed?

AFAICS the following revisions are not up to date (wrt CURRENT):
 $FreeBSD: src/usr.bin/make/job.c,v 1.122.2.1 2005/07/20 19:05:23 harti Exp $
 $FreeBSD: src/usr.bin/make/main.c,v 1.155 2005/05/24 16:05:51 harti Exp $
 $FreeBSD: src/usr.bin/make/parse.c,v 1.108.2.1 2005/11/16 08:25:19 ru Exp $
 $FreeBSD: src/usr.bin/make/str.c,v 1.45.2.1 2006/10/16 11:51:18 ru Exp $
 $FreeBSD: src/usr.bin/make/var.c,v 1.159 2005/05/24 16:05:51 harti Exp $


Ulrich Spoerlein
-- 
The trouble with the dictionary is you have to know how the word is
spelled before you can look it up to see how it is spelled.
-- Will Cuppy

Weird NFS behaviour

2007-03-10 Thread Ulrich Spoerlein


Hi,

we have performance problems with our FreeBSD 6.2 based NFS server.
Picture the following setup:

FreeBSD Client --- Samba-Server  --- NFS-Server

all three machines are running FreeBSD 6.2 (the same image). The NFS
server is configured with 16 nfsd. sysctl.conf has
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536

Now, what's the problem: The Samba-Server mounts shares via NFS. All
servers are on Gigabit Ethernet and I get read transfer rates
exceeding 50MB/s from the NFS server.

This is all well and good, but if I copy a file via scp(1) (sic!) to
the samba server into the NFS mounted directory, not only do I
seldom exceed 12MB/s but I also get a very strange traffic pattern
on the em0 interface of the samba server. I get _twice_ as much
incoming traffic on the em0 interface as outgoing traffic.

systat -if on samba:
      em0  in      24.726 MB/s     25.905 MB/s        3.046 GB
           out     12.941 MB/s     13.558 MB/s        1.994 GB

systat -if on nfs-server:
      em0  in      11.497 MB/s     12.999 MB/s        3.727 GB
           out     11.878 MB/s     13.423 MB/s      995.485 MB

To stress, this is running:
gigabit-client:# scp large-file [EMAIL PROTECTED]:/mnt/nfs-share/
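
To put a number on "twice", here is a quick sh/awk check of the current
rates quoted above. The ratio helper is only an illustration, not part
of any tool, and both arguments must be in the same unit.

```shell
# Print in/out as a ratio with two decimals.
ratio() {
    awk -v in_rate="$1" -v out_rate="$2" \
        'BEGIN { printf "%.2f\n", in_rate / out_rate }'
}

ratio 24.726 12.941   # samba box during scp pass-through -> 1.91
ratio 11.497 11.878   # NFS server during the same copy   -> 0.97
```

So the samba box receives nearly twice what it sends, while the NFS
server sends back about as much as it receives, which fits the picture
of the written data being read back over NFS.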

The wicked part is this: If I copy a file from the samba server
directly to the NFS share (not as a passthrough), I get these traffic
patterns:
systat -if on samba:
      em0  in     432.724 KB/s    432.724 KB/s        3.772 GB
           out     12.399 MB/s     12.399 MB/s        2.481 GB

systat -if on nfs:
      em0  in      12.091 MB/s     15.791 MB/s      184.766 MB
           out    440.939 KB/s    562.521 KB/s        1.339 GB

This is running:
samba:# cp large-file /mnt/nfs-share/

What on earth is causing each received NFS packet to be _bounced_ back
to the samba server when using ssh, scp, smbd, etc., but not when
generating the traffic locally?

nfsstat -s is showing an increase in READ calls similar to WRITE calls
when using the samba machine as pass-through. It is showing _no_
increase in READ calls when copying the files directly.

NB: All these tests were run _without_ smbd running; it's just that
this server is designated to become our samba server.

Setting vfs.nfsrv.async=1 doubled write performance, but the weird
traffic pattern remains. (Am I asking for too much trouble by setting
async NFS?)

Thanks for any pointers!
Uli

Re: Some days, it doesn't pay to upgrade ...

2007-03-04 Thread Ulrich Spoerlein

Marc G. Fournier wrote:
 I don't know how critical this is, but I just thought about it ... this is my 
 only system running gmirror ... everything seems fine according to gmirror 
 status, but maybe something is wrong there I'm not seeing:

You should tell us in which state those processes hang. It might also
be good to use DDB and "show alllocks" to see if it is a deadlock.

I for one had several deadlocks with gmirror on an SMP machine.

Ulrich Spoerlein

Re: sysutils/fusefs-ntfs working for anyone?

2007-02-20 Thread Ulrich Spoerlein

Wang Yi wrote:
 I'm using ntfs-3g now. The version is the same as yours. The only
 difference is that the disk I used is a physical disk.

I also had no luck using it on my existing NTFS partition, though I'd
like to experiment on a clean partition first.

Could you please run a test with mdconfig and mkfs.ntfs (you have to use
the -F flag)?
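
Roughly what I have in mind, as an sh sketch. The 64m swap-backed md(4)
device is an arbitrary choice for illustration, and run/ntfs_md_test are
made-up helper names. Set DRY_RUN to just print the commands.

```shell
# Echo the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a scratch md(4) device, put NTFS on it and try to mount it.
# $1 = md unit number, $2 = mount point (both illustrative).
ntfs_md_test() {
    unit=$1
    mnt=$2
    run mdconfig -a -t swap -s 64m -u "$unit" &&
    run mkfs.ntfs -fF "/dev/md$unit" &&
    run ntfs-3g "/dev/md$unit" "$mnt"
}
```

For example "ntfs_md_test 7 /mnt" as root; "mdconfig -d -u 7" gets rid
of the device again afterwards.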

Jan Henrik Sylvester wrote:
 On 6.2-RELEASE using fusefs-kmod-0.3.0_4, fusefs-libs-2.6.2, and
 fusefs-ntfs-0.20070207RC1, I can mount my existing (Windows XP) NTFS
 partition with 'ntfs-3g /dev/ad0s1 /mnt/ad0s1'.
 
 The following error messages about missing /proc/filesystems and modprobe
 can be ignored, since defaults are assumed in case of missing information.
 (I read about it on a fusefs mailing list concerning Darwin.)

The critical part seems to be the seekscript. Could one of you guys
provide me with a ktrace/kdump output, so I can investigate this
further? You should run ktrace with the -i flag and probably send the
output off-list.

Thanks!

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

sysutils/fusefs-ntfs working for anyone?

2007-02-18 Thread Ulrich Spoerlein

Hi there,

I've been trying to mount my NTFS partitions with the NTFS-3g project's
FUSE implementation but am unable to mount anything.

I'm on 6-STABLE and have the latest versions of FUSE installed:

fusefs-kmod-0.3.0_4 Kernel module for fuse
fusefs-libs-2.6.2   FUSE allows filesystem implementation in userspace
fusefs-ntfs-0.20070207RC1 Mount NTFS partitions and disk images

I use the sysutils/ntfsprogs port to create an NTFS filesystem. I can
also mount this filesystem using mount.ntfs, yet I fail to get anywhere
with ntfs-3g. What's that darn seekscript about anyway?

# mkfs.ntfs -fF /dev/md7
/dev/md7 is not a block device.
mkntfs forced anyway.
The sector size was not specified for /dev/md7 and it could not be obtained 
automatically.  It has been set to 512 bytes.
The partition start sector was not specified for /dev/md7 and it could not be 
obtained automatically.  It has been set to 0.
The number of sectors per track was not specified for /dev/md7 and it could not 
be obtained automatically.  It has been set to 0.
The number of heads was not specified for /dev/md7 and it could not be obtained 
automatically.  It has been set to 0.
Cluster size has been automatically set to 512 bytes.
To boot from a device, Windows needs the 'partition start sector', the 'sectors 
per track' and the 'number of heads' to be set.
Windows will not be able to boot from this device.
Creating NTFS volume structures.
mkntfs completed successfully. Have a nice day.
# ntfs-3g /dev/md7 /mnt 
Failed to open /proc/filesystems: No such file or directory
modprobe: not found
Failed to open /proc/filesystems: No such file or directory
# mount_fusefs: seekscript failed

The fuse module is loaded, of course. A ktrace of ntfs-3g is, umm,
interesting, to say the least. Lots of sh(1), awk(1) and fstat(1)
calls. It even tries to run modprobe, as you can see from the output
above too.

So, the basic question is: Has _anybody_ used ntfs-3g successfully on
RELENG_6?

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: Failover-HA-Setup

2007-01-19 Thread Ulrich Spoerlein

Richard wrote:
  There is no need to make any changes to the script. Put whatever other
   options you want for mysql in rc.conf, and set the _enable variable
  to no. Then you can run /usr/local/etc/rc.d/mysql-server onestart and
  it will start normally just one time.
 
 Yes, and mysql will be started at bootup time on both nodes, wouldn't
 it? So one node would fail miserably since the lack of mounted
 diskspace...

No, he wrote to set mysql_enable=NO, ie, the usual startup procedure
will NOT start it.

This doesn't work with heartbeat, however. heartbeat always calls the
resource scripts with either 'start' or 'stop', you can't make it pass
'onestart'.

Only two options remain: modify the existing mysql-server script (bad
idea, will be overwritten on update) or go through a proxy script which
transforms start|stop -> onestart|onestop.
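
Such a proxy could look roughly like the sketch below. It is untested
against heartbeat itself; the RC_SCRIPT override and the map_action name
are my own invention, and passing unknown actions through unchanged is
an assumption.

```shell
#!/bin/sh
# Hypothetical heartbeat resource script wrapping the mysql-server
# rc.d script. RC_SCRIPT can be overridden for testing.
RC_SCRIPT=${RC_SCRIPT:-/usr/local/etc/rc.d/mysql-server}

# Translate heartbeat's start|stop into rc.d's onestart|onestop;
# anything else is passed through unchanged (an assumption).
map_action() {
    case "$1" in
        start) echo onestart ;;
        stop)  echo onestop ;;
        *)     echo "$1" ;;
    esac
}

# Dispatch only when called with an action, so the file can be sourced.
if [ $# -gt 0 ]; then
    exec "$RC_SCRIPT" "$(map_action "$1")"
fi
```

heartbeat would then be pointed at this wrapper instead of the port's
script, which stays untouched across port updates.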

You could also alter the environment of heartbeat (it's really just a
bunch of poorly written shell scripts) and set mysql_enable=YES there,
but that'd be just as fragile as rewriting the existing mysql-server
script.

 But the nostart-solution sounds like working...

Till you update the port and forget about your local modification ...

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: Page Fault in 6.2-PRE RELEASE

2007-01-02 Thread Ulrich Spoerlein

Christopher Harper (05056409) wrote:
 The system freezes randomly and no longer accepts any input and after a 
 minute of being 'frozen' reboots.
 (kgdb) backtrace
 #0  doadump () at pcpu.h:165
 #1  0xc051a6ca in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
 #2  0xc051a9f1 in panic (fmt=0xc06d94cf %s) at 
 /usr/src/sys/kern/kern_shutdown.c:565
 #3  0xc06a795c in trap_fatal (frame=0xe6dc6bac, eva=4) at 
 /usr/src/sys/i386/i386/trap.c:837
 #4  0xc06a769b in trap_pfault (frame=0xe6dc6bac, usermode=0, eva=4) at 
 /usr/src/sys/i386/i386/trap.c:745
 #5  0xc06a72d5 in trap (frame=
   {tf_fs = 8, tf_es = -963641304, tf_ds = -961019864, tf_edi = 
 -963964928, tf_esi = 0, tf_ebp = -421762056, tf_isp = -421762088, tf_ebx = 
 -963961644, tf_edx = -421655072, tf_ecx = -964085760, tf_eax = 0, tf_trapno = 
 12, tf_err = 0, tf_eip = -1067808807, tf_cs = 32, tf_eflags = 66118, tf_esp = 
 -963961644, tf_ss = -963615744})
 at /usr/src/sys/i386/i386/trap.c:435
 #6  0xc069342a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
 #7  0xc05a87d9 in ieee80211_free_node (ni=0x0) at 
 /usr/src/sys/net80211/ieee80211_node.c:1605
 #8  0xc04b1923 in ural_txeof (xfer=0xc6b82d00, priv=0xc68b1cd4, 
 status=USBD_NORMAL_COMPLETION) at /usr/src/sys/dev/usb/if_ural.c:888
 #9  0xc04c9b1a in usb_transfer_complete (xfer=0xc6b82d00) at 
 /usr/src/sys/dev/usb/usbdi.c:863
 #10 0xc04acbae in ehci_idone (ex=0xc6b82d00) at 
 /usr/src/sys/dev/usb/ehci.c:852
 #11 0xc04acaeb in ehci_check_intr (sc=0xc6893800, ex=0xc6b82d00) at 
 /usr/src/sys/dev/usb/ehci.c:759
 #12 0xc04aca25 in ehci_softintr (v=0xc6893800) at 
 /usr/src/sys/dev/usb/ehci.c:693
 #13 0xc04c6e55 in usb_schedsoftintr (bus=0x0) at 
 /usr/src/sys/dev/usb/usb.c:871
 #14 0xc04ac806 in ehci_intr1 (sc=0xc6893800) at 
 /usr/src/sys/dev/usb/ehci.c:593
 #15 0xc04ac746 in ehci_intr (v=0xc6893800) at /usr/src/sys/dev/usb/ehci.c:552
 #16 0xc0505059 in ithread_execute_handlers (p=0xc68ff648, ie=0xc67e3b00) at 
 /usr/src/sys/kern/kern_intr.c:682
 #17 0xc0505169 in ithread_loop (arg=0xc68b7480) at 
 /usr/src/sys/kern/kern_intr.c:765
 #18 0xc0503e0d in fork_exit (callout=0xc0505114 ithread_loop, 
 arg=0xc68b7480, frame=0xe6dc6d38) at /usr/src/sys/kern/kern_fork.c:821
 #19 0xc069348c in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208

This is the same as kern/92083 [1].

I could suggest that you try the new USB stack by Hans-Petter Selasky,
but there is a different bug in his ural(4) that makes it unusable, too.

[1] http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/92083

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

acquiring duplicate lock when mounting nullfs

2006-12-29 Thread Ulrich Spoerlein


Hi,

this is on a RELENG_6 while mounting /usr/src and /usr/obj via nullfs
and doing 'make installkernel installworld'

It is similar to LOR #083, but not quite the same

acquiring duplicate lock of same type: vnode interlock
1st vnode interlock @ /usr/src/sys/kern/vfs_vnops.c:806
2nd vnode interlock @ /usr/src/sys/kern/vfs_subr.c:2036
KDB: stack backtrace:
kdb_backtrace(0,ff,c09816d0,c09816d0,c0907904,...) at kdb_backtrace+0x29
witness_checkorder(c30d56dc,9,c089bd90,7f4) at witness_checkorder+0x578
_mtx_lock_flags(c30d56dc,0,c089bd90,7f4,c218d830,...) at _mtx_lock_flags+0x78
vrefcnt(c30d5660) at vrefcnt+0x1d
null_checkvp(c2a8daa0,c08894b8,215) at null_checkvp+0x56
null_lock(cd689a80) at null_lock+0x62
VOP_LOCK_APV(c0900480,cd689a80) at VOP_LOCK_APV+0x87
vn_lock(c2a8daa0,1002,c27a3180,c2a8daa0,c31bbc2c,...) at vn_lock+0xa8
nullfs_root(c246d7c8,2,cd689af8,c27a3180,0,8,0,c09beca0,0,c089b632,3dd)
at nullfs_root+0x26
vfs_domount(c27a3180,c261c550,c28f7100,0,c239eb10,c09707e0,0,c089b632,2a3)
at vfs_domount+0x91d
vfs_donmount(c27a3180,0,c2a12e80,c2a12e80,0,...) at vfs_donmount+0x2ef
nmount(c27a3180,cd689d04) at nmount+0x8b
syscall(3b,3b,3b,bfbfe424,bfbfec7c,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280ba4d7, esp =
0xbfbfe3ac, ebp = 0xbfbfec28 ---

Uli

Re: acquiring duplicate lock when mounting nullfs

2006-12-29 Thread Ulrich Spoerlein


On 12/29/06, Ulrich Spoerlein [EMAIL PROTECTED] wrote:

It is similar to LOR #083, but not quite the same

acquiring duplicate lock of same type: vnode interlock
 1st vnode interlock @ /usr/src/sys/kern/vfs_vnops.c:806
 2nd vnode interlock @ /usr/src/sys/kern/vfs_subr.c:2036


It seems the issue is known:
http://lists.freebsd.org/pipermail/freebsd-amd64/2006-March/007824.html

Uli

Re: kern/92785: Using exported filesystem on OS/2 NFS client causes filesystem freeze

2006-12-15 Thread Ulrich Spoerlein


Hi,

we, too, ran into this problem. OS/2 clients kill our NFS server. It is
running a RELENG_6 snapshot from 2006-11-14. rpc.lockd and rpc.statd
are running. I'll conduct a test without those two services shortly.

You can still log in to the system with ssh and cruise around, but mountd
is stuck in ufs state and is no longer serving requests.

[EMAIL PROTECTED]:~# ps axl | grep ufs
    0 39370     1   0  -4  0  3052  2200 ufs    Ds    ??    0:00.01 /usr/sbin/mountd -r

db> show lockedvnods
Locked vnodes

0xc87b9414: tag ufs, type VDIR
   usecount 0, writecount 0, refcount 4 mountedhere 0
   flags (VV_ROOT)
   v_object 0xc8c43c60 ref 0 pages 1
lock type ufs: EXCL (count 4) by thread 0xc8bac300 (pid 6926) with 1 pending
#0 0xc0668bf9 at lockmgr+0x4ed
#1 0xc078572e at ffs_lock+0x76
#2 0xc0838287 at VOP_LOCK_APV+0x87
#3 0xc06d663c at vn_lock+0xac
#4 0xc06ca4ca at vget+0xc2
#5 0xc06c24a9 at vfs_hash_get+0x8d
#6 0xc07844af at ffs_vget+0x27
#7 0xc078b253 at ufs_lookup+0xa4b
#8 0xc083641b at VOP_CACHEDLOOKUP_APV+0x9b
#9 0xc06bf499 at vfs_cache_lookup+0xb5
#10 0xc0836347 at VOP_LOOKUP_APV+0x87
#11 0xc06c3626 at lookup+0x46e
#12 0xc0734fba at nfs_namei+0x40e
#13 0xc0726d81 at nfsrv_lookup+0x1dd
#14 0xc0736765 at nfssvc_nfsd+0x3d9
#15 0xc07360b4 at nfssvc+0x18c
#16 0xc0825a07 at syscall+0x25b
#17 0xc0811f7f at Xint0x80_syscall+0x1f

   ino 2, on dev da1s2e


db> trace 6926
Tracing pid 6926 tid 100106 td 0xc8bac300
sched_switch(c8bac300,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(c8678200) at sleepq_switch+0xc1
sleepq_wait_sig(c8678200) at sleepq_wait_sig+0x1d
msleep(c8678200,c09c9f00,158,c088bec9,0,...) at msleep+0x26a
nfssvc_nfsd(c8bac300) at nfssvc_nfsd+0xe5
nfssvc(c8bac300,eafd4d04) at nfssvc+0x18c
syscall(3b,3b,3b,1,0,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (155, FreeBSD ELF32, nfssvc), eip = 0x280bd1b7, esp =
0xbfbfe90c, ebp = 0xbfbfe928 ---
db> trace 39370
Tracing pid 39370 tid 100102 td 0xc8bac900
sched_switch(c8bac900,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(c87b946c,c0973440,0,c089798c,211,...) at sleepq_switch+0xc1
sleepq_wait(c87b946c,0,c87b94dc,b7,c08929b8,...) at sleepq_wait+0x46
msleep(c87b946c,c0972500,50,c089c1c1,0,...) at msleep+0x279
acquire(eafe094c,40,6,c8bac900,0,...) at acquire+0x76
lockmgr(c87b946c,2002,c87b94dc,c8bac900) at lockmgr+0x44e
ffs_lock(eafe09a4) at ffs_lock+0x76
VOP_LOCK_APV(c0943320,eafe09a4) at VOP_LOCK_APV+0x87
vn_lock(c87b9414,2002,c8bac900,c87b9414) at vn_lock+0xac
vget(c87b9414,2002,c8bac900) at vget+0xc2
vfs_hash_get(c86cf2e4,2,2,c8bac900,eafe0abc,0,0) at vfs_hash_get+0x8d
ffs_vget(c86cf2e4,2,2,eafe0abc) at ffs_vget+0x27
ufs_root(c86cf2e4,2,eafe0b00,c8bac900,0,...) at ufs_root+0x19
lookup(eafe0ba0) at lookup+0x743
namei(eafe0ba0) at namei+0x39a
kern_lstat(c8bac900,bfbfd2a0,0,eafe0c74) at kern_lstat+0x47
lstat(c8bac900,eafe0d04) at lstat+0x1b
syscall(3b,3b,3b,281512fb,bfbfc9f1,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x2813d427, esp =
0xbfbfc5ac, ebp = 0xbfbfd268 ---

I was under the impression that you are not allowed to sleep while
holding a lock in the FreeBSD kernel. Doesn't this also apply to
lockmgr itself?

Upon shutting down the system, I had a panic coming in:

panic: userret: Returning with 4 locks held.
cpuid = 1
KDB: stack backtrace:
kdb_backtrace(100,c8bac300,c8bac3c8,c8bad218,c8bac300,...) at kdb_backtrace+0x29
panic(c089806f,4,0,c8bac300,c8bad218,...) at panic+0x114
userret(c8bac300,eafd4d38,0,2,0,...) at userret+0x183
syscall(3b,3b,3b,1,0,...) at syscall+0x321
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (0, FreeBSD ELF32, nosys), eip = 0x280bd1b7, esp =
0xbfbfe90c, ebp = 0xbfbfe928 ---
KDB: enter: panic
[thread pid 6926 tid 100106 ]
Stopped at  kdb_enter+0x2b: nop
db> bt
Tracing pid 6926 tid 100106 td 0xc8bac300
kdb_enter(c0894aec) at kdb_enter+0x2b
panic(c089806f,4,0,c8bac300,c8bad218,...) at panic+0x127
userret(c8bac300,eafd4d38,0,2,0,...) at userret+0x183
syscall(3b,3b,3b,1,0,...) at syscall+0x321
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (0, FreeBSD ELF32, nosys), eip = 0x280bd1b7, esp =
0xbfbfe90c, ebp = 0xbfbfe928 ---
db> show lockedvnods
Locked vnodes

0xc8761c3c: tag ufs, type VDIR
   usecount 1, writecount 0, refcount 1 mountedhere 0xc86cf2e4
   flags ()
lock type ufs: EXCL (count 1) by thread 0xc8bac780 (pid 59934)
#0 0xc0668bf9 at lockmgr+0x4ed
#1 0xc078572e at ffs_lock+0x76
#2 0xc0838287 at VOP_LOCK_APV+0x87
#3 0xc06d663c at vn_lock+0xac
#4 0xc06c5eba at dounmount+0x62
#5 0xc06c5e31 at unmount+0x1e5
#6 0xc0825a07 at syscall+0x25b
#7 0xc0811f7f at Xint0x80_syscall+0x1f

   ino 8260, on dev ufs/root

0xc87b9414: tag ufs, type VDIR
   usecount 0, writecount 0, refcount 4 mountedhere 0
   flags (VV_ROOT)
   v_object 0xc8c43c60 ref 0 pages 1
lock type ufs: EXCL (count 4) by thread 0xc8bac300 (pid 6926) with 1 pending
#0 0xc0668bf9 at

Re: kern/92785: Using exported filesystem on OS/2 NFS client causes filesystem freeze

2006-12-15 Thread Ulrich Spoerlein


On 12/15/06, Kostik Belousov [EMAIL PROTECTED] wrote:

This looks like lock leak in nfsd. Could you supply the tcpdump of the
session that causes the problem ? Also, it would be very helpful if you could
note exact rpc that wedges the server.


That would have been my next step. I ran only rpcbind, nfsd and mountd
on the file server (no rpc.lockd/rpc.statd). I then had an OS/2 client
mount the filesystem and issue a readdir, and then tried to mount the
same share from a Linux client. This last mount request never came back;
immediately after issuing the mount request, mountd got stuck in
state 'ufs' as shown in the backtrace.

A tcpdump of the session can be found at:
http://coyote.dnsalias.net/rpc.pcap (9kB)

Uli

PS: Please trim the Email when responding to the GNATS DB as that
makes the PR-Trail rather unreadable. Thanks!

Re: kern/92785: Using exported filesystem on OS/2 NFS client causes filesystem freeze

2006-12-15 Thread Ulrich Spoerlein


On 12/15/06, Kostik Belousov [EMAIL PROTECTED] wrote:

Am I right that all you did was ls -l root of nfs mount ? Does OS/2
supports the notion of .. directory ? Could you do just ls -l ..
from nfs client and then try stat root of exported fs on the server
(i think it shall hang) ?


Yes, you are right about the symptoms. We tried the following on the OS/2 Client

mount export
umount export
mount export
umount export

this is all working fine, then we do a dir on the mounted FS

mount i: /export/foo
dir i:
umount  -- hanging, as mountd can't process the RPC.


My hypothesis is that LOOKUP RPC for .. causes directory vnode lock
leak in nfs_namei. After that, mountd hang is just consequence.


So, I mounted from the OS/2 client, ran a dir on the i: drive and then
a stat(1) on the exported partition on the server. This stat would
hang; here are the backtraces:
db> ps
  pid  ppid  pgrp   uid  state  wmesg   wchan       cmd
33017 88035 33017     0  S+     ufs     0xc8771880  stat
23627 55476 23627     0  S+     bpf     0xc8e16c00  tcpdump
88035 87505 88035     0  S+     pause   0xc882bcc4  tcsh
87505 72558 87505  1000  S+     wait    0xc86f9218  su
72558 89630 72558  1000  Ss+    pause   0xc873867c  tcsh
21229     1 21229     0  Ss     select  0xc09c10c4  mountd
91293 79042 79042     0  S      -       0xc8668200  nfsd
88479 79042 79042     0  S      -       0xc8668600  nfsd
86952 79042 79042     0  S      -       0xc847cc00  nfsd
83659 79042 79042     0  S      -       0xc8678200  nfsd
79042     1 79042     0  Ss     accept  0xc8d649f6  nfsd
55476 52005 55476     0  S+     pause   0xc8bcc24c  tcsh
52005 95193 52005  1000  S+     wait    0xc8734648  su
...
db> show lockedvnods
Locked vnodes

0xc8771828: tag ufs, type VDIR
   usecount 0, writecount 0, refcount 4 mountedhere 0
   flags (VV_ROOT)
   v_object 0xc8a8a084 ref 0 pages 1
lock type ufs: EXCL (count 1) by thread 0xc882f900 (pid 83659) with 1 pending
#0 0xc0668bf9 at lockmgr+0x4ed
#1 0xc078572e at ffs_lock+0x76
#2 0xc0838287 at VOP_LOCK_APV+0x87
#3 0xc06d663c at vn_lock+0xac
#4 0xc06ca4ca at vget+0xc2
#5 0xc06c24a9 at vfs_hash_get+0x8d
#6 0xc07844af at ffs_vget+0x27
#7 0xc078b253 at ufs_lookup+0xa4b
#8 0xc083641b at VOP_CACHEDLOOKUP_APV+0x9b
#9 0xc06bf499 at vfs_cache_lookup+0xb5
#10 0xc0836347 at VOP_LOOKUP_APV+0x87
#11 0xc06c3626 at lookup+0x46e
#12 0xc0734fba at nfs_namei+0x40e
#13 0xc0726d81 at nfsrv_lookup+0x1dd
#14 0xc0736765 at nfssvc_nfsd+0x3d9
#15 0xc07360b4 at nfssvc+0x18c
#16 0xc0825a07 at syscall+0x25b
#17 0xc0811f7f at Xint0x80_syscall+0x1f

   ino 2, on dev da1s2e
db> tr 33017
Tracing pid 33017 tid 100125 td 0xc86fd600
sched_switch(c86fd600,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(c8771880,c0973440,0,c089798c,211,...) at sleepq_switch+0xc1
sleepq_wait(c8771880,0,c87718f0,b7,c08929b8,...) at sleepq_wait+0x46
msleep(c8771880,c0972590,50,c089c1c1,0,...) at msleep+0x279
acquire(eb01694c,40,6,c86fd600,0,...) at acquire+0x76
lockmgr(c8771880,2002,c87718f0,c86fd600) at lockmgr+0x44e
ffs_lock(eb0169a4) at ffs_lock+0x76
VOP_LOCK_APV(c0943320,eb0169a4) at VOP_LOCK_APV+0x87
vn_lock(c8771828,2002,c86fd600,c8771828) at vn_lock+0xac
vget(c8771828,2002,c86fd600) at vget+0xc2
vfs_hash_get(c87115c8,2,2,c86fd600,eb016abc,0,0) at vfs_hash_get+0x8d
ffs_vget(c87115c8,2,2,eb016abc) at ffs_vget+0x27
ufs_root(c87115c8,2,eb016b00,c86fd600,0,...) at ufs_root+0x19
lookup(eb016ba0) at lookup+0x743
namei(eb016ba0) at namei+0x39a
kern_lstat(c86fd600,bfbfed99,0,eb016c74) at kern_lstat+0x47
lstat(c86fd600,eb016d04) at lstat+0x1b
syscall(3b,3b,3b,0,bfbfebf0,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (190, FreeBSD ELF32, lstat), eip = 0x2812d427, esp =
0xbfbfeb9c, ebp = 0xbfbfec68 ---
db> tr 83659
Tracing pid 83659 tid 100115 td 0xc882f900
sched_switch(c882f900,0,1) at sched_switch+0x177
mi_switch(1,0) at mi_switch+0x270
sleepq_switch(c8678200) at sleepq_switch+0xc1
sleepq_wait_sig(c8678200) at sleepq_wait_sig+0x1d
msleep(c8678200,c09c9f00,158,c088bec9,0,...) at msleep+0x26a
nfssvc_nfsd(c882f900) at nfssvc_nfsd+0xe5
nfssvc(c882f900,eaf8ad04) at nfssvc+0x18c
syscall(3b,3b,3b,1,0,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (155, FreeBSD ELF32, nfssvc), eip = 0x280bd1b7, esp =
0xbfbfe90c, ebp = 0xbfbfe928 ---

Do you think you can fix it? Any idea why this seems to only happen
with OS/2 Clients?

Uli

Re: ggate still broken on 6.2-RC1 for amd64.

2006-12-11 Thread Ulrich Spoerlein

Craig Boston wrote:
 Have you tried increasing the send/receive buffer size?  In my local
 ggate setup I'm running both the client and server with the options
 -R 196608 -S 196608.  I added it a while back after discovering that
 the default buffer size was inadequate in certain situations and would
 sometimes cause large block sized I/O to hang.

Heh, this is funny. I have reports from another source, who _decreases_
bufsize to 8kB, because that is giving him the most performance.

Since I'm using HPS' USB stack I can't use my uplcom device and
therefore cannot usefully test some more ggate/gmirror scenarios on
-CURRENT ...

But I'll whip up a ggate test case.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: ggate still broken on 6.2-RC1 for amd64.

2006-12-11 Thread Ulrich Spoerlein

Ulrich Spoerlein wrote:
 But I'll whip up a ggate test case.

Very strange ... I thought I would work through different buffer sizes,
starting with some low value. Here's what gives:

igor# ggated -a localhost -v -R8k -S8k /tmp/ggate_exports       igor# ggatec create -v -R8k -S8k localhost /tmp/ggate_test
info: Reading exports file (/tmp/ggate_exports).                info: Connected to the server: localhost:3080.
debug: Added 127.0.0.1/32 /tmp/ggate_test RW to exports list.   debug: Sending version packet.
info: Exporting 1 object(s).
info: Listen on port: 3080.
info: Connection from: 127.0.0.1.
debug: Receiving version packet.
debug: Version packet received.
debug: Receiving initial packet.

VERY LONG PAUSE

debug: Initial packet received.                                 debug: Sending initial packet.
debug: Connection created [127.0.0.1, /tmp/ggate_test].         debug: Receiving initial packet.
debug: New connection created (token=226910802).                debug: Received initial packet.
debug: Sending initial packet.                                  info: Connected to the server: localhost:3080.
                                                                debug: Sending version packet.

VERY LONG PAUSE

g_gate_send: EAGAIN                                             g_gate_send: EAGAIN
g_gate_send: EAGAIN                                             g_gate_send: EAGAIN
info: Connection from: 127.0.0.1.                               ^C
debug: Receiving version packet.
^C

Now try with 16k.

igor# ggated -a localhost -v -R16k -S16k /tmp/ggate_exports     igor# ggatec create -v -R16k -S16k localhost /tmp/ggate_test
info: Reading exports file (/tmp/ggate_exports).                info: Connected to the server: localhost:3080.
debug: Added 127.0.0.1/32 /tmp/ggate_test RW to exports list.   debug: Sending version packet.
info: Exporting 1 object(s).
info: Listen on port: 3080.
info: Connection from: 127.0.0.1.
debug: Receiving version packet.
debug: Version packet received.
debug: Receiving initial packet.

LONG PAUSE

debug: Initial packet received.                                 debug: Sending initial packet.
debug: Connection created [127.0.0.1, /tmp/ggate_test].         debug: Receiving initial packet.
debug: New connection created (token=2294332471).               debug: Received initial packet.
debug: Sending initial packet.                                  info: Connected to the server: localhost:3080.
info: Connection from: 127.0.0.1.                               debug: Sending version packet.
debug: Receiving version packet.
debug: Version packet received.
debug: Receiving initial packet.

LONG PAUSE

debug: Initial packet received.                                 debug: Sending initial packet.
debug: Found existing connection (token=2294332471).            debug: Receiving initial packet.
debug: Connection added [127.0.0.1, /tmp/ggate_test].           debug: Received initial packet.
debug: Sending initial packet.                                  ggate5
debug: Connection removed [127.0.0.1 /tmp/ggate_test].          notice: send_thread: started!
debug: Process created [/tmp/ggate_test].                       notice: recv_thread: started!
notice: disk_thread: started [/tmp/ggate_test]!
notice: send_thread: started [/tmp/ggate_test]!
notice: recv_thread: started [/tmp/ggate_test]!
debug: Process 1140 exiting.
^C
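
The two runs above could be generalized into a sweep over buffer sizes.
This is only a sketch that prints the command pairs seen above for each
size; ggate_cmds and sweep are made-up names, and doubling the size at
each step is an arbitrary choice.

```shell
# Print the ggated/ggatec command pair for one buffer size (e.g. 8k).
ggate_cmds() {
    size=$1
    echo "ggated -a localhost -v -R$size -S$size /tmp/ggate_exports"
    echo "ggatec create -v -R$size -S$size localhost /tmp/ggate_test"
}

# Walk through doubling buffer sizes from $1 to $2 (both in kB).
sweep() {
    kb=$1
    while [ "$kb" -le "$2" ]; do
        ggate_cmds "${kb}k"
        kb=$((kb * 2))
    done
}
```

"sweep 8 256" would list the pairs for 8k through 256k, to be run by
hand in two terminals as above.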


I wanted to use something like the following for a first draft of a
benchmark, but I just I/O deadlocked the system (6.2 and CURRENT),
simply by running ggated/ggatec in various combinations.

db> show alllocks
Process 1333 (ggatel) thread 0xc2767510 (100081)
exclusive sx sysctl lock r = 0 (0xc078c420) locked @ 
/vol/src/sys/kern/kern_sysctl.c:1376
db> trace 1333
Tracing pid 1333 tid 100081 td 0xc2767510
sched_switch(c2767510,0,1) at sched_switch+0xe7
mi_switch(1,0) at mi_switch+0x27c
sleepq_switch(c2b3e680,c078bdd0,0,c070e413,236,...) at sleepq_switch+0xc9
sleepq_timedwait(c2b3e680) at sleepq_timedwait+0x4a
msleep(c2b3e680,0,4c,c07028f3,64) at msleep+0x281
g_waitfor_event(c050d120,c2b6c300,2,0,0,0,0,1) at g_waitfor_event+0x73
sysctl_kern_geom_confxml(c07485e0,0,0,d1781b9c,c07485e0,...) at 
sysctl_kern_geom_confxml+0x26
sysctl_root(0,d1781c1c,3,d1781b9c) at sysctl_root+0x12f
userland_sysctl(c2767510,d1781c1c,3,830,bfbfe3d8,0,0,0,d1781c18,0,c078bde8,0,c070bc1f,522)
 at userland_sysctl+0xf4
__sysctl(c2767510,d1781d04) at __sysctl+0x77
syscall(3b,3b,3b,3,bfbfe3d8,...) at syscall+0x27e
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x2816ba7f, esp = 0xbfbfe2bc, 
ebp = 0xbfbfe2f8 ---
db> ps
  pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
 1348   800   800 0  S   sysctl l 0xc078c444 cron
 1347   800   800 0  S   sysctl l 0xc078c444 cron
 1346   800   800 0  S   sysctl l 0xc078c444 cron
 1345

Re: ggate still broken on 6.2-RC1 for amd64.

2006-12-05 Thread Ulrich Spoerlein

David Gilbert wrote:
 GGate is still broken on 6.2-RC1 for amd64.
 
 I have verified that the patch in kern/104829 has been applied (it's
 in the tree).
 
 I have also added the patch in amd64/91799 --- without it, ggated
 doesn't work at all.  This should definitely make it into 6.2.
 
 But the ggated/ggatec in 6.2-RC1 connects now (and is happy about
 that).  In fact, the tasting on the ggatec side that happens due to
 new disks showing up works, too.  However, any attempt to pass
 significant traffic causes ggatec to seemingly lock up.
 
 In my configuration, I have a gmirror running with a local disk
 (already) and I want to gmirror insert the ggate disk.  When I do
 so, I get 50 write requests queued (I upped the gmirror buffer count
 to 50 to make synchronization happen faster) and things never move from
 there.

/me too, though I tested this on two FreeBSD/i386 SMP machines with a
gmirror + ggated combination. There *is* traffic going on, but it is
somewhere around 50kB/s (sic! no kidding!).

Also, forcefully removing the ggate0 provider (ggatec destroy -fu0),
which should not impact the mirror operation in any way, panicked the
system.

I can't rebuild this test scenario on -CURRENT right now, but will do so
time permitting. Maybe this is related to the gmirror deadlock I
reported. But I no longer have SMP hardware to play with ...

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: gmirror and quota corruption

2006-12-03 Thread Ulrich Spoerlein

Jason Vance wrote:
 I have a FreeBSD 5.5-STABLE box that is setup with a gmirror RAID 1 using
 two identical harddrives.
 ...
 The system boots up but as soon as I do any disk access ie 'repquota -a' or
 write a file to the harddrive, the system hangs. I can still connect to the
 various services via telnet to their port, but none of them respond.
 ...
 Is there a known conflict between gmirror and a quota enabled filesystem?
 How can I properly set these up?

Could you please re-test this setup with a kernel *without* option
PREEMPTION and share your results? Also, is this a UP or SMP machine?

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

geom/gstat display bug?

2006-11-23 Thread Ulrich Spoerlein


Hi all,

one of our servers running FreeBSD 5.5 was seriously swapping (1.9GB
of 2GB swap used) and to see the performance of the ad0s1b device, I
fired up gstat. This is the current output (it has stopped swapping)

dT: 0.510  flag_I 50us  sizeof 240  i -1
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     31      0      0    0.0     31    255    0.4     1.3| ad0
    0     31      0      0    0.0     31    255    0.4     1.3| ad0s1
    1     49      0      0    0.0     49   6274    4.5    22.7| ad2
    0      0      0      0    0.0      0      0    0.0     0.0| ad0s1a
4294967287      0      0      0    0.0      0      0    0.0     0.0| ad0s1b
    0      0      0      0    0.0      0      0    0.0     0.0| ad0s1c
    0      0      0      0    0.0      0      0    0.0     0.0| ad0s1d
    0     31      0      0    0.0     31    255    0.5     1.4| ad0s1e
...

There are two possible explanations, AFAICT:

a) This is a dual-CPU machine, so the L(q)++ and L(q)-- operations
were not strictly atomic, causing the counter to go to -1.

b) Or L(q) is computed by some addition/multiplication (doubtful),
and since the queue length was very, very long we got an integer
overflow.

The interesting thing is that gstat decodes the queue length as a uint64_t value.

Ah, I see now, that L(q) is computed by end_count - start_count of
struct devstat. Of course, I had lots of swap_pager_getswapspace(9):
failed errors on the console, as the system was running out of swap
space. Are these transactions somehow counted wrong?
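
The odd L(q) value for ad0s1b can be decoded with a little arithmetic. This is only a sketch, assuming the queue length is the difference between the started and completed transaction counters, truncated to an unsigned 32-bit integer (the counter width and the sample counter values are assumptions, not taken from the devstat code):

```python
# Assumption: the displayed queue length is a counter difference that was
# truncated to an unsigned 32-bit integer.
MASK32 = 0xFFFFFFFF


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    value &= MASK32
    return value - (1 << 32) if value & (1 << 31) else value


displayed = 4294967287            # L(q) shown by gstat for ad0s1b
print(as_signed32(displayed))     # -9

# Hypothetical counters: nine more completions than starts were counted.
start_count, end_count = 1000, 1009
print((start_count - end_count) & MASK32)  # 4294967287
```

If that assumption holds, the value on screen is -9 rather than -1, i.e. nine more transactions were counted as finished than as started.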

Uli

systat -vm output showing negative total virtual memory

2006-11-16 Thread Ulrich Spoerlein


Hi all,

this is on a two week old RELENG_6. The machine has 4GB RAM,  SMP

CPU: Intel(R) Xeon(TM) CPU 3.00GHz (3012.12-MHz 686-class CPU)
 Origin = GenuineIntel  Id = 0xf43  Stepping = 3
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
 AMD Features=0x2010<NX,LM>
real memory  = 3489071104 (3327 MB)
avail memory = 3414265856 (3256 MB)
ACPI APIC Table: PTLTD  APIC  
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  6


Mem:KB    REAL            VIRTUAL                     VN PAGER  SWAP PAGER
        Tot   Share      Tot    Share    Free         in  out     in  out
Act 1198620  115040  1480676   289860  153004 count
All 3330652  116920 -956751k   293960         pages

vm.vmtotal has this to say
System wide totals computed every five seconds: (values in kilobytes)
===
Processes:  (RUNQ: 3 Disk Wait: 1 Page Wait: 1 Sleep: 40)
Virtual Memory: (Total: 815944K, Active 355288K)
Real Memory:(Total: 2558540K Active 150424K)
Shared Virtual Memory:  (Total: 11460K Active: 7856K)
Shared Real Memory: (Total: 6916K Active: 5044K)
Free Memory Pages:  890092K

Uli

Re: systat -vm output showing negative total virtual memory

2006-11-16 Thread Ulrich Spoerlein

Ruslan Ermilov wrote:
 sysctl(8) knows that t_vm is in bytes, but for the other stats
 it thinks they are in pages.  systat -vm thinks they are all
 in bytes.  Here's a fix:

Thanks! I applied your patch to RELENG_6.


# sysctl vm.vmtotal ; ./sysctl vm.vmtotal
vm.vmtotal: 
System wide totals computed every five seconds: (values in kilobytes)
===
Processes:  (RUNQ: 1 Disk Wait: 0 Page Wait: 0 Sleep: 45)
Virtual Memory: (Total: 797461K, Active 92512K)
Real Memory:(Total: 3327992K Active 48124K)
Shared Virtual Memory:  (Total: 11856K Active: 7772K)
Shared Real Memory: (Total: 7644K Active: 5364K)
Free Memory Pages:  145964K

vm.vmtotal: 
System wide totals computed every five seconds: (values in kilobytes)
===
Processes:  (RUNQ: 1 Disk Wait: 0 Page Wait: 0 Sleep: 45)
Virtual Memory: (Total: 797461K, Active 22K)
Real Memory:(Total: 3327992K Active 48128K)
Shared Virtual Memory:  (Total: 2K Active: 1K)
Shared Real Memory: (Total: 7644K Active: 5364K)
Free Memory Pages:  145964K


22K active VM and 1K shared? Seems pretty low to me...

Here's the systat -vm output

Mem:KB    REAL            VIRTUAL
        Tot   Share      Tot    Share    Free
Act   48384    5424    92800     7844  145692 count
All 3328264    7704 -1028565k   11928         pages


Mem:KB    REAL            VIRTUAL
        Tot   Share      Tot    Share    Free
Act   48464    5372       22        1  145692 count
All 3328264    7652   797461        2         pages


The total value seems more sane, but I doubt the active total value.
top(1) says

106 processes: 3 running, 80 sleeping, 1 zombie, 22 waiting
CPU states:  8.9% user,  0.0% nice, 11.4% system,  0.8% interrupt, 78.9% idle
Mem: 48M Active, 2834M Inact, 239M Wired, 133M Cache, 112M Buf, 4680K Free
Swap: 1024M Total, 36K Used, 1024M Free

Yes, the system is totally idle, that may explain the values above. If
your fix is correct (sorry, but I'm not in a position to judge your
work), would it be possible to have a quick MFC to RELENG_6 and
RELENG_6_2?
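
For what it's worth, the negative total from the unpatched systat can be reproduced with plain integer arithmetic. The sketch below is one possible model, not a reading of the systat source: it assumes the total virtual memory arrives as a byte count, is then scaled as if it were a page count (4 KB pages, so times 4 to get KB), is stored in a signed 32-bit int, and is abbreviated with a "k" suffix by dividing by 1000. All of those steps are assumptions.

```python
PAGE_KB = 4  # assumption: 4 KB pages, so pages -> KB is a multiply by 4


def to_int32(value):
    """Truncate an integer to a signed 32-bit value, as a C int would."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & (1 << 31) else value


total_vm_kb = 797461                  # "Virtual Memory: (Total: 797461K" from sysctl
total_vm_bytes = total_vm_kb * 1024   # 816600064

# Hypothetical bug model: the byte count is treated as a page count.
scaled = to_int32(total_vm_bytes * PAGE_KB)
print(scaled)                         # -1028567040
print(f"{int(scaled / 1000)}k")       # -1028567k
```

The result lands very close to the -1028565k shown above, which at least suggests a units mix-up combined with a 32-bit overflow rather than a genuinely negative counter.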


Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

ntpd vs nss_ldap: Crashing in getaddrinfo

2006-11-15 Thread Ulrich Spoerlein


Hi,

I needed to test the ntpd from ports (net/ntp, net/ntp-devel,
net/ntp-stable), but they always crashed with a SIGBUS error.
Investigation led to nss_ldap being the culprit.

With nss_ldap installed and NO ldap keyword in /etc/nsswitch.conf,
ntpd will run fine. If you add ldap to passwd or group or
both, ntpd will crash while calling getaddrinfo (even though LDAP is only
used for passwd/group).

/etc/nsswitch.conf:
group: files ldap
hosts: files dns
networks: files
passwd: files ldap
shells: files


[EMAIL PROTECTED]:/usr/ports/net/ntp-stable/work/ntp-4.2.2p4-RC4/ntpd# gdb 
./ntpd
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...
(gdb) r -d
Starting program: /usr/ports/net/ntp-stable/work/ntp-4.2.2p4-RC4/ntpd/ntpd -d
ntpd [EMAIL PROTECTED] Wed Nov 15 09:56:13 UTC 2006 (1)
addto_syslog: precision = 1.117 usec
create_sockets(123)
addto_syslog: no IPv6 interfaces found
addto_syslog: ntp_io: estimated max descriptors: 10951, initial socket
boundary: 20
bind() fd 20, family 2, port 123, addr 0.0.0.0, flags=9
Added addr 0.0.0.0 to list of addresses
addto_syslog: Listening on interface wildcard, 0.0.0.0#123 Disabled
bind() fd 21, family 2, port 123, addr 16.30.58.127, flags=25
Added addr 16.30.58.127 to list of addresses
addto_syslog: Listening on interface xl0, 16.30.58.127#123 Enabled
bind() fd 22, family 2, port 123, addr 127.0.0.1, flags=21
Added addr 127.0.0.1 to list of addresses
addto_syslog: Listening on interface lo0, 127.0.0.1#123 Enabled
init_io: maxactivefd 22
local_clock: time 0 base 0.00 offset 0.00 freq 0.000 state 0

Program received signal SIGBUS, Bus error.
0x280a98c8 in memset () from /libexec/ld-elf.so.1
(gdb) bt
#0  0x280a98c8 in memset () from /libexec/ld-elf.so.1
#1  0x280c2100 in ?? ()
#2  0x2809f039 in map_object () from /libexec/ld-elf.so.1
#3  0x2809c115 in elf_hash () from /libexec/ld-elf.so.1
#4  0x2809c21c in elf_hash () from /libexec/ld-elf.so.1
#5  0x2809de8c in dlopen () from /libexec/ld-elf.so.1
#6  0x2828140c in _nsdbtaddsrc () from /lib/libc.so.6
#7  0x2827cb92 in ___toupper () from /lib/libc.so.6
#8  0x2827d1b4 in _nsyyparse () from /lib/libc.so.6
#9  0x2828179e in nsdispatch () from /lib/libc.so.6
#10 0x28271776 in getaddrinfo () from /lib/libc.so.6
#11 0x0804bfee in getnetnum (num=0xbfbfe537 ntp0..com,
addr=0xbfbfe9d0, complain=0, a_type=t_UNK) at ntp_config.c:
#12 0x0804cb5f in getconfig (argc=2, argv=0xbfbfebcc) at ntp_config.c:652
#13 0x0805246e in ntpdmain (argc=2, argv=0xbfbfebcc) at ntpd.c:744
#14 0x080527bb in main (argc=2, argv=0xbfbfebcc) at ntpd.c:274
(gdb) f 11
#11 0x0804bfee in getnetnum (num=0xbfbfe537 ntp0..com,
addr=0xbfbfe9d0, complain=0, a_type=t_UNK) at ntp_config.c:
2222        retval = getaddrinfo(num, "ntp", &hints, &ptr);
(gdb) l
2217        hints.ai_socktype = SOCK_DGRAM;
2218    #ifdef DEBUG
2219        if (debug > 3)
2220            printf("getaddrinfo %s\n", num);
2221    #endif
2222        retval = getaddrinfo(num, "ntp", &hints, &ptr);
2223        if (retval != 0 ||
2224           (ptr->ai_family == AF_INET6 && isc_net_probeipv6() != ISC_R_SUCCESS)) {
2225            if (complain)
2226                msyslog(LOG_ERR,
(gdb)

What's happening?

Uli

dump(8): how many bytes written to tape?

2006-11-15 Thread Ulrich Spoerlein


Hi,

I'm trying to figure out how many bytes were written to a tape by
dump(8). I'm using a blocksize of 64kB to maximize throughput to the
tape drive. Initially, I thought I could just add up the number of
tape blocks written by dump and multiply by 64kB. But it looks like
dump is still reporting those values as 1kB blocks.

Here's some sample output:

 DUMP: Date of this level 1 dump: Wed Nov 15 09:46:37 2006
 DUMP: Date of last level 0 dump: the epoch
 DUMP: Cache 256 MB, blocksize = 65536
 DUMP: DUMP: 30676 tape blocks on 1 volume
 DUMP: finished in 1 seconds, throughput 30676 KBytes/sec

 DUMP: Date of this level 1 dump: Wed Nov 15 10:25:38 2006
 DUMP: Date of last level 0 dump: the epoch
 DUMP: DUMP: 4650864 tape blocks on 1 volume
 DUMP: finished in 132 seconds, throughput 35233 KBytes/sec

 DUMP: Date of this level 1 dump: Wed Nov 15 10:50:36 2006
 DUMP: Date of last level 0 dump: the epoch
 DUMP: DUMP: 328548 tape blocks on 1 volume
 DUMP: finished in 14 seconds, throughput 23467 KBytes/sec

 DUMP: Date of this level 1 dump: Wed Nov 15 11:00:14 2006
 DUMP: Date of last level 0 dump: the epoch
 DUMP: DUMP: 36925423 tape blocks on 1 volume
 DUMP: finished in 973 seconds, throughput 37950 KBytes/sec

If I add the time*throughput, I get 41GB. If I add the number of tape
blocks and assume a block size of 1kB, I get 41GB, too.
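
The two totals can be re-added mechanically. A small sketch using only the numbers from the four DUMP summaries above (the variable names are made up for illustration):

```python
# (tape blocks, seconds, KBytes/sec) copied from the four DUMP summaries
runs = [
    (30676, 1, 30676),
    (4650864, 132, 35233),
    (328548, 14, 23467),
    (36925423, 973, 37950),
]

blocks_total = sum(blocks for blocks, _, _ in runs)
rate_total_kb = sum(secs * rate for _, secs, rate in runs)

print(blocks_total)    # 41935511 "tape blocks"
print(rate_total_kb)   # 41935320 KBytes from time * throughput

# If each reported block were really 64 kB, the total would be 64 times larger.
print(round(blocks_total * 64 / 1e6))  # about 2684 (in units of 10^6 KB)
```

The two sums agree to within rounding of the per-run throughput, which is what one would expect if the reported "tape blocks" are 1 kB units regardless of the -b setting.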

So, how exactly is the '-b64' parameter to dump(8) affecting the block
size on tape?

Uli

Re: problems with shutdown after dump on a large partition

2006-11-04 Thread Ulrich Spoerlein

Anatoliy Dmytriyev wrote:
 I got problems with shutdown after dump with "-L" (with snapshots) on a
 large partition:
 
 We have a large partition with 872G according to the "df -H" report.
 Exactly before shutdown, "dump -Lau" finished without any problems.
 After dump finished I ran the command "shutdown -h now", and as a result
 the shutdown was incorrect because the disk sync was terminated by
 timeout and fsck was run on the next boot.

I'm not entirely sure, but this looks like the snapshot generated by
dump -L was not yet cleaned up. You should wait a couple of minutes
(depending on the snapshot size and I/O turnover) before shutting down
the system or unmounting the partition.

I don't know of a way to decide if the snapshot has been fully cleaned
up.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: RELENG_6: I/O deadlock under load

2006-11-03 Thread Ulrich Spoerlein


On 10/28/06, Christian S.J. Peron [EMAIL PROTECTED] wrote:


It almost looks as if a user frequently runs gmirror(8) to query the
status of their array. Under a high load situation, the worker is busy,
so at one unlucky moment, gmirror(8) is run:

(1) gmirror(8) waits for sc->sc_lock owned by the worker
(2) The worker then drops the lock
(3) gmirror(8) proceeds
(4) Worker wakes up and waits for sc->sc_lock
(5) Only gmirror  never will because it's waiting on a resource
(presumably owned by the worker thread)?

I am not certain this is correct, so I have included pjd in the CC loop,
hoping he can help shed some light on the subject :)


This is just a follow-up to report that the problem seems
unreproducible on an identical kernel if I leave out option
PREEMPTION.
Performance sucks that way, but at least it's stable now.

Pawel seems to be rather busy with his GJOURNAL work and his ZFS port,
is someone else able to reproduce the problem?

Uli

Re: panic: vfs_getopt: caller passed 'opts' as NULL

2006-11-01 Thread Ulrich Spoerlein


On 10/31/06, Kris Kennaway [EMAIL PROTECTED] wrote:

Note that they'll be demand-loaded if requested (e.g. if you try to
mount_nullfs).  Maybe you or something else tried to mount such a
filesystem by accident?

 But the point is moot anyway, since I could not reproduce the problem.
 I tried again after rebooting the machine and everything went just
 fine ...

 I have to use the nullfs mounts on another machine shortly, let's see
 what happens there.


It reliably panicked in single-user mode, with no other modules loaded
at the time. But I see now that nullfs.ko is loaded as a module,
which might explain everything. I assumed it was built in.

I rebooted to a kernel without DEBUG_VFS_LOCKS and it's happily using
nullfs. I'll try once more with a debugging kernel that has nullfs
built in, but I guess the panic will vanish.


Ok, with the attached kernel config, which includes nullfs, I get a
duplicate lock instead of a panic:
Trying to mount root from ufs:/dev/da0s1a
acquiring duplicate lock of same type: vnode interlock
1st vnode interlock @ /usr/src/sys/kern/vfs_vnops.c:806
2nd vnode interlock @ /usr/src/sys/kern/vfs_subr.c:2036
KDB: stack backtrace:
kdb_backtrace(3,c894fa80,c0a47110,c0a47110,c09cb524,...) at kdb_backtrace+0x29
witness_checkorder(c8622d04,9,c0951b38,7f4) at witness_checkorder+0x578
_mtx_lock_flags(c8622d04,0,c0951b38,7f4,c840b590,...) at _mtx_lock_flags+0x78
vrefcnt(c8622c3c) at vrefcnt+0x20
null_checkvp(c8a7ed98,c093f5ae,215) at null_checkvp+0x56
null_lock(eb0bba80) at null_lock+0x66
VOP_LOCK_APV(c09c40a0,eb0bba80) at VOP_LOCK_APV+0x87
vn_lock(c8a7ed98,1002,c894fa80,c8a7ed98,c8a89224,...) at vn_lock+0xac
nullfs_root(c88052e4,2,eb0bbaf8,c894fa80,0,8,0,c0a84040,0,c09513da,3dd)
at nullfs_root+0x26
vfs_domount(c894fa80,c83e64c0,c8493490,0,c83fdad0,c0a38380,0,c09513da,2a3)
at vfs_domount+0x975
vfs_donmount(c894fa80,0,c87f4e00,c87f4e00,0,...) at vfs_donmount+0x2ef
nmount(c894fa80,eb0bbd04) at nmount+0x8b
syscall(3b,3b,3b,bfbfe435,bfbfecc8,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280ba4d7, esp =
0xbfbfe3fc, ebp = 0xbfbfec78 ---


I grepped /sys for DEBUG_VFS_LOCKS and it seems to only add some
additional KASSERTs, but not the one which triggered in the original
panic.

Nullfs seems more fragile than I initially thought ...

Uli


DEBUG
Description: Binary data

Re: panic: vfs_getopt: caller passed 'opts' as NULL

2006-11-01 Thread Ulrich Spoerlein

Kris Kennaway wrote:
  Nullfs seems more fragile than I initially thought ...
 
 It's just that compiling in the extra debugging (it might be
 DEBUG_LOCKS or DEBUG_VFS_LOCKS, I forget which), causes the sizes of
 structures to change, so when the module tries to fondle the structure
 at a certain offset thinking it's accessing a certain field, it's
 really fondling something else entirely and the kernel gets a nasty
 surprise and panics.

It is DEBUG_LOCKS. The DEBUG_VFS_LOCKS macro only enables additional
code at runtime, it does not alter the ABI. Ironically, it is even
documented in conf/NOTES.
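
The layout problem Kris describes can be illustrated in userland. This is a toy sketch with made-up structure and field names (nothing here is taken from the kernel sources): the "module" was compiled against a structure without the debugging field, the "kernel" uses one with it, and a write at the old offset lands in the wrong member.

```python
import ctypes


class LockPlain(ctypes.Structure):
    # Hypothetical layout the module was compiled against.
    _fields_ = [("flags", ctypes.c_uint32),
                ("owner", ctypes.c_uint32)]


class LockDebug(ctypes.Structure):
    # Hypothetical layout of a kernel built with an extra debugging field.
    _fields_ = [("flags", ctypes.c_uint32),
                ("debug_line", ctypes.c_uint32),
                ("owner", ctypes.c_uint32)]


print(ctypes.sizeof(LockPlain), ctypes.sizeof(LockDebug))  # 8 12
print(LockPlain.owner.offset, LockDebug.owner.offset)      # 4 8

# The "module" writes owner at the offset it was compiled with,
# but in the debug layout that offset belongs to debug_line.
lock = LockDebug(flags=1, debug_line=0, owner=0)
raw = (ctypes.c_uint32 * 3).from_buffer(lock)
raw[LockPlain.owner.offset // 4] = 42
print(lock.debug_line, lock.owner)  # 42 0
```

An option that only adds runtime checks leaves both layouts identical, which is why it would not cause this kind of mismatch between kernel and module.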

For the future, I have to remember that nullfs is a module.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

panic: vfs_getopt: caller passed 'opts' as NULL

2006-10-30 Thread Ulrich Spoerlein


RELENG_6 from 30th October, trying to do two nullfs mounts from two
amd-mounted directories (i.e., NFS mounts).

Funny thing is, this amd/nfs/mount_nullfs is working on several other
machines from a RELENG_6 checkout of 25th October.

panic: vfs_getopt: caller passed 'opts' as NULL
cpuid = 1
KDB: stack backtrace:
kdb_backtrace(100,c8506780,c852c870,c8df3450,e8d0ca5c,...) at kdb_backtrace+0x29
panic(c089c395,c852c870,c8721b90,e8d0ca80,e8d0cadc,...) at panic+0x114
vfs_getopt(0,c8df3450,e8d0ca58,e8d0ca5c,0,...) at vfs_getopt+0x1d
nullfs_mount(c8721b90,c8506780,0,c8df46c0,c8cd1c3c,...) at nullfs_mount+0x70
vfs_domount(c8506780,c852c870,c8433a40,0,c851cc50,c0971700,0,c089be7a,2a3)
at vfs_domount+0x687
vfs_donmount(c8506780,0,c86ffd00,c86ffd00,0,...) at vfs_donmount+0x2ef
nmount(c8506780,e8d0cd04) at nmount+0x8b
syscall(3b,3b,3b,bfbfe3b4,bfbfec0c,...) at syscall+0x25b
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280ba4d7, esp =
0xbfbfe33c, ebp = 0xbfbfebb8 ---
KDB: enter: panic
[thread pid 60225 tid 100085 ]
Stopped at  kdb_enter+0x2b: nop

Fwd: panic: vfs_getopt: caller passed 'opts' as NULL

2006-10-30 Thread Ulrich Spoerlein


On 10/30/06, Kris Kennaway [EMAIL PROTECTED] wrote:

 panic: vfs_getopt: caller passed 'opts' as NULL

This can happen if you are using filesystem modules but your kernel is
built with nonstandard options (DEBUG_*_LOCKS is a culprit, I think).


Interesting, but no filesystem modules were involved. In fact, even
geom_mirror and geom_label were statically built into the kernel.

But the point is moot anyway, since I could not reproduce the problem.
I tried again after rebooting the machine and everything went just
fine ...

I have to use the nullfs mounts on another machine shortly, let's see
what happens there.

Uli

Re: RELENG_6: I/O deadlock under load

2006-10-28 Thread Ulrich Spoerlein

Ulrich Spoerlein wrote:
 Our fileserver deadlocked, again. It is running RELENG_6 checked out
 yesterday. I have enabled DDB, WITNESS and INVARIANTS and have it
 hooked up via serial console.

Happened again; now I have DEBUG_LOCKS and DEBUG_VFS_LOCKS included. There
are hundreds of cron processes waiting on wmesg 'sysctl' (they seem to
have piled up prior to me entering the debugger).

db> show pcpu
cpuid= 0
curthread= 0xc8326780: pid 11 idle: cpu0
curpcb   = 0xe6f1fd90
fpcurthread  = none
idlethread   = 0xc8326780: pid 11 idle: cpu0
APIC ID  = 0
currentldt   = 0x50
spin locks held:
db> show allpcpu
Current CPU: 0

cpuid= 0
curthread= 0xc8326780: pid 11 idle: cpu0
curpcb   = 0xe6f1fd90
fpcurthread  = none
idlethread   = 0xc8326780: pid 11 idle: cpu0
APIC ID  = 0
currentldt   = 0x50
spin locks held:

cpuid= 1
curthread= 0xc8326600: pid 10 idle: cpu1
curpcb   = 0xe6f1cd90
fpcurthread  = none
idlethread   = 0xc8326600: pid 10 idle: cpu1
APIC ID  = 6
currentldt   = 0x50
spin locks held:

db> show alllocks
Process 60935 (gmirror) thread 0xc88ce780 (100122)
exclusive sx sysctl lock r = 0 (0xc0971dc0) locked @ 
/usr/src/sys/kern/kern_sysctl.c:1375
Process 50 (g_mirror gm0) thread 0xc86b7600 (100062)
exclusive sx gmirror:lock r = 0 (0xc84b282c) locked @ 
/usr/src/sys/geom/mirror/g_mirror.c:1809

'gm0' is the mirror the OS resides on. It is 8GB in size and spans
da0s1 and da1s1, which are RAID5 volumes attached through two
twa(4) controllers.

db> show lockedvnods
Locked vnodes

0xcb4a4984: tag ufs, type VREG
usecount 1, writecount 0, refcount 3 mountedhere 0
flags ()
v_object 0xcc804e70 ref 0 pages 1
 lock type ufs: SHARED (count 1)#0 0xc0667314 at lockmgr+0x160
#1 0xc0783fea at ffs_lock+0x76
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06d478e at vn_read+0x132
#5 0xc0697a89 at dofileread+0x85
#6 0xc0697922 at kern_readv+0x36
#7 0xc069784d at read+0x45
#8 0xc0824037 at syscall+0x25b
#9 0xc08106af at Xint0x80_syscall+0x1f

ino 8315, on dev ufs/root

0xc87682b8: tag ufs, type VDIR
usecount 1, writecount 0, refcount 4 mountedhere 0
flags ()
v_object 0xcb4b6630 ref 0 pages 1
 lock type ufs: EXCL (count 1) by thread 0xc850b000 (pid 43987)#0 
0xc06676a1 at lockmgr+0x4ed
#1 0xc0783fea at ffs_lock+0x76
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06c8f46 at vget+0xc2
#5 0xc06bd9be at cache_lookup+0x34a
#6 0xc06bdef2 at vfs_cache_lookup+0x92
#7 0xc083494f at VOP_LOOKUP_APV+0x87
#8 0xc06c20a2 at lookup+0x46e
#9 0xc06c19b6 at namei+0x39a
#10 0xc06d3e9f at vn_open_cred+0x5b
#11 0xc06d3e42 at vn_open+0x1e
#12 0xc06cd342 at kern_open+0xb6
#13 0xc06cd256 at open+0x1a
#14 0xc0824037 at syscall+0x25b
#15 0xc08106af at Xint0x80_syscall+0x1f

ino 94210, on dev ufs/var

0xc87746cc: tag ufs, type VREG
usecount 1, writecount 1, refcount 3 mountedhere 0
flags ()
v_object 0xc876a210 ref 0 pages 3
 lock type ufs: EXCL (count 1) by thread 0xc86b7000 (pid 14753)#0 
0xc06676a1 at lockmgr+0x4ed
#1 0xc0783fea at ffs_lock+0x76
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06d4a54 at vn_write+0x138
#5 0xc0697d5f at dofilewrite+0x77
#6 0xc0697c03 at kern_writev+0x3b
#7 0xc0697bac at writev+0x30
#8 0xc0824037 at syscall+0x25b
#9 0xc08106af at Xint0x80_syscall+0x1f

ino 94280, on dev ufs/var

0xca357414: tag ufs, type VDIR
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
 lock type ufs: EXCL (count 1) by thread 0xc8cdf480 (pid 20101)#0 
0xc06676a1 at lockmgr+0x4ed
#1 0xc0783fea at ffs_lock+0x76
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06c8f46 at vget+0xc2
#5 0xc06bd9be at cache_lookup+0x34a
#6 0xc06bdef2 at vfs_cache_lookup+0x92
#7 0xc083494f at VOP_LOOKUP_APV+0x87
#8 0xc06c20a2 at lookup+0x46e
#9 0xc06c19b6 at namei+0x39a
#10 0xc06cf3f1 at kern_stat+0x35
#11 0xc06cf39f at stat+0x1b
#12 0xc0824037 at syscall+0x25b
#13 0xc08106af at Xint0x80_syscall+0x1f

ino 94211, on dev ufs/var

0xc875c15c: tag syncer, type VNON
usecount 1, writecount 0, refcount 2 mountedhere 0
flags ()
 lock type syncer: EXCL (count 1) by thread 0xc84ce480 (pid 46)#0 
0xc06676a1 at lockmgr+0x4ed
#1 0xc06c00e1 at vop_stdlock+0x21
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06c8703 at sync_vnode+0xe3
#5 0xc06c89a1 at sched_sync+0x1ed
#6 0xc065e864 at fork_exit+0xa0
#7 0xc08106bc at fork_trampoline+0x8


0xc8771d98: tag ufs, type VREG
usecount 3, writecount 0, refcount 4 mountedhere 0
flags (VV_TEXT)
v_object 0xc88ddbdc ref 1 pages 7
 lock type ufs: EXCL (count 1) by thread 0xc84ce480 (pid 46)#0 0xc06676a1 
at lockmgr+0x4ed
#1 0xc0783fea at ffs_lock+0x76
#2 0xc083688f at VOP_LOCK_APV+0x87
#3 0xc06d50b8 at vn_lock+0xac
#4 0xc06c8f46 at vget+0xc2
#5 0xc0782ab5 at ffs_sync+0x1c1
#6 0xc06caaa0 at sync_fsync+0x164
#7 0xc0835c1f at VOP_FSYNC_APV+0x9b
#8

RELENG_6: I/O deadlock under load

2006-10-27 Thread Ulrich Spoerlein


Hi all,

Our fileserver deadlocked, again. It is running RELENG_6 checked out
yesterday. I have enabled DDB, WITNESS and INVARIANTS and have it
hooked up via serial console.

I cannot give out shell access, but I can run any command you might
consider useful. Here are more details:

The system has two 3Ware controllers with a big RAID5 volume each:

3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: 3ware 9000 series Storage Controller port 0x3000-0x303f mem
0xdc00-0xddff,0xd830-0xd8300fff irq 48 at device 1.0 on
pci3
twa0: [FAST]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SXU-8LP, 8
ports, Firmware FE9X 3.04.00.005, BIOS BE9X 3.04.00.002
em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port
0x3040-0x307f mem 0xd832-0xd833 irq 54 at device 2.0 on pci3
em0: Ethernet address: 00:30:48:30:11:a2
em0: [FAST]
em1: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port
0x3080-0x30bf mem 0xd834-0xd835 irq 55 at device 2.1 on pci3
em1: Ethernet address: 00:30:48:30:11:a3
em1: [FAST]
pci1: base peripheral, interrupt controller at device 0.3 (no driver attached)
pcib4: ACPI PCI-PCI bridge irq 16 at device 4.0 on pci0
pci4: ACPI PCI bus on pcib4
pcib5: ACPI PCI-PCI bridge irq 16 at device 6.0 on pci0
pci5: ACPI PCI bus on pcib5
pcib6: ACPI PCI-PCI bridge at device 0.0 on pci5
pci6: ACPI PCI bus on pcib6
pci5: base peripheral, interrupt controller at device 0.1 (no driver attached)
pcib7: ACPI PCI-PCI bridge at device 0.2 on pci5
pci7: ACPI PCI bus on pcib7
twa1: 3ware 9000 series Storage Controller port 0x4000-0x403f mem
0xde00-0xdfff,0xd850-0xd8500fff irq 96 at device 1.0 on
pci7
twa1: [FAST]
twa1: INFO: (0x15: 0x1300): Controller details:: Model 9550SXU-8LP, 8
ports, Firmware FE9X 3.04.00.005, BIOS BE9X 3.04.00.002
da0 at twa0 bus 0 target 0 lun 0
da0: AMCC 9550SXU-8L DISK 3.04 Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers
da0: 1430448MB (2929557504 512 byte sectors: 255H 63S/T 182356C)
da1 at twa1 bus 0 target 0 lun 0
da1: AMCC 9550SXU-8L DISK 3.04 Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers
da1: 1430448MB (2929557504 512 byte sectors: 255H 63S/T 182356C)
SMP: AP CPU #1 Launched!
GEOM_MIRROR: Device gm0 created (id=3977032851).
GEOM_MIRROR: Device gm0: provider da0s1 detected.
GEOM_MIRROR: Device gm0: provider da1s1 detected.
GEOM_MIRROR: Device gm0: provider da1s1 activated.
GEOM_MIRROR: Device gm0: provider da0s1 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.

The base OS is sitting on a 8GB GMIRROR device across those two
volumes. There were multiple processes running at the time of the
deadlock:
Two dd if=/dev/urandom were writing to the filesystems on each volume.
An rsync was pumping data to a different server. This server also
exposed a part of the volume via GEOM_GATE to the deadlocked host.
This ggate device and a local device formed another gmirror, which was
just rebuilding.

I started a dump of this gmirrored filesystem, but had to abort
because the tape drive was not recognized. I aborted the dump and ran
camcontrol rescan to get my /dev/sa0 device. mksnap_ffs was still
running, and as I was impatient, I restarted my dump script. dump(8)
was blocking, because another mksnap_ffs was running. It looks like as
soon as the first mksnap_ffs finished, the system deadlocked.

Yes, this is quite a lot going on, but the system has deadlocked before
with *only* mksnap_ffs running, so I suspect this is the only culprit. I
could still enter the debugger via serial break (pinging the host
still works, switching virtual consoles works, BUT pressing enter on any
console produces nothing). It also continues to push out syslog
messages to the console ...

db> ps
 pid  ppid  pgrp   uid   state   wmesg     wchan    cmd
74669 82674 8267425  N   sendmail
35897 80497 80497 0  N   sendmail
13932 81866  9485 0  SL  vnread   0xdc38a690 grep
81866 64561  9485 0  S   wait 0xc89a5000 sh
54507 32103 32103 0  SL+ pfault   0xc096db18 sleep
64561  9485  9485 0  S   piperd   0xc9cce4c8 perl5.8.8
9485 24955  9485 0  Ss  wait 0xca6ad000 sh
24955  3564  3564 0  S   piperd   0xc9c85b28 cron
24201 10966 75715 0  SL+ physrd   0xdc38f600 dump
72560 10966 75715 0  SL+ physrd   0xdc389c50 dump
31224 10966 75715 0  SL+ physrd   0xdc38ae40 dump
10966  5349 75715 0  S+  sbwait   0xc86a7370 dump
5349 43148 75715 0  S+  wait 0xca690430 dump
43148 75715 75715 0  S+  wait 0xcc284c90 sh
95955 59838 59838 0  S   nfslockd 0xc0967f08 rpc.lockd
59838 1 59838 0  Ss  select   0xc0964224 rpc.lockd
11779 1 11779 0  Ss  select   0xc0964224 rpc.statd
53756 59946 59946 0  S   -0xc84fbc00 nfsd
50902 59946 59946 0  S   -0xcc812200 nfsd
97900 59946 59946 0  S   -0xca9e3000 nfsd

panic: softdep_deallocate_dependancies

2006-10-24 Thread Ulrich Spoerlein


Hi,

Following setup: Two identical fileservers connected directly via
their em1 interfaces. Both running RELENG_6 from early October. fs2
exports a 924GB volume via ggated which is imported by fs1.

fs1 spans a gmirror across its da1s2d and this ggate0 (->fs2) device.
It was just rebuilding the gmirror when I figured I'd try to back up
all (rather empty) volumes to tape.

So dump(8) was running on the gmirrored filesystem and was in the
process of snapshotting the device.

It did spit out several of these lines:
2006-10-24T15:25:34+0200 kern.crit fs1 kernel: GEOM_MIRROR: Request failed (error=5). ggate0[WRITE(offset=1017607733248, length=16384)]
2006-10-24T15:25:34+0200 kern.crit fs1 kernel: g_vfs_done():mirror/share[WRITE(offset=1017607733248, length=16384)]error = 5
2006-10-24T15:25:34+0200 kern.crit fs1 kernel: g_vfs_done():mirror/share[WRITE(offset=1017800425472, length=16384)]error = 5
2006-10-24T15:25:34+0200 kern.crit fs1 kernel: g_vfs_done():mirror/share[WRITE(offset=1017993117696, length=16384)]error = 5
2006-10-24T15:25:34+0200 kern.crit fs1 kernel: g_vfs_done():mirror/share[WRITE(offset=1018185809920, length=16384)]error = 5

and panicked several seconds later:
panic: softdep_deallocate_dependancies
cpuid = 1
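
A quick sanity check on the failed writes, as a sketch only. The four
g_vfs_done() offsets logged above are evenly spaced, and error=5 is EIO.
That the spacing might correspond to one UFS cylinder group per write
(i.e. the snapshot code touching each group in turn) is a guess on my
part, not something the log proves.

```python
import errno

# WRITE offsets and length copied from the g_vfs_done() lines above (bytes).
offsets = [1017607733248, 1017800425472, 1017993117696, 1018185809920]
length = 16384

# Distance between consecutive failed writes.
strides = [b - a for a, b in zip(offsets, offsets[1:])]


def describe(offsets, strides):
    """Return human-readable lines for the offsets and their spacing."""
    lines = []
    for off in offsets:
        lines.append("offset %d = %.2f GiB" % (off, off / 2**30))
    for s in strides:
        lines.append("stride %d bytes = %.2f MiB" % (s, s / 2**20))
    return lines


if __name__ == "__main__":
    print("error=5 is", errno.errorcode[5])
    print("\n".join(describe(offsets, strides)))
```

All three strides come out identical, and every offset is a multiple of
the 16384-byte write length.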

It is an SMP machine, running a rather generic kernel, but with
options QUOTA (quotas are active on a different volume, though). Sadly
no DDB was configured, I'll try to reproduce this.

Is snapshotting/dumping supposed to work on ggate/gmirrored devices?

Uli

Re: ppp redial unsuccessful

2006-10-06 Thread Ulrich Spoerlein

Nick Gustas wrote:
 Oct  4 19:03:09 xxx ppp[55]: tun0: Phase: bundle: Authenticate
 Oct  4 19:03:09 xxx ppp[55]: tun0: Phase: deflink: his = PAP, mine = none
 Oct  4 19:03:09 xxx ppp[55]: tun0: Phase: Pap Output: [EMAIL PROTECTED] 
 
 Oct  4 19:03:09 xxx ppp[55]: tun0: LCP: deflink: RecvCodeRej(127) state = Opened
 Oct  4 19:03:11 xxx ppp[55]: tun0: Phase: Pap Input: SUCCESS ()

 The real question is, is there a way to work around your provider's
 brokenness without killing the ppp process?

Hi Nick,

I cranked up the debug logging and compared my ppp login attempts with
your logfile. I get multiple occurrences of:

Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(12) state = Initial
Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20
Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in Initial.
Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(13) state = Initial
Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20
Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in Initial.

A Google search then led me to the following posts [1], which describe the
problem in more detail. 'disable ipv6cp' should do the trick; I'll check
this ASAP.
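
To see how often this happens, something like the following sketch could
count the 'Oops, RCR in Initial' messages per ppp process. The regular
expression is only fitted to the log lines quoted in this mail, and the
function name is made up for illustration.

```python
import re
from collections import Counter

# Fitted to lines such as:
# "Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in Initial."
OOPS = re.compile(r"ppp\[(\d+)\]: (\w+): IPCP: (\w+): Oops, RCR in Initial")


def count_rcr_in_initial(lines):
    """Count 'Oops, RCR in Initial' messages per (pid, interface, link)."""
    counts = Counter()
    for line in lines:
        m = OOPS.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2), m.group(3))] += 1
    return counts


# The six log lines from this mail, one entry per line.
sample = [
    "Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(12) state = Initial",
    "Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20",
    "Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in Initial.",
    "Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(13) state = Initial",
    "Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20",
    "Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in Initial.",
]

if __name__ == "__main__":
    for key, n in count_rcr_in_initial(sample).items():
        print(key, n)
```

For a real log, pass an open file object (for example /var/log/ppp.log)
instead of the sample list.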

Thanks for your pointer!

[1] 
http://www.freebsd.de/archive/de-bsd-questions/de-bsd-questions.200506/0029.html
http://tech.barwick.de/openbsd/deflink-oops-rcr-in-initial.html

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: ppp redial unsuccessful

2006-10-06 Thread Ulrich Spoerlein

cpghost wrote:
 On Fri, Oct 06, 2006 at 08:02:02PM +0200, Ulrich Spoerlein wrote:
  I cranked up the debug logging, and compared my ppp login attempts with
  your logfile. I get multiple
  
  Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(12) 
  state = Initial
  Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20 
  Oct  6 18:29:43 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in 
  Initial.
  Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: RecvConfigReq(13) 
  state = Initial
  Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP:  IPADDR[6] 213.191.89.20
  Oct  6 18:29:46 coyote ppp[67945]: tun0: IPCP: deflink: Oops, RCR in 
  Initial.
  
  A Google search then led me to the following posts [1], which describe the
  problem in more detail. 'disable ipv6cp' should do the trick; I'll check
  this ASAP.
 
 Yesterday, I've had a brand new 6.2-PRERELEASE Oct 4th box installed
 on T-Com ADSL, using the same ppp.conf from my previous post. I've just
 logged into this box and seen a successful disconnect/reconnect, as
 always after 24hrs. Everything seems all right with ppp and T-Com ADSL.

I guess it depends on the actual hardware on the other side. Different
POPs have different hardware (versions) and software (configuration).

Let's wait for another 24h to see if I found the solution.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Start system with 'downed' carp interfaces

2006-10-05 Thread Ulrich Spoerlein


Hello,

I'm looking for a generic way to create and configure carp interfaces
upon boot (so daemons can bind against the IP address), but keep the
carp interfaces 'down'.

This is to allow the administrator to first check every service after
a failure and, if the system is deemed ready, put it back into production
by simply issuing: ifconfig carp0 up

But there are several problems:

  ifconfig_carp0=foo bar
will always bring the interface up first via /etc/rc.d/netif.

  ifconfig carp0 foo bar down
will ignore the 'down' and bring the interface up. This is especially
annoying. I wish ifconfig would honour the down statement, even
though the manpage says the interface will always be brought up when
assigned its first address.

Using a start_if.carp0 with the following contents
  ifconfig carp0 vhid 1 1.2.3.4/24
  ifconfig carp0 down

and
  ifconfig_carp0=down
in rc.conf will still result in an 'up' interface. I also disabled devd,
as it seems to be running 'pccard_ether carp0 start' as a result of the
interface creation, although that is started some time after the
interface has been created.

How are other people handling the startup of carp interfaces?

Uli

ppp redial unsuccessful

2006-10-04 Thread Ulrich Spoerlein

  set device PPPoE:dc0
  set dial
  #set redial 40+10-10.90 0
  set redial 90.91 0
  set crtscts off
  set speed sync
  set mru 1492
  set mtu 1492
  set authname XX
  set authkey XX
  add default HISADDR

How are other people circumventing this? I know that I could just
forcefully restart ppp at 3 o'clock in the morning, but I'm more
interested in a permanent fix.

And why is it that ppp *completely* ignores the redial timeout? It
should wait either 90 or 91 seconds, but instead it goes on flooding my
/var/log/ppp.log.
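
For reference, here is a small sketch that splits a 'set redial' argument
into its fields. It assumes the syntax is secs[+inc[-max]][.next] [attempts],
as I remember it from ppp(8); the field names in the code are my own, the
'random' keyword is not handled, and what ppp actually does with each
value should be checked against the man page.

```python
import re
from typing import NamedTuple, Optional

# Assumed grammar: secs[+inc[-max]][.next] [attempts]
_REDIAL = re.compile(
    r"^(\d+)(?:\+(\d+)(?:-(\d+))?)?(?:\.(\d+))?(?:\s+(\d+))?$"
)


class Redial(NamedTuple):
    secs: int
    inc: Optional[int]        # None when not given
    inc_max: Optional[int]    # None when not given
    next_secs: Optional[int]  # None when not given
    attempts: Optional[int]   # None when not given


def parse_redial(spec: str) -> Redial:
    """Split a 'set redial' argument string into its numeric fields."""
    m = _REDIAL.match(spec.strip())
    if m is None:
        raise ValueError("unrecognised redial spec: %r" % spec)
    secs, inc, inc_max, next_secs, attempts = (
        int(g) if g is not None else None for g in m.groups()
    )
    return Redial(secs, inc, inc_max, next_secs, attempts)


if __name__ == "__main__":
    print(parse_redial("90.91 0"))
    print(parse_redial("40+10-10.90 0"))
```

Under that assumed syntax, the active line 'set redial 90.91 0' carries
two separate values (90 and 91) plus an attempts count of 0.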

Any help or hints would be appreciated.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: ppp redial unsuccessful

2006-10-04 Thread Ulrich Spoerlein

cpghost wrote:
 On Wed, Oct 04, 2006 at 08:51:48PM +0200, Ulrich Spoerlein wrote:
  Hello all,
  
  with my ADSL provider (a reseller of the German Telekom), I'm unable to
  make ppp redial after the link has been lost. With Telekom, you usually
  get disconnected every 24 hours, but you could simply reconnect, if only
  our ppp would support it.
 
 Have you added this to /etc/rc.conf?
 
 ppp_mode=ddial

Yes, of course. As you can see, ppp(8) is not exiting, but entering an
endless redial loop ...

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: ppp redial unsuccessful

2006-10-04 Thread Ulrich Spoerlein

cpghost wrote:
 On Wed, Oct 04, 2006 at 03:37:37PM -0400, Nick Gustas wrote:
  Not that it helps you much, but I do see working pppoe redial behavior 
  with Yahoo/ATT dsl at a client site in the US. I can unhook the dsl 
  line and it will autoreconnect as soon as it's plugged in again. In the 
  event of a provider outage it comes back up on its own. The current ppp 
  session has been running for 59 days, longest session was 353 days, but 
  the server had to be moved for remodeling.
 
 Same here. I've got some 6.1-STABLE boxes running since 70 days
 uninterrupted on german T-Com ADSL (PPPoE). ppp redials automatically
 without any problems there.

I maintain three FreeBSD boxes, ranging from 4.11 to 6.1-RELEASE and
6-STABLE. They have been showing this behaviour for at least 1 or 2
years, so it is/was also present in the 5.x line.

I usually work around this by having a cron job that restarts ppp every
day at 04:00 or somewhere around that.

So either I'm just unlucky or I'm doing something fundamentally wrong.

Could someone paste me the snippet from ppp.log of a successful 24h
disconnect + redial?

Thanks.

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

altq on tun0: queueing works, prioritization not?

2006-10-01 Thread Ulrich Spoerlein

Hello all,

I tried to set up TCP ACK prioritization with pf/altq, as has been
described in various places on the internet.

It doesn't work as expected. I have a 16Mb/1Mb DSL link; the modem is
connected to a dc(4) device, and I'm using the tun0 device in my
firewall rules. Here they are:

ext_if=tun0
scrub in all
altq on tun0 priq bandwidth 400Kb queue { std, http, ssh, dns, tcp_ack }

queue std priority 1 priq(default)
queue tcp_ack priority 6

pass out on $ext_if proto tcp from any to any queue(std, tcp_ack)


Please note that I tried various bandwidth settings; for testing
purposes I set it to a very, very low 400Kb.

When downloading from ftp.de.freebsd.org, I'm able to achieve roughly
950kB/s. If I then start an FTP upload (which will reach some 42kB/s, so
the 400Kb bandwidth limit is in effect), the interface throughput drops
down to a mere 120kB/s.

The 400Kb limit should also be low enough, as I'm able to upload to that
same FTP server at up to 100kB/s if I turn off queueing.
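
The unit arithmetic behind that claim, as a small sketch. It assumes that
'Kb' in the altq line means kilobits per second and that the kB/s figures
are kilobytes per second; the helper names are made up for illustration.

```python
def kbit_to_kbyte(kbit_per_s):
    """Convert kilobits per second to kilobytes per second (8 bits per byte)."""
    return kbit_per_s / 8.0


def kbyte_to_kbit(kbyte_per_s):
    """Convert kilobytes per second to kilobits per second."""
    return kbyte_per_s * 8.0


limit_kbyte = kbit_to_kbyte(400)         # altq bandwidth from the ruleset
observed_upload = 42.0                   # kB/s with queueing enabled
share = observed_upload / limit_kbyte    # fraction of the configured limit
unshaped_kbit = kbyte_to_kbit(100)       # upload rate without queueing

if __name__ == "__main__":
    print("400Kb limit   = %.0f kB/s" % limit_kbyte)
    print("42 kB/s is %.0f%% of the limit" % (share * 100))
    print("100 kB/s unshaped = %.0f Kb/s" % unshaped_kbit)
```

So a 42kB/s upload is in line with a 400Kb cap once protocol overhead is
taken into account, and the cap sits at half of the unshaped upload rate.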

This is definitely not what I would expect. Where is my error?

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: altq on tun0: queueing works, prioritization not?

2006-10-01 Thread Ulrich Spoerlein

Ulrich Spoerlein wrote:
 This is definitely not what I would expect. Where is my error?

Oh well, I should have tried 'cbq' earlier. With the following settings
(renamed the queues)

altq on $ext_if cbq bandwidth 800Kb queue { q_pri, q_std }
queue q_pri priority 6 cbq(borrow)
queue q_std priority 1 cbq(default borrow)

I'm actually able to achieve some effect. The upload is capped at
70-80kB/s and the download will fluctuate between 580 and 750 kB/s.
Much better than the plain priority queuing.

As soon as I cut the upload, the download will jump back to
950-1000kB/s.

Is this discrepancy (priq vs. cbq) known?

Ulrich Spoerlein
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Fwd: Loadable SMBus modules regression in 6-STABLE - 6-BETA

2006-09-27 Thread Ulrich Spoerlein


On 9/27/06, Dmitry Pryanishnikov [EMAIL PROTECTED] wrote:


On Tue, 26 Sep 2006, John Baldwin wrote:
 I've just found it and fixed it if you upgrade to the newest smbus.c.
Thanks, the problem has indeed been fixed.



I'm sorry to hijack this thread, but what's the recommended way to read out
temperature values via SMB?

[EMAIL PROTECTED]:31:3:  class=0x0c0500 card=0x618015d9 chip=0x24d38086 rev=0x02 hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82801EB/ER (ICH5/ICH5R) SMBus Controller'
   class    = serial bus
   subclass = SMBus

ichsmb0: Intel 82801EB (ICH5) SMBus controller port 0x1100-0x111f irq 17 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: System Management Bus on ichsmb0

# mbmon -d
ioctl(smb0:open): No such file or directory
SMBus[Intel8XX(ICH/ICH2/ICH3/ICH4/ICH5/ICH6)] found, but No HWM available on it!!
No Hardware Monitor found!!
InitMBInfo: Bad file descriptor
# ls /dev/smb*
ls: No match.
# sysctl -a|grep smb
dev.ichsmb.0.%desc: Intel 82801EB (ICH5) SMBus controller
dev.ichsmb.0.%driver: ichsmb
dev.ichsmb.0.%location: slot=31 function=3 handle=\_SB_.PCI0.SMBS
dev.ichsmb.0.%pnpinfo: vendor=0x8086 device=0x24d3 subvendor=0x15d9 subdevice=0x6180 class=0x0c0500
dev.ichsmb.0.%parent: pci0
dev.smbus.0.%desc: System Management Bus
dev.smbus.0.%driver: smbus
dev.smbus.0.%parent: ichsmb0

Uli

Re: Loadable SMBus modules regression in 6-STABLE - 6-BETA

2006-09-27 Thread Ulrich Spoerlein


On 9/27/06, Ulrich Spoerlein [EMAIL PROTECTED] wrote:


I'm sorry to hijack this thread, but what's the recommended way to read
out temperature values via SMB?

[EMAIL PROTECTED]:31:3:  class=0x0c0500 card=0x618015d9 chip=0x24d38086 rev=0x02 hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82801EB/ER (ICH5/ICH5R) SMBus Controller'
   class    = serial bus
   subclass = SMBus

ichsmb0: Intel 82801EB (ICH5) SMBus controller port 0x1100-0x111f irq 17 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: System Management Bus on ichsmb0

# mbmon -d
ioctl(smb0:open): No such file or directory
SMBus[Intel8XX(ICH/ICH2/ICH3/ICH4/ICH5/ICH6)] found, but No HWM available on it!!
No Hardware Monitor found!!
InitMBInfo: Bad file descriptor
# ls /dev/smb*
ls: No match.
# sysctl -a|grep smb
dev.ichsmb.0.%desc: Intel 82801EB (ICH5) SMBus controller
dev.ichsmb.0.%driver: ichsmb
dev.ichsmb.0.%location: slot=31 function=3 handle=\_SB_.PCI0.SMBS
dev.ichsmb.0.%pnpinfo: vendor=0x8086 device=0x24d3 subvendor=0x15d9 subdevice=0x6180 class=0x0c0500
dev.ichsmb.0.%parent: pci0
dev.smbus.0.%desc: System Management Bus
dev.smbus.0.%driver: smbus
dev.smbus.0.%parent: ichsmb0



OK, forget about the 'ls /dev/smb'; the /dev/smb0 device is actually there,
it was just devfs tripping me up again. I also found that sysutils/lmmon
gives me some sane values; however, the temperature values are way off:

MB temp:
254C / 489F / 527K
Fans:
 1 :0 rpm
 2 :0 rpm
 3 :0 rpm
Voltages:
 Vcore1 :  +2.703V
 Vcore2 :  +2.750V
 + 3.3V :  +2.750V
 + 5.0V :  +4.906V
 +12.0V : +11.812V
 -12.0V : -11.938V
 - 5.0V :  -5.114V
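
A small check of the temperature line, as a sketch. The three figures are
the same reading in Celsius, Fahrenheit and Kelvin, so the unit conversion
itself looks fine and it is the raw value of 254 that is suspect. That a
raw value this close to 255 points to an unconnected sensor, or to a chip
that lmmon does not decode correctly, is only a guess.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def c_to_k(celsius):
    """Convert degrees Celsius to Kelvin."""
    return celsius + 273.15


raw_c = 254  # "MB temp" as printed by lmmon above

if __name__ == "__main__":
    print("%dC / %.1fF / %.2fK" % (raw_c, c_to_f(raw_c), c_to_k(raw_c)))
```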

Re: 6.2 SHOWSTOPPER - em completely unusable on 6.2

2006-09-27 Thread Ulrich Spoerlein


On 9/27/06, Martin Nilsson [EMAIL PROTECTED] wrote:


mailbox# uname -a
FreeBSD mailbox 6.2-PRERELEASE FreeBSD 6.2-PRERELEASE #0: Fri Sep 22
00:31:29 CEST 2006
[EMAIL PROTECTED]:/usr/obj-local/usr/src/sys/SMP  amd64

I get tons of these:
em0: watchdog timeout -- resetting
em0: link state changed to DOWN
em0: link state changed to UP

mailbox# pciconf -lv
[EMAIL PROTECTED]:0:0:  class=0x02 card=0x108c15d9 chip=0x108c8086 rev=0x03
hdr=0x00
 vendor   = 'Intel Corporation'
 device   = 'PRO/1000 PM'
 class= network
 subclass = ethernet
[EMAIL PROTECTED]:0:0:  class=0x02 card=0x109a15d9 chip=0x109a8086 rev=0x00
hdr=0x00
 vendor   = 'Intel Corporation'
 class= network
 subclass = ethernet

em0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500
 options=bRXCSUM,TXCSUM,VLAN_MTU
 inet6 fe80::230:48ff:fe89:c958%em0 prefixlen 64 scopeid 0x1
 inet 192.168.10.2 netmask 0xff00 broadcast 192.168.10.255
 ether 00:30:48:89:c9:58
 media: Ethernet autoselect (1000baseTX full-duplex)
 status: active



We have several SMP systems with onboard em0/em1 interfaces running a
RELENG_6 snapshot taken at 2006-09-20 00:00+0. They are not in production
yet, so the load is low. However, I haven't seen any watchdog
timeouts on them. The only annoyance is that the em(4) interfaces take too
long to come up, i.e., the system will boot and run ifconfig, but the
interface still has no link, so syslogd/ntpdate/ntpd will complain about 'no
route to host'. A 'sleep 5' fixes that problem, though I'd like to avoid
such hacks.

Anyway, here's the data:

[EMAIL PROTECTED]:2:0:   class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03 hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82546EB Dual Port Gigabit Ethernet Controller'
   class    = network
   subclass = ethernet
[EMAIL PROTECTED]:2:1:   class=0x02 card=0x117a8086 chip=0x10798086 rev=0x03 hdr=0x00
   vendor   = 'Intel Corporation'
   device   = '82546EB Dual Port Gigabit Ethernet Controller'
   class    = network
   subclass = ethernet

em0: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x3040-0x307f mem 0xd832-0xd833 irq 54 at device 2.0 on pci3
em0: Ethernet address: XX
em0: [FAST]
em1: Intel(R) PRO/1000 Network Connection Version - 6.1.4 port 0x3080-0x30bf mem 0xd834-0xd835 irq 55 at device 2.1 on pci3
em1: Ethernet address: XX
em1: [FAST]
em0: link state changed to UP

em0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500
   options=bRXCSUM,TXCSUM,VLAN_MTU
   inet 1.2.3.4 netmask 0xff00 broadcast 1.2.3.4
   ether X
   media: Ethernet autoselect (100baseTX full-duplex)
   status: active

Hope this helps to narrow down the problem.
Uli

make release vs. installworld

2006-09-26 Thread Ulrich Spoerlein


Hi all,

I am building my own releases for FreeBSD. When upgrading a server to the
new release, I'd like to use the 'make installworld' procedure. Therefore
I'm mounting the /usr/src and /usr/obj from the release build via NFS onto
the server in question.

However, installworld fails, as it looks like some binaries are not
built inside the chrooted make release build. The first missing binary is
cat(1). After manually building it, installworld stops at chmod(1):

===> bin/chio (install)
install -s -o root -g wheel -m 555   chio /bin
install -o root -g wheel -m 444 chio.1.gz  /usr/share/man/man1
===> bin/chmod (install)
install -s -o root -g wheel -m 555   chmod /bin
install: chmod: No such file or directory
*** Error code 71

So, what's the recommended way to a) build your own releases and b) update
your servers with them?

Uli

PS: And why is the FreeBSD release build process so complex?

Re: wine: ld-elf.so.1 not found

2006-07-29 Thread Ulrich Spoerlein

Andresen, Jason wrote:
 I'm having a very strange problem with Wine.  It apparently refuses to
 see ld when starting:
 
 escaflowne/p7 (72 ~): wine
 ELF interpreter /libexec/ld-elf.so.1 not found
 [...]
 
 I'm really stumped as to what the problem is.

Search the archives, I had that problem too. I traced it back to
kern.maxdsiz  1GB. Please check your local data size limit.

Ulrich Spoerlein

PS: This is not a bug in Wine itself, but in our ELF handling. Running
ldd(1) on the wine binary will result in an ELF interpreter error too.
-- 
A: Yes.
Q: Are you sure?
 A: Because it reverses the logical flow of conversation.
 Q: Why is top posting frowned upon?

Re: ural(4) deassociates if no activity (possible wpa_supplicant problem)

2006-07-18 Thread Ulrich Spoerlein

Niki Denev wrote:
 Well, after a few more moments investigating the problem it seems that 
 dhclient is to blame.
 If i don't start it i don't get disconnected, and also i noticed that the 
 five minute interval
 matches the dhclient renewal period of 300 seconds.
 
 So the logical question is, why dhclient makes my ural(4) adapter 
 deassociate, and what i can
 do to prevent this :)

Hmm, interesting. I'm using ural(4) as an AP and connect to it via
ipw(4) and simple WEP. It is very unstable and will wedge the AP
(running 6.1) after several minutes.

I can't give you more details, as it is a rather complex setup and I
would have to isolate the problem first (is it WEP, is it bridge(4),
etc.)

Ulrich Spoerlein
-- 
 PGP Key ID: 20FEE9DD   Encrypted mail welcome!
Fingerprint: AEC9 AF5E 01AC 4EE1 8F70  6CBD E76E 2227 20FE E9DD
Which is worse: ignorance or apathy?
Don't know. Don't care.



Re: unmounting a filesystem safely that doesn't exist anymore

2006-06-12 Thread Ulrich Spoerlein

Björn König wrote:
 Hello,
 
 I did a mistake: I unplugged my digital camera accidentally before I 
 unmounted the 
 filesystem. *doh* This happens very often, because I'm very scatterbrained. 
 =) The kernel 
 will panic and all filesystems remain unclean in any case now. I know that 
 this is a well 
 know issue and in past discussions you stated that this behaviour is intended 
 and won't be 
 changed ad hoc. I just want to know if somebody knows a workaround or small 
 trick that 
 prevents the other filesystems from being unclean on next boot-up.

You might give the automounter (am-utils) a whirl. It is very
confusing to set up, but you can set the unmount-if-unused timeout to
something like 5 seconds. This could narrow the window enough to not
panic your system frequently :)

Ulrich Spoerlein
-- 
 PGP Key ID: 20FEE9DD   Encrypted mail welcome!
Fingerprint: AEC9 AF5E 01AC 4EE1 8F70  6CBD E76E 2227 20FE E9DD
Which is worse: ignorance or apathy?
Don't know. Don't care.



Re: How can I know which files a proccess is accessing?

2006-06-12 Thread Ulrich Spoerlein

Dan Nelson wrote:
 In the last episode (Jun 09), Ulrich Spoerlein said:
  Sadly, ktrace(1) seems to be rather useless in RELENG_6 right now.
  Every medium sized app will result in an out of ktrace objects
  error. I remember that some improvements to ktrace(1) went into
  -CURRENT. Time for an MFC?
 
 Just raise the kern.ktrace.request_pool sysctl; 4096 works for me.

Heh, I didn't know that sysctl existed. Why is the default value (100)
so low? I set it to 4096, but it only survives three seconds when
running 'ktrace find ~'

Anyway, next time I need ktrace, I'll remember to bump the pool size.
Thanks!

Ulrich Spoerlein
-- 
 PGP Key ID: 20FEE9DD   Encrypted mail welcome!
Fingerprint: AEC9 AF5E 01AC 4EE1 8F70  6CBD E76E 2227 20FE E9DD
Which is worse: ignorance or apathy?
Don't know. Don't care.



Re: How can I know which files a proccess is accessing?

2006-06-10 Thread Ulrich Spoerlein

Robert Watson wrote:
 A lot of people have answered and told you about lsof, which is a great tool, 
 and can give 
 you a momentary snapshot of the files a process has open. You might also be 
 interested in 
 getting a log of accesses, which you can do using ktrace(1).  This tracks 
 system calls and 
 you can see what paths are being accessed at time of open.  As of 7.x (and 
 hopefully 6.2 once 
 the MFC happens) you'll also be able to use audit(4) to track access of files 
 by processes.

Sadly, ktrace(1) seems to be rather useless in RELENG_6 right now. Every
medium sized app will result in an out of ktrace objects error. I
remember that some improvements to ktrace(1) went into -CURRENT. Time
for an MFC?

Ulrich Spoerlein
-- 
 PGP Key ID: 20FEE9DD   Encrypted mail welcome!
Fingerprint: AEC9 AF5E 01AC 4EE1 8F70  6CBD E76E 2227 20FE E9DD
Which is worse: ignorance or apathy?
Don't know. Don't care.

