Re: Puzzling performance
On Monday, August 02, 2010 6:13:50 pm Guy Helmer wrote:

On a FreeBSD 7.1 SCHED_ULE kernel, I have a large number of files opened and mmapped (with MAP_NOSYNC option) for shared-memory communication between processes. Normally, memcpy() copies data into these shared-memory buffers in a reasonable amount of time closely related to the size of the copy (roughly 10us per 10KB). However, due to performance issues I've found that sometimes a memcpy() takes an abnormally long time (10ms for 40KB, and I suspect longer times occurring when I have not had monitoring enabled).

The system doesn't seem to be in memory overcommit -- there is just a minor amount of swap in use, and I've not seen page-ins or page-outs while watching systat or vmstat. Since I'm using MAP_NOSYNC, I would not expect the pager to flush dirty pages to disk and cause added delays.

Any ideas where to look? Might it help to pin threads to CPUs in case a thread is getting moved to a different core?

Pinning might help, yes. You might also want to ensure there aren't any interrupts on that CPU. Currently there isn't a good way to figure that out short of kgdb though. :(

--
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
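One way to narrow down stalls like the ones described is to time every copy and keep the worst case, so outliers can be correlated with other system activity. The following is a minimal, portable sketch and not the original poster's monitoring code: timed_memcpy_ns() and worst_copy_ns() are hypothetical helper names, and any ordinary buffer stands in for the MAP_NOSYNC mappings.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <string.h>
#include <time.h>

/* Copy len bytes and return the elapsed monotonic time in nanoseconds. */
static int64_t
timed_memcpy_ns(void *dst, const void *src, size_t len)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	memcpy(dst, src, len);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return ((int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	    (int64_t)(t1.tv_nsec - t0.tv_nsec));
}

/* Repeat the copy and return the slowest observed time in nanoseconds. */
static int64_t
worst_copy_ns(void *dst, const void *src, size_t len, int iterations)
{
	int64_t worst = 0;

	for (int i = 0; i < iterations; i++) {
		int64_t ns = timed_memcpy_ns(dst, src, len);

		if (ns > worst)
			worst = ns;
	}
	return (worst);
}
```

Comparing the worst case against the typical figure quoted above (roughly 10us per 10KB) shows whether a run hit one of the slow copies; running the same loop with the thread pinned would test the migration theory.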
Re: PCI config space is not restored upon resume (macbook pro)
On Tuesday, August 03, 2010 6:49:07 am Oleg Sharoyko wrote:

Hi! I'm trying to make FreeBSD (9-Current, checkout on 2010-08-01) correctly suspend/resume on macbook pro. As of now I have two issues with resume:

1. Display stays blank upon resume. Got 'vga0: failed to reload state' in dmesg, but I haven't looked into this yet.
2. Some hardware is missing upon resume, specifically ath, msk and firewire. These devices disappear because rather strange values are being read from pci config space (such as vendor id, device id and others).

I wonder if the bus numbers for PCI-PCI bridges need to be restored on resume? If they aren't then config transactions won't be routed properly. You could add a pcib_resume() method that prints out the various bus register values after resume to see if they match what we print out during boot.

--
John Baldwin
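The comparison suggested here comes down to the bridge's bus number registers. In a PCI-PCI bridge (type 1) configuration header these are the bytes at offsets 0x18 (primary), 0x19 (secondary) and 0x1a (subordinate). The sketch below only decodes and compares the 32-bit value read at offset 0x18. It is a userland illustration under stated assumptions: struct bridge_buses, decode_bus_regs() and buses_match() are hypothetical names, and an actual pcib_resume() method would obtain the value from config space rather than take it as an argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bus numbers held in the dword at config offset 0x18 of a type 1 header. */
struct bridge_buses {
	uint8_t primary;	/* offset 0x18 */
	uint8_t secondary;	/* offset 0x19 */
	uint8_t subordinate;	/* offset 0x1a */
};

/* Split the dword read at offset 0x18 into its three bus numbers. */
static struct bridge_buses
decode_bus_regs(uint32_t dword18)
{
	struct bridge_buses b;

	b.primary = dword18 & 0xff;
	b.secondary = (dword18 >> 8) & 0xff;
	b.subordinate = (dword18 >> 16) & 0xff;
	return (b);
}

/*
 * Compare the boot-time and post-resume values, ignoring the top byte
 * (the secondary latency timer), which does not affect config routing.
 */
static bool
buses_match(uint32_t at_boot, uint32_t after_resume)
{
	return ((at_boot & 0x00ffffff) == (after_resume & 0x00ffffff));
}
```

A bridge whose value reads back as all zeroes after resume would fail buses_match() against its boot-time value, which is the symptom the reply above is looking for.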
Re: sched_pin() versus PCPU_GET
On Tuesday, August 03, 2010 9:46:16 pm m...@freebsd.org wrote:
On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin j...@freebsd.org wrote:
On Friday, July 30, 2010 10:08:22 am John Baldwin wrote:
On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote:

We've seen a few instances at work where witness_warn() in ast() indicates the sched lock is still held, but the place it claims it was held by is in fact sometimes not possible to keep the lock, like:

    thread_lock(td);
    td->td_flags &= ~TDF_SELECT;
    thread_unlock(td);

What I was wondering is, even though the assembly I see in objdump -S for witness_warn has the increment of td_pinned before the PCPU_GET:

    802db210:  65 48 8b 1c 25 00 00    mov    %gs:0x0,%rbx
    802db217:  00 00
    802db219:  ff 83 04 01 00 00       incl   0x104(%rbx)
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
    802db21f:  65 48 8b 04 25 48 00    mov    %gs:0x48,%rax
    802db226:  00 00
        if (lock_list != NULL && lock_list->ll_count != 0) {
    802db228:  48 85 c0                test   %rax,%rax
         * Pin the thread in order to avoid problems with thread migration.
         * Once that all verifies are passed about spinlocks ownership,
         * the thread is in a safe path and it can be unpinned.
         */
        sched_pin();
        lock_list = PCPU_GET(spinlocks);
    802db22b:  48 89 85 f0 fe ff ff    mov    %rax,-0x110(%rbp)
    802db232:  48 89 85 f8 fe ff ff    mov    %rax,-0x108(%rbp)
        if (lock_list != NULL && lock_list->ll_count != 0) {
    802db239:  0f 84 ff 00 00 00       je     802db33e <witness_warn+0x30e>
    802db23f:  44 8b 60 50             mov    0x50(%rax),%r12d

is it possible for the hardware to do any re-ordering here?

The reason I'm suspicious is not just that the code doesn't have a lock leak at the indicated point, but in one instance I can see in the dump that the lock_list local from witness_warn is from the pcpu structure for CPU 0 (and I was warned about sched lock 0), but the thread id in panic_cpu is 2. So clearly the thread was being migrated right around panic time.

This is the amd64 kernel on stable/7. I'm not sure exactly what kind of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC.

So... do we need some kind of barrier in the code for sched_pin() for it to really do what it claims? Could the hardware have re-ordered the "mov %gs:0x48,%rax" PCPU_GET to before the sched_pin() increment?

Hmmm, I think it might be able to because they refer to different locations. Note this rule in section 8.2.2 of Volume 3A:

  • Reads may be reordered with older writes to different locations but not with older writes to the same location.

It is certainly true that sparc64 could reorder with RMO. I believe ia64 could reorder as well. Since sched_pin/unpin are frequently used to provide this sort of synchronization, we could use memory barriers in pin/unpin like so:

    sched_pin()
    {
        td->td_pinned = atomic_load_acq_int(&td->td_pinned) + 1;
    }

    sched_unpin()
    {
        atomic_store_rel_int(&td->td_pinned, td->td_pinned - 1);
    }

We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but they are slightly more heavyweight, though it would be more clear what is happening I think.

However, to actually get a race you'd have to have an interrupt fire and migrate you so that the speculative read was from the other CPU. However, I don't think the speculative read would be preserved in that case. The CPU has to return to a specific PC when it returns from the interrupt and it has no way of storing the state for what speculative reordering it might be doing, so presumably it is thrown away? I suppose it is possible that it actually retires both instructions (but reordered) and then returns to the PC value after the read of listlocks after the interrupt. However, in that case the scheduler would not migrate as it would see td_pinned != 0. To get the race you have to have the interrupt take effect prior to modifying td_pinned, so I think the processor would have to discard the reordered read of listlocks so it could safely resume execution at the 'incl' instruction.

The other nit there on x86 at least is that the incl instruction is doing both a read and a write and another rule in the section 8.2.2 is this:

  • Reads are not reordered with other reads.

That would seem to prevent the read of listlocks from passing the read of td_pinned in the incl instruction on x86.

I wonder how that's interpreted
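The acquire/release variant floated in this message can be illustrated in portable C11. This is a userland analogy only, not kernel code: struct fake_thread, pin(), unpin() and is_pinned() are hypothetical stand-ins, with the C11 acquire increment playing the role of atomic_add_acq_int() and the release decrement that of atomic_sub_rel_int(). Whether the kernel needs such a barrier at all is exactly what the thread leaves open.

```c
#include <stdatomic.h>

/* Stand-in for the td_pinned counter in struct thread. */
struct fake_thread {
	atomic_int pinned;
};

/*
 * Increment the pin count with acquire ordering: memory accesses that
 * follow in program order are not reordered before the read half of
 * this read-modify-write.
 */
static void
pin(struct fake_thread *td)
{
	atomic_fetch_add_explicit(&td->pinned, 1, memory_order_acquire);
}

/*
 * Decrement the pin count with release ordering: memory accesses that
 * precede it in program order are not reordered after the write.
 */
static void
unpin(struct fake_thread *td)
{
	atomic_fetch_sub_explicit(&td->pinned, 1, memory_order_release);
}

/* What a scheduler would check before migrating the thread. */
static int
is_pinned(struct fake_thread *td)
{
	return (atomic_load_explicit(&td->pinned, memory_order_relaxed) != 0);
}
```

Pins nest, so the thread stays pinned until every pin() has a matching unpin().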
Re: PCI config space is not restored upon resume (macbook pro)
 acpi_pcib_acpi_attach(device_t bus);
-static int		acpi_pcib_acpi_resume(device_t bus);
 static int		acpi_pcib_read_ivar(device_t dev, device_t child,
 			    int which, uintptr_t *result);
 static int		acpi_pcib_write_ivar(device_t dev, device_t child,
@@ -94,7 +93,7 @@
     DEVMETHOD(device_attach,		acpi_pcib_acpi_attach),
     DEVMETHOD(device_shutdown,		bus_generic_shutdown),
     DEVMETHOD(device_suspend,		bus_generic_suspend),
-    DEVMETHOD(device_resume,		acpi_pcib_acpi_resume),
+    DEVMETHOD(device_resume,		bus_generic_resume),
 
     /* Bus interface */
     DEVMETHOD(bus_print_child,		bus_generic_print_child),
@@ -257,13 +257,6 @@
     return (acpi_pcib_attach(dev, &sc->ap_prt, sc->ap_bus));
 }
 
-static int
-acpi_pcib_acpi_resume(device_t dev)
-{
-
-    return (acpi_pcib_resume(dev));
-}
-
 /*
  * Support for standard PCI bridge ivars.
  */
Index: dev/acpica/acpi_pcibvar.h
===================================================================
--- dev/acpica/acpi_pcibvar.h	(revision 210796)
+++ dev/acpica/acpi_pcibvar.h	(working copy)
@@ -31,13 +31,14 @@
 #define	_ACPI_PCIBVAR_H_
 
 #ifdef _KERNEL
+
 void	acpi_pci_link_add_reference(device_t dev, int index, device_t pcib,
     int slot, int pin);
 int	acpi_pci_link_route_interrupt(device_t dev, int index);
 int	acpi_pcib_attach(device_t bus, ACPI_BUFFER *prt, int busno);
 int	acpi_pcib_route_interrupt(device_t pcib, device_t dev, int pin,
     ACPI_BUFFER *prtbuf);
-int	acpi_pcib_resume(device_t dev);
+
 #endif /* _KERNEL */
 
 #endif /* !_ACPI_PCIBVAR_H_ */
Index: dev/pci/pcib_private.h
===================================================================
--- dev/pci/pcib_private.h	(revision 210796)
+++ dev/pci/pcib_private.h	(working copy)
@@ -37,6 +37,7 @@
  * Export portions of generic PCI:PCI bridge support so that it can be
  * used by subclasses.
  */
+DECLARE_CLASS(pcib_driver);
 
 /*
  * Bridge-specific data.

--
John Baldwin
Re: sched_pin() versus PCPU_GET
On Wednesday, August 04, 2010 12:20:31 pm m...@freebsd.org wrote:
On Wed, Aug 4, 2010 at 2:26 PM, John Baldwin j...@freebsd.org wrote:
On Tuesday, August 03, 2010 9:46:16 pm m...@freebsd.org wrote:

[...]
Re: Not getting interrupts from PCI express slot
On Wednesday, August 04, 2010 1:18:53 pm Hans Petter Selasky wrote:

Hi, I'm not getting any interrupts from a PCI express slot. When I insert a device, no attach event is generated. If the device is present during boot the device is fully detected, but still no IRQ's. Is there anything I can do or test? I'm running 8-stable on amd64.

In general FreeBSD doesn't support hotplug PCI currently. Likely you'd need some sort of hotplug bridge driver similar to cbb(4) for Cardbus slots that would catch whatever interrupt is generated when a card is inserted and add the device, etc.

--
John Baldwin
Re: PCI config space is not restored upon resume (macbook pro)
On Thursday, August 05, 2010 11:30:23 am Oleg Sharoyko wrote:
On 4 August 2010 19:12, John Baldwin j...@freebsd.org wrote:

Cool, I actually think that the ACPI PCI-PCI driver can just use the stock PCI-PCI bridge driver's suspend and resume methods. Can you try out this alternate patch instead?

It works, and sure looks better than mine. I didn't know there's such a nice way to inherit methods.

This sounds like the display just needs to be powered on via DPMS. You might be able to make this work via acpi_video and toggling the LCD status that way. You could also try dpms.ko.

I'm afraid things are not that simple. I have tried without success acpi_video.ko, dpms.ko, sysctl hw.acpi.reset_video and sysutils/vbetool. And what worries me, the X server cannot start on the resumed system. From Xorg.log:

    (EE) NV(0): Failed to determine the amount of available video memory

It looks like the video card just ignores any requests.

Are you using the nvidia-driver or the nv driver from X?

--
John Baldwin
Re: sched_pin() versus PCPU_GET
On Thursday, August 05, 2010 12:01:22 pm m...@freebsd.org wrote:
On Wed, Aug 4, 2010 at 9:20 AM, m...@freebsd.org wrote:
On Wed, Aug 4, 2010 at 2:26 PM, John Baldwin j...@freebsd.org wrote:

Actually, I would beg to differ in that case. If PCPU_GET(spinlocks) returns non-NULL, then it means that you hold a spin lock,

ll_count is 0 for the correct pc_spinlocks and non-zero for the wrong one, though. So I think it can be non-NULL but the current thread/CPU doesn't hold a spinlock.

I don't believe we have any code in the NMI handler. I'm on vacation today so I'll check tomorrow.

I checked and ipi_nmi_handler() doesn't appear to have any local changes. I assume that's where I should look?

The tricky bits are all in the assembly rather than in C, probably in exception.S. However, if %gs were corrupt I would not expect it to point to another CPU's data, but garbage from userland.

--
John Baldwin
Re: sched_pin() versus PCPU_GET
On Thursday, August 05, 2010 11:59:37 am m...@freebsd.org wrote:
On Wed, Aug 4, 2010 at 11:55 AM, John Baldwin j...@freebsd.org wrote:
On Wednesday, August 04, 2010 12:20:31 pm m...@freebsd.org wrote:

[...]
Re: 8.1-STABLE amd64 machine check
Dan Langille wrote:

I am encountering a situation similar to one reported by Andrew Heybey at http://docs.freebsd.org/cgi/mid.cgi?6E83197B-9DD5-4C7E-846D-AD176C25464D

This morning I found this in my /var/log/messages:

    Aug 11 01:59:48 kraken kernel: MCA: Bank 4, Status 0x94614c62001c011b
    Aug 11 01:59:48 kraken kernel: MCA: Global Cap 0x0106, Status 0x
    Aug 11 01:59:48 kraken kernel: MCA: Vendor AuthenticAMD, ID 0x100f42, APIC ID 0
    Aug 11 01:59:48 kraken kernel: MCA: CPU 0 COR GCACHE LG RD error
    Aug 11 01:59:48 kraken kernel: MCA: Address 0x5d0fe8c

from /var/run/dmesg.boot

    Copyright (c) 1992-2010 The FreeBSD Project.
    Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
    FreeBSD is a registered trademark of The FreeBSD Foundation.
    FreeBSD 8.1-STABLE #0: Sun Jul 25 19:18:56 EDT 2010
        d...@kraken.example.org:/usr/obj/usr/src/sys/KRAKEN amd64
    Timecounter i8254 frequency 1193182 Hz quality 0
    CPU: AMD Phenom(tm) II X4 945 Processor (3010.17-MHz K8-class CPU)
      Origin = AuthenticAMD  Id = 0x100f42  Family = 10  Model = 4  Stepping = 2
      Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
      Features2=0x802009<SSE3,MON,CX16,POPCNT>
      AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
      AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
      TSC: P-state invariant
    real memory = 4294967296 (4096 MB)
    avail memory = 4100710400 (3910 MB)
    ACPI APIC Table: <111909 APIC1708>
    FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
    FreeBSD/SMP: 1 package(s) x 4 core(s)
     cpu0 (BSP): APIC ID: 0
     cpu1 (AP): APIC ID: 1
     cpu2 (AP): APIC ID: 2
     cpu3 (AP): APIC ID: 3

Andrew: You posted about this on July 14. Anything new since then?

John: Is it time for me to get a new CPU?

Hmm, this is what mcelog says:

    HARDWARE ERROR. This is *NOT* a software problem!
    Please contact your hardware vendor
    CPU 0 4 northbridge
    ADDR 5d0fe8c
      Northbridge NB Array Error
           bit33 = err cpu1
           bit42 = L3 subcache in error bit 0
           bit43 = L3 subcache in error bit 1
           bit46 = corrected ecc error
      memory/cache error 'generic read mem transaction, generic transaction, level generic'
    STATUS 94614c62001c011b MCGSTATUS 0
    MCGCAP 106 APICID 0 SOCKETID 0
    CPUID Vendor AMD Family 16 Model 4

It was a corrected ECC error. If you get more than one then perhaps the CPU is busted, but if you only get one, an isolated bit flip may not be worth worrying about.

--
John Baldwin
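The "corrected" verdict can be read straight off the top bits of the bank status word. The sketch below decodes only the architectural flag bits (63 down to 57) and the low 16-bit MCA error code; the vendor-specific bits that mcelog also reports (33, 42, 43, 46) are left out. The macro names, struct mc_summary and mc_decode() are local to this sketch and are assumptions, not the kernel's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural flag bits of a machine check bank status register. */
#define MC_STATUS_VAL	(1ULL << 63)	/* register contents are valid */
#define MC_STATUS_OVER	(1ULL << 62)	/* an earlier error was overwritten */
#define MC_STATUS_UC	(1ULL << 61)	/* error was not corrected */
#define MC_STATUS_EN	(1ULL << 60)	/* reporting enabled for this error */
#define MC_STATUS_MISCV	(1ULL << 59)	/* MISC register is valid */
#define MC_STATUS_ADDRV	(1ULL << 58)	/* ADDR register is valid */
#define MC_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */

struct mc_summary {
	bool valid;
	bool overflow;
	bool uncorrected;
	bool addr_valid;
	bool context_corrupt;
	uint16_t error_code;	/* low 16 bits: MCA error code */
};

/* Summarize the architectural fields of one bank status value. */
static struct mc_summary
mc_decode(uint64_t status)
{
	struct mc_summary s;

	s.valid = (status & MC_STATUS_VAL) != 0;
	s.overflow = (status & MC_STATUS_OVER) != 0;
	s.uncorrected = (status & MC_STATUS_UC) != 0;
	s.addr_valid = (status & MC_STATUS_ADDRV) != 0;
	s.context_corrupt = (status & MC_STATUS_PCC) != 0;
	s.error_code = (uint16_t)(status & 0xffff);
	return (s);
}
```

For the status logged above, 0x94614c62001c011b, the valid and address-valid bits are set and the uncorrected bit is clear, which matches the "COR" in the kernel message and the address line that follows it.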
Re: real memory falsely reports 8G, BIOS avail memory reports 1G
On Monday, August 09, 2010 8:13:03 am Julian H. Stacey wrote:

Hi hack...@freebsd.org

A laptop here emits a puzzling dmesg to both 8.1-RC2 and 8.1-RELEASE:

    real memory = 8572108800 (8175 MB)
    avail memory = 1018789888 (971 MB)

BIOS reckons it has 1G. No panel to unscrew to inspect memory. I don't believe 8G. Is this a bug in FreeBSD detect code? I am ready to run test kernel patches against 8.1-RELEASE and report back if anyone has code. (I have room to install a current too if necessary)

Full dmesg here: http://www.berklix.com/~jhs/hardware/laptops/novatech-8355/dmesg/

Cheers, Julian

Hmm, do you have dmidecode output?

--
John Baldwin
Re: Why doesn't ppc(4) check non-ENXIO failures during probe?
On Sunday, August 15, 2010 1:33:38 am Garrett Cooper wrote:

One thing that's puzzling me about the ppc(4) driver's ISA routines is that it only checks to see whether or not the device has an IO error:

Your patch would break hinted ppc devices. ENXIO means that the device_t being probed has an ISA PNP ID, but it does not match any of the IDs in the list. ENOENT means that the device_t does not have an ISA ID at all. For the isa bus that means it was explicitly created via a set of ppc.X hints.

--
John Baldwin
Re: Why doesn't ppc(4) check non-ENXIO failures during probe?
On Monday, August 16, 2010 7:23:54 pm Garrett Cooper wrote:
On Mon, Aug 16, 2010 at 1:19 PM, John Baldwin j...@freebsd.org wrote:
On Sunday, August 15, 2010 1:33:38 am Garrett Cooper wrote:

One thing that's puzzling me about the ppc(4) driver's ISA routines is that it only checks to see whether or not the device has an IO error:

Your patch would break hinted ppc devices. ENXIO means that the device_t being probed has an ISA PNP ID, but it does not match any of the IDs in the list. ENOENT means that the device_t does not have an ISA ID at all. For the isa bus that means it was explicitly created via a set of ppc.X hints.

Just clarifying some things because I don't know all of the details. If an ISA based parallel port fails to probe with ENOENT, then it's assumed that the configuration details are incorrect, and it should reprobe the device with different configuration settings (irq, isa port, etc) a max of BIOS_MAX_PPC times before it finally bails failing to configure a device (ppc_probe in ppc.c)? What if all of the ISA details in the device.hints file are bogus and the only detail that's correct is in the puc driver, etc? Would it fail to connect the card if it reached the BIOS_MAX_PPC ISA-related failure limit (see ppc_probe again)?

ISA_PNP_PROBE() does not talk to the hardware, it just compares device IDs. You have to realize that device_t objects on an ISA device come from three sources:

1) Builtin devices are auto-enumerated via ACPI or PnP BIOS. Any modern BIOS will do this for things like built in serial ports, ISA timers, PS/2 keyboard, etc.
2) ISA PnP adapters in an ISA slot are enumerated via ISA PnP.
3) Users indicate that specific ISA devices are present via hints.

Devices from 1) and 2) have an assigned device ID (HID) and zero-or-more compatibility IDs (CID). ISA_PNP_PROBE() accepts a list of HID IDs and returns true (0) if the HID or any of the CIDs match any of the ids in the list that is passed in. If none of the IDs match it returns ENXIO. Thus for devices from 1) and 2) ISA_PNP_PROBE() returns either 0 or ENXIO. For devices from 3), ISA_PNP_PROBE() will always return ENOENT. Your change would break 3) since those devices would then never probe.

ppc_probe() is called to verify that the hardware truly exists at the resources that are claimed. In practice the loop you refer to never runs now as the default hints for ppc always specify a port and ppc adapters from 1) always include the port resource. That loop should probably belong in an identify routine instead of in the probe routine anyway. It probably predates new-bus.

The waters are slightly muddied further by the fact that if the resources specified in a hint match the resources from one of the devices found via 1) or 2), the device from 1) or 2) will actually subsume the hinted device so you will not get a separate type 3) device. For example, in the default hints uart0 specifies an I/O port of 0x3f8. If ACPI tells the OS about a COM1 serial port with the default I/O port (0x3f8), then the hints cause that device to be named uart0 and to use the flags from uart0 to enable the serial console, etc.

--
John Baldwin
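The three outcomes described above can be modelled in a few lines of userland C. This is a simplified model for illustration, not the kernel's ISA_PNP_PROBE(): struct model_dev, model_pnp_probe() and model_should_probe_hw() are hypothetical names, and the model compares a single ID per device where the real routine also checks compatibility IDs.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* A device either carries a PnP ID (sources 1 and 2) or was hinted (3). */
struct model_dev {
	bool has_pnp_id;
	unsigned int pnp_id;
};

/*
 * Model of the ID comparison: 0 on a match, ENXIO when the device has an
 * ID that is not in the list, ENOENT when it has no ID at all.
 */
static int
model_pnp_probe(const struct model_dev *dev, const unsigned int *ids,
    size_t nids)
{
	if (!dev->has_pnp_id)
		return (ENOENT);
	for (size_t i = 0; i < nids; i++) {
		if (ids[i] == dev->pnp_id)
			return (0);
	}
	return (ENXIO);
}

/*
 * Only ENXIO rules the device out. Both a match (0) and a hinted device
 * (ENOENT) must go on to the hardware probe.
 */
static bool
model_should_probe_hw(int pnp_error)
{
	return (pnp_error != ENXIO);
}
```

Treating every non-zero return as a failure would reject the ENOENT case as well, which is how the proposed patch would have stopped hinted devices from ever probing.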
Re: real memory falsely reports 8G, BIOS avail memory reports 1G
On Tuesday, August 17, 2010 7:49:22 am Julian H. Stacey wrote:
John Baldwin wrote:
On Monday, August 09, 2010 8:13:03 am Julian H. Stacey wrote:

Hi hack...@freebsd.org

A laptop here emits a puzzling dmesg to both 8.1-RC2 and 8.1-RELEASE:

    real memory = 8572108800 (8175 MB)
    avail memory = 1018789888 (971 MB)

BIOS reckons it has 1G. No panel to unscrew to inspect memory. I don't believe 8G. Is this a bug in FreeBSD detect code? I am ready to run test kernel patches against 8.1-RELEASE and report back if anyone has code. (I have room to install a current too if necessary)

Full dmesg here: http://www.berklix.com/~jhs/hardware/laptops/novatech-8355/dmesg/

Cheers, Julian

Hmm, do you have dmidecode output?

Hi, Thanks for interest, Yes here

Yeah, I saw you post the details later in the thread and had forgotten to delete my reply. At one point the code to print out real memory was changed to use the DMI/SMBIOS information, because when it is correct it gives a more accurate looking number (an even 8GB for example vs some number that is slightly smaller than 8GB). It looks like your DMI info is just very wrong, resulting in a bogus number in the printf. However, it is only a cosmetic failure; it doesn't affect how the kernel runs or uses memory.

--
John Baldwin
Re: Why doesn't ppc(4) check non-ENXIO failures during probe?
On Tuesday, August 17, 2010 3:56:20 pm Garrett Cooper wrote:
On Tue, Aug 17, 2010 at 6:07 AM, John Baldwin j...@freebsd.org wrote:

[...]

So more or less it's for BIOSes with ISA that doesn't feature plug and play (286s, 386s, some 486s?)? Just trying to fill in the gap :).

Yes, it may perhaps still be useful for some x86 embedded systems, though it is doubtful that those would use a ppc(4) device perhaps.

--
John Baldwin
Re: Modules and Buses
On Thursday, August 19, 2010 8:38:05 am Alexandr Rybalko wrote:
> Hi all,
>
> Can someone say how `make` in the sys/modules dir can obtain the available buses? I am trying to make a clean version of bfe that can be for the PCI bus or can be part of a SoC (like BCM5354) on the SSB bus. So for proper module building I need to know which bus interface I must build: if_bfe_pci.c, or if_bfe_siba.c, or both?

You can always include both buses. If a bus driver isn't present in the kernel the attachment will just never be invoked.

-- John Baldwin
Re: Converting from jiffies to ticks
On Friday, August 20, 2010 9:14:23 am Jesse Smith wrote:
> I am currently trying to port a program from Linux to FreeBSD which detects how much processor time a process is using. The native Linux code does this (in part) by reading the number of jiffies a given process uses. This info is pulled from the /proc/PID/stat file.
>
> One function is failing on FreeBSD and it's obviously because FreeBSD does not have all the same files/data in the /proc directory. I've looked around and, as I understand it, FreeBSD uses ticks instead of jiffies to measure process usage. However, how to gather that data is a bit lost on me.
>
> This raises a question for me: Where can I find the equivalent information on FreeBSD? I assume there's a function call. Maybe in the kvm_* family? I need to be able to get the number of ticks a given PID is using, both in the kernel and userspace.
>
> The rest of the program measures everything in jiffies, so it would be ideal for me to get the ticks used on FreeBSD (based on PID), convert it to jiffies and pass it back to the main program.

FreeBSD saves the total runtime in an architecture-dependent ticker count that is separate from ticks. (ticks tends to run at hz, so by default 1000 times per second, whereas the 'ticker' on x86 is the TSC, which runs at the clock speed of the CPU (throttling and turbo boost aside)). You can look at the calcru() function to see how the kernel converts the runtime ticker count (saved in rux_runtime) into microseconds.

-- John Baldwin
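Since the kernel ends up reporting runtime in microseconds, the conversion back to jiffy-style units is plain arithmetic. The following is a minimal sketch, assuming the caller already has user and system times as `struct timeval` values; it uses POSIX `getrusage()` on the calling process only because that is easy to run. Fetching the times of another PID (for instance through the kvm_* family the poster mentions) is not shown, and the helper names are made up for this example.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Convert a struct timeval into whole microseconds. */
static uint64_t
timeval_to_usec(struct timeval tv)
{
	return ((uint64_t)tv.tv_sec * 1000000ULL + (uint64_t)tv.tv_usec);
}

/* Convert microseconds into clock ticks at the given rate (ticks per second). */
static uint64_t
usec_to_clock_ticks(uint64_t usec, uint64_t ticks_per_sec)
{
	return (usec * ticks_per_sec / 1000000ULL);
}

/*
 * Report the user and system CPU time of the calling process in
 * clock ticks, using the rate from sysconf(_SC_CLK_TCK).
 * Returns 0 on success and -1 on failure.
 */
static int
self_cpu_ticks(uint64_t *user_ticks, uint64_t *sys_ticks)
{
	struct rusage ru;
	long hz;

	hz = sysconf(_SC_CLK_TCK);
	if (hz <= 0 || getrusage(RUSAGE_SELF, &ru) != 0)
		return (-1);
	*user_ticks = usec_to_clock_ticks(timeval_to_usec(ru.ru_utime),
	    (uint64_t)hz);
	*sys_ticks = usec_to_clock_ticks(timeval_to_usec(ru.ru_stime),
	    (uint64_t)hz);
	return (0);
}
```

Keeping the conversion separate from the data source means the same two helpers can be reused once the per-PID times come from elsewhere.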
Re: Question about printcpuinfo in sys/amd64/amd64/identcpu.c
On Friday, August 20, 2010 10:14:46 am Garrett Cooper wrote:
> Hi,
>
> Currently the code in identcpu.c does a check for a specific cpu value extension. This is set to 0x80000004 (even though the corresponding code below iterates through 0x80000002:0x80000005):

It does not invoke 0x80000005 (<, not <=, is used as the loop terminator).

> 	/* Check for extended CPUID information and a processor name. */
> 	if (cpu_exthigh >= 0x80000004) {
> 		brand = cpu_brand;
> 		for (i = 0x80000002; i < 0x80000005; i++) {
> 			do_cpuid(i, regs);
> 			memcpy(brand, regs, sizeof(regs));
> 			brand += sizeof(regs);
> 		}
> 	}

-- John Baldwin
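The loop bound is easy to check outside the kernel. The sketch below replaces `do_cpuid()` with a stub (`fake_do_cpuid()`, a hypothetical name) that counts how many leaves are requested and fills the registers with a marker letter per leaf, so no real CPUID instruction is executed.

```c
#include <stdint.h>
#include <string.h>

#define BRAND_FIRST	0x80000002u
#define BRAND_END	0x80000005u	/* exclusive upper bound */

static unsigned int leaves_queried;

/* Stub standing in for the kernel's do_cpuid(); it only records the call. */
static void
fake_do_cpuid(uint32_t leaf, uint32_t regs[4])
{
	leaves_queried++;
	/* Fill all 16 register bytes with 'A', 'B' or 'C' depending on the leaf. */
	memset(regs, 'A' + (int)(leaf - BRAND_FIRST), 4 * sizeof(regs[0]));
}

/* Same loop shape as the quoted code: three leaves, 16 bytes each. */
static void
read_brand(char brand[49])
{
	uint32_t regs[4];
	uint32_t i;
	char *p = brand;

	for (i = BRAND_FIRST; i < BRAND_END; i++) {
		fake_do_cpuid(i, regs);
		memcpy(p, regs, sizeof(regs));
		p += sizeof(regs);
	}
	*p = '\0';
}
```

With a strict `<` comparison the stub is called for 0x80000002, 0x80000003 and 0x80000004 only, which yields 48 bytes of brand string.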
Re: kld modules remain loaded if MOD_LOAD handler returns an error
On Friday, August 20, 2010 1:13:53 pm Ryan Stone wrote:
> Consider the following modules:
>
> /* first.c */
> static int *test;
>
> int
> test_function(void)
> {
> 	return *test;
> }
>
> static int
> first_modevent(struct module *m, int what, void *arg)
> {
> 	int err = 0;
>
> 	switch (what) {
> 	case MOD_LOAD:		/* kldload */
> 		test = malloc(sizeof(int), M_TEMP, M_NOWAIT | M_ZERO);
> 		if (!test)
> 			err = ENOMEM;
> 		break;
> 	case MOD_UNLOAD:	/* kldunload */
> 		break;
> 	default:
> 		err = EINVAL;
> 		break;
> 	}
> 	return(err);
> }
>
> static moduledata_t first_mod = { "first", first_modevent, NULL };
> DECLARE_MODULE(first, first_mod, SI_SUB_KLD, SI_ORDER_ANY);
> MODULE_VERSION(first, 1);
>
> /* second.c */
> static int
> second_modevent(struct module *m, int what, void *arg)
> {
> 	int err = 0;
>
> 	switch (what) {
> 	case MOD_LOAD:		/* kldload */
> 		test_function();
> 		break;
> 	case MOD_UNLOAD:	/* kldunload */
> 		break;
> 	default:
> 		err = EINVAL;
> 		break;
> 	}
> 	return(err);
> }
>
> static moduledata_t second_mod = { "second", second_modevent, NULL };
> DECLARE_MODULE(second, second_mod, SI_SUB_KLD, SI_ORDER_ANY);
> MODULE_DEPEND(second, first, 1, 1, 1);
>
> Consider the case where malloc fails in first_modevent. first_modevent will return ENOMEM, but the module will remain loaded. Now when the second module goes and loads, it calls into the first module, which is not initialized properly, and promptly crashes when test_function() dereferences a null pointer.
>
> It seems to me that a module should be unloaded if it returns an error from its MOD_LOAD handler. However, that's easier said than done. The MOD_LOAD handler is called from a SYSINIT, and there's no immediately obvious way to pass information about the failure from the SYSINIT to the kernel linker. Anybody have any thoughts on this?

Yeah, it's not easy to fix. Probably we could patch the kernel linker to notice if any of the modules for a given linker file had errors during initialization and trigger an unload if that occurs. I don't think this would be too hard to do.

-- John Baldwin
Re: kld modules remain loaded if MOD_LOAD handler returns an error
On Monday, August 23, 2010 11:04:20 am Andriy Gapon wrote:
> on 23/08/2010 15:10 John Baldwin said the following:
> > On Friday, August 20, 2010 1:13:53 pm Ryan Stone wrote:
> > > Consider the following modules:
> > >
> > > /* first.c */
> > > static int *test;
> > >
> > > int
> > > test_function(void)
> > > {
> > > 	return *test;
> > > }
> > >
> > > static int
> > > first_modevent(struct module *m, int what, void *arg)
> > > {
> > > 	int err = 0;
> > >
> > > 	switch (what) {
> > > 	case MOD_LOAD:		/* kldload */
> > > 		test = malloc(sizeof(int), M_TEMP, M_NOWAIT | M_ZERO);
> > > 		if (!test)
> > > 			err = ENOMEM;
> > > 		break;
> > > 	case MOD_UNLOAD:	/* kldunload */
> > > 		break;
> > > 	default:
> > > 		err = EINVAL;
> > > 		break;
> > > 	}
> > > 	return(err);
> > > }
> > >
> > > static moduledata_t first_mod = { "first", first_modevent, NULL };
> > > DECLARE_MODULE(first, first_mod, SI_SUB_KLD, SI_ORDER_ANY);
> > > MODULE_VERSION(first, 1);
> > >
> > > /* second.c */
> > > static int
> > > second_modevent(struct module *m, int what, void *arg)
> > > {
> > > 	int err = 0;
> > >
> > > 	switch (what) {
> > > 	case MOD_LOAD:		/* kldload */
> > > 		test_function();
> > > 		break;
> > > 	case MOD_UNLOAD:	/* kldunload */
> > > 		break;
> > > 	default:
> > > 		err = EINVAL;
> > > 		break;
> > > 	}
> > > 	return(err);
> > > }
> > >
> > > static moduledata_t second_mod = { "second", second_modevent, NULL };
> > > DECLARE_MODULE(second, second_mod, SI_SUB_KLD, SI_ORDER_ANY);
> > > MODULE_DEPEND(second, first, 1, 1, 1);
> > >
> > > Consider the case where malloc fails in first_modevent. first_modevent will return ENOMEM, but the module will remain loaded. Now when the second module goes and loads, it calls into the first module, which is not initialized properly, and promptly crashes when test_function() dereferences a null pointer.
> > >
> > > It seems to me that a module should be unloaded if it returns an error from its MOD_LOAD handler. However, that's easier said than done. The MOD_LOAD handler is called from a SYSINIT, and there's no immediately obvious way to pass information about the failure from the SYSINIT to the kernel linker. Anybody have any thoughts on this?
> >
> > Yeah, it's not easy to fix. Probably we could patch the kernel linker to notice if any of the modules for a given linker file had errors during initialization and trigger an unload if that occurs. I don't think this would be too hard to do.
>
> John,
>
> please note that for this testcase we would need to prevent second module's modevent from being executed at all. Perhaps a module shouldn't be considered as loaded until modevent caller marks it as successfully initialized, but I haven't looked at the actual code.

Well, if these two event handlers are in the same module, I think that is a bug in the module really. I tend to collapse such things down to a single event handler per kld just so I can really get the ordering correct anyway. :) If they are in two separate .ko files then the other solution would work. We could also hack the module code to mark a linker_file as 'broken' and have the module_helper sysinit not call mod_load if the containing file is 'broken', etc.

-- John Baldwin
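The 'broken' flag idea can be modelled in a few lines of ordinary C. Everything here is a simplified stand-in written for illustration: `fake_linker_file`, `fake_module` and `fake_module_helper()` are hypothetical and do not mirror the real kernel structures or the real sysinit.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for a linker file (one .ko). */
struct fake_linker_file {
	const char *name;
	bool broken;	/* set once any module in this file fails to load */
};

typedef int (*load_handler_t)(void);

/* Simplified stand-in for one module inside a linker file. */
struct fake_module {
	struct fake_linker_file *file;
	load_handler_t on_load;
	bool loaded;
};

/*
 * Model of the proposed behaviour: refuse to run a load handler if an
 * earlier module from the same file already failed, and mark the file
 * broken when a handler returns an error.
 */
static int
fake_module_helper(struct fake_module *mod)
{
	int error;

	if (mod->file->broken)
		return (ENXIO);		/* handler is never invoked */
	error = mod->on_load();
	if (error != 0) {
		mod->file->broken = true;
		return (error);
	}
	mod->loaded = true;
	return (0);
}

/* Two sample handlers: one that fails like first.c, one that counts calls. */
static int second_calls;

static int
failing_load(void)
{
	return (ENOMEM);
}

static int
counting_load(void)
{
	second_calls++;
	return (0);
}
```

In this model the second handler is skipped after the first one fails, which is the property Andriy asks for; a real implementation would additionally have to trigger the unload of the file.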
Re: Why doesn't ppc(4) check non-ENXIO failures during probe?
On Tuesday, August 24, 2010 12:09:45 am M. Warner Losh wrote:
> In message: 201008171615.21103@freebsd.org John Baldwin j...@freebsd.org writes:
> : > So more or less it's for BIOSes with ISA that doesn't feature plug
> : > and play (286s, 386s, some 486s?)? Just trying to fill in the gap :).
> :
> : Yes, it may still be useful for some x86 embedded systems, though
> : it is doubtful that those would use a ppc(4) device.
>
> Many embedded x86 systems use ppc(4) as a DIO port. ppi attaches to it and can be used to frob bits.
>
> These days, of course, almost all boards have ACPI, so that means they get enumerated that way. Only boards that don't run Windows might not have ACPI, in which case the devices are usually enumerated via PNPBIOS. But not always, since those boards tend to have the buggiest BIOSes on the planet in this area. Hints are needed on a few of these boards since nothing else will work. And they have Atom processors on them...

The specific code I am referring to is the code in ppc_isa_probe() that tries to auto-identify a ppc port by poking at various I/O ports directly. It is not enabled by default. I think you'd have to have a ppc hint that did not include an I/O port for this code to be triggered, as it only gets executed if a ppc(4) device does not have an I/O port resource from ACPI/PnPBIOS/hints. I was mostly thinking of this in terms of ISA cards, and I doubt that even modern embedded systems have ISA slots. :)

-- John Baldwin
Re: disassembler
On Thursday, August 26, 2010 11:42:25 pm Aryeh Friedman wrote:
> On Thu, Aug 26, 2010 at 11:36 PM, Aryeh Friedman aryeh.fried...@gmail.com wrote:
> > On Thu, Aug 26, 2010 at 10:46 PM, Dirk Engling erdge...@erdgeist.org wrote:
> > > On 27.08.10 04:17, Aryeh Friedman wrote:
> > > > Is there a disassembler in the base system if not what is a good option from ports?
> > >
> > > Try objdump -d,
> > >
> > > erdgeist
>
> flosoft# objdump -d /dev/da0
> objdump: Warning: '/dev/da0' is not an ordinary file

For a raw file of x86 instructions use ndisasm from the 'nasm' port. Note that it assumes 16-bit code by default, but you can use ndisasm -U to parse 32-bit instructions instead. For a typical MBR boot loader, plain ndisasm should work fine:

# ndisasm /dev/twed0
00000000  FC                cld
00000001  31C0              xor ax,ax
00000003  8EC0              mov es,ax
00000005  8ED8              mov ds,ax
00000007  8ED0              mov ss,ax
00000009  BC007C            mov sp,0x7c00
0000000C  BE1A7C            mov si,0x7c1a
0000000F  BF1A06            mov di,0x61a
00000012  B9E601            mov cx,0x1e6
00000015  F3A4              rep movsb
00000017  E9008A            jmp word 0x8a1a
0000001A  31F6              xor si,si
...

etc. I would dd the first sector of your disk off to a file and run ndisasm on that though rather than on the live disk.

-- John Baldwin
Re: Debugging Loadable Modules Using GDB
On Friday, August 27, 2010 4:11:41 pm Alexander Fiveg wrote:
> Hi,
>
> from FreeBSD Developers' Handbook, 10.7 Debugging Loadable Modules Using GDB:
> ...
> (kgdb) add-symbol-file /sys/modules/linux/linux.ko 0xc0ae22d0
> ...
>
> Actually I couldn't debug my modules using the .ko file. Moreover, I've found out that .ko files do not contain sections with debugging info. With the .kld file debugging works. Am I doing something incorrectly or is the info in the Developers' Handbook outdated?

With newer versions of kgdb you shouldn't need to manually invoke 'add-symbol-file'. Kernel modules are treated as shared libraries and should automatically be loaded. Try using 'info sharedlibrary' to see the list of kernel modules and if symbols for them are loaded already.

-- John Baldwin
Re: System freezes unexpectly
On Sunday, August 29, 2010 10:18:48 am Davide Italiano wrote:
> Hi.
>
> I'm running 8.1 on my Sony Vaio laptop, with dwm as window manager on the latest Xorg from ports. When I try to run firefox3, the system freezes unexpectedly. I know that "freezes" is a bit generic but I can't find a more specific term to describe the situation. Dmesg doesn't give useful info. I installed firefox using pkg_add -r; the only add-on/plugin installed is Xmarks.
>
> I'm ready to debug if needed, any suggestion is appreciated.
>
> Thanks

Can you ssh into the machine or ping it when it is frozen?

-- John Baldwin
Re: Debugging Loadable Modules Using GDB
On Monday, August 30, 2010 12:12:50 pm Alexander Fiveg wrote:
> On Mon, Aug 30, 2010 at 08:16:11AM -0400, John Baldwin wrote:
> > On Friday, August 27, 2010 4:11:41 pm Alexander Fiveg wrote:
> > > Hi,
> > >
> > > from FreeBSD Developers' Handbook, 10.7 Debugging Loadable Modules Using GDB:
> > > ...
> > > (kgdb) add-symbol-file /sys/modules/linux/linux.ko 0xc0ae22d0
> > > ...
> > >
> > > Actually I couldn't debug my modules using the .ko file. Moreover, I've found out that .ko files do not contain sections with debugging info. With the .kld file debugging works. Am I doing something incorrectly or is the info in the Developers' Handbook outdated?
> >
> > With newer versions of kgdb you shouldn't need to manually invoke 'add-symbol-file'. Kernel modules are treated as shared libraries and should automatically be loaded. Try using 'info sharedlibrary' to see the list of kernel modules and if symbols for them are loaded already.
>
> Yes, the .ko files are loaded automatically. The problem is that they do not contain debugging info. I always have to load the .kld file in order to debug a module:
>
> (kgdb) f 9
> #9 0xc4dc558b in rm_8254_delayed_interrupt_per_packet () from /boot/kernel/if_ringmap.ko
> (kgdb) info locals
> No symbol table info available.
> (kgdb) add-symbol-file /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld 0xc4dafc70
> add symbol table from file /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld at
> 	.text_addr = 0xc4dafc70
> (y or n) y
> Reading symbols from /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld...done.
> (kgdb) f 9
> #9 0xc4dc558b in rm_8254_delayed_interrupt_per_packet () at /home/alexandre/p4/ringmap/current/sys/modules/ringmap/../../dev/e1000/ringmap_8254.c:142
> 142 co->ring->slot[slot_num].ts = co->ring->last_ts;
> (kgdb) info locals
> co = (struct capt_object *) 0xc4d68380
> adapter = (struct adapter *) 0xc4e77000
> __func__ = [non-printable bytes]
>
> Is there any way to get all the symbols and needed debug info without loading the .kld file?

How are you compiling the kld? If you are building it by hand, use 'make DEBUG_FLAGS=-g' when you build and install the kld. That should build with debug symbols enabled and install the ko.symbols file which kgdb will find and use.

-- John Baldwin
Re: Debugging Loadable Modules Using GDB
On Monday, August 30, 2010 4:34:04 pm Alexander Fiveg wrote:
> On Mon, Aug 30, 2010 at 01:10:37PM -0400, John Baldwin wrote:
> > On Monday, August 30, 2010 12:12:50 pm Alexander Fiveg wrote:
> > > On Mon, Aug 30, 2010 at 08:16:11AM -0400, John Baldwin wrote:
> > > > On Friday, August 27, 2010 4:11:41 pm Alexander Fiveg wrote:
> > > > > Hi,
> > > > >
> > > > > from FreeBSD Developers' Handbook, 10.7 Debugging Loadable Modules Using GDB:
> > > > > ...
> > > > > (kgdb) add-symbol-file /sys/modules/linux/linux.ko 0xc0ae22d0
> > > > > ...
> > > > >
> > > > > Actually I couldn't debug my modules using the .ko file. Moreover, I've found out that .ko files do not contain sections with debugging info. With the .kld file debugging works. Am I doing something incorrectly or is the info in the Developers' Handbook outdated?
> > > >
> > > > With newer versions of kgdb you shouldn't need to manually invoke 'add-symbol-file'. Kernel modules are treated as shared libraries and should automatically be loaded. Try using 'info sharedlibrary' to see the list of kernel modules and if symbols for them are loaded already.
> > >
> > > Yes, the .ko files are loaded automatically. The problem is that they do not contain debugging info. I always have to load the .kld file in order to debug a module:
> > >
> > > (kgdb) f 9
> > > #9 0xc4dc558b in rm_8254_delayed_interrupt_per_packet () from /boot/kernel/if_ringmap.ko
> > > (kgdb) info locals
> > > No symbol table info available.
> > > (kgdb) add-symbol-file /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld 0xc4dafc70
> > > add symbol table from file /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld at
> > > 	.text_addr = 0xc4dafc70
> > > (y or n) y
> > > Reading symbols from /home/alexandre/p4/ringmap/current/sys/modules/ringmap/if_ringmap.kld...done.
> > > (kgdb) f 9
> > > #9 0xc4dc558b in rm_8254_delayed_interrupt_per_packet () at /home/alexandre/p4/ringmap/current/sys/modules/ringmap/../../dev/e1000/ringmap_8254.c:142
> > > 142 co->ring->slot[slot_num].ts = co->ring->last_ts;
> > > (kgdb) info locals
> > > co = (struct capt_object *) 0xc4d68380
> > > adapter = (struct adapter *) 0xc4e77000
> > > __func__ = [non-printable bytes]
> > >
> > > Is there any way to get all the symbols and needed debug info without loading the .kld file?
> >
> > How are you compiling the kld? If you are building it by hand, use 'make DEBUG_FLAGS=-g' when you build and install the kld. That should build with debug symbols enabled and install the ko.symbols file which kgdb will find and use.
>
> Thanks a lot! That is what I wanted to know. But I think this option is not mentioned anywhere. I could not find it in the make or make.conf man pages, and there is also no mention of it in the FreeBSD Developers' Handbook.

It's a bit of a feature of the bsd.*.mk files that if you define 'DEBUG_FLAGS' it is added to CFLAGS (and CXXFLAGS) and that any resulting binaries are not stripped, etc. The same trick can be used to build debug versions of binaries and libraries. It probably is underdocumented. Not sure make.conf(5) is the right place as the typical usage is on the command line, not in /etc/make.conf or /etc/src.conf. However, I can't think of a better place. Maybe src.conf(5)?

-- John Baldwin
Re: System freezes unexpectly
On Monday, August 30, 2010 12:45:40 pm Garrett Cooper wrote:
> On Mon, Aug 30, 2010 at 9:24 AM, Davide Italiano davide.itali...@gmail.com wrote:
> > removing ~/.mozilla works fine. I think the problem is related to the Xmarks add-on I installed or to the Restore session functionality
>
> It would have been interesting to capture what `froze' the machine, in particular because it could have been a valuable bug for either Mozilla to capture and fix, or for us to capture and fix. Unless your machine doesn't meet the hardware requirements, I don't see a reason why a userland application should lock up a system.
>
> There are other ways you can debug this further, using -safe-mode as a next step, then choose to not restore the last session (which is available from within the javascript settings file -- nsPrefs.js?).

If only firefox is frozen, then you can always ssh in from another machine and use top/ps, etc., or even gdb on the firefox process itself.

-- John Baldwin
Re: MCE Decoding - MCA: Bank 8, Status 0xcc0031800001009f/0xc8000980000200cf
On Saturday, September 11, 2010 1:40:28 am Simon wrote:
> Hello,
>
> Can someone please help me decode these two errors on FreeBSD 8.1-R:
>
> MCA: Bank 8, Status 0xcc0031800001009f
> MCA: Global Cap 0x1c09, Status 0x0000000000000000
> MCA: Vendor GenuineIntel, ID 0x106a5, APIC ID 16
> MCA: CPU 0 COR (198) OVER RD channel ?? memory error
> MCA: Address 0x1b6188d80
> MCA: Misc 0x72ae24200084
>
> MCA: Bank 8, Status 0xc8000980000200cf
> MCA: Global Cap 0x1c09, Status 0x0000000000000000
> MCA: Vendor GenuineIntel, ID 0x106a5, APIC ID 16
> MCA: CPU 0 COR (38) OVER MS channel ?? memory error
> MCA: Misc 0x72ae24200140

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 8 MISC 72ae24200084 ADDR 1b6188d80
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
Memory read ECC error
Memory corrected error count (CORE_ERR_CNT): 198
Memory transaction Tracker ID (RTId): 84
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 72ae2420
STATUS cc0031800001009f MCGSTATUS 0
MCGCAP 1c09 APICID 10 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 8 MISC 72ae24200140
MCG status:
MCi status:
Error overflow
MCi_MISC register valid
MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR
Transaction: Memory scrubbing error
Memory ECC error occurred during scrub
Memory corrected error count (CORE_ERR_CNT): 38
Memory transaction Tracker ID (RTId): 40
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 72ae2420
STATUS c8000980000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 10 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26

You have some corrected memory errors (198+38 = 236) in the first DIMM (on the SuperMicro boards we have at work, it would correspond to the DIMM slot labeled P1_DIMM1A). In my experience I would just ignore them unless the count gets much higher (say 1+ / per hour).

-- John Baldwin
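The numbers in the decoded output can be reproduced from the raw status words with a few shifts and masks. The sketch below assumes the usual Intel layout of the per-bank status register (valid bit 63, overflow bit 62, uncorrected bit 61, MISC valid bit 59, ADDR valid bit 58, corrected error count in bits 52:38, MCA error code in bits 15:0); the macro and function names are invented for this example, and the layout should be checked against the processor documentation before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of a machine check bank status word. */
#define MCI_STATUS_VAL		(1ULL << 63)	/* register contents valid */
#define MCI_STATUS_OVER		(1ULL << 62)	/* an earlier error was overwritten */
#define MCI_STATUS_UC		(1ULL << 61)	/* error was not corrected */
#define MCI_STATUS_MISCV	(1ULL << 59)	/* MISC register valid */
#define MCI_STATUS_ADDRV	(1ULL << 58)	/* ADDR register valid */

/* Corrected error count, bits 52:38. */
static unsigned int
mci_corrected_count(uint64_t status)
{
	return ((unsigned int)((status >> 38) & 0x7fff));
}

/* MCA error code, bits 15:0. */
static unsigned int
mci_error_code(uint64_t status)
{
	return ((unsigned int)(status & 0xffff));
}

/* True for a valid record that describes a corrected error. */
static bool
mci_is_corrected(uint64_t status)
{
	return ((status & MCI_STATUS_VAL) != 0 &&
	    (status & MCI_STATUS_UC) == 0);
}
```

Applied to the two status words from the subject line, this yields counts of 198 and 38, matching the "COR (198)" and "COR (38)" lines, and shows the address-valid bit set only in the first record, which is why only that record prints an address.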
Re: is vfs.lookup_shared unsafe in 7.3?
On Monday, September 13, 2010 4:57:15 pm cronfy wrote:
> Hello,
>
> Trying to overcome high server load (sudden peaks of 15%us/85%sy, LA 40, very slow lstat() at these moments, looks like some kind of lock contention) I enabled vfs.lookup_shared=1 on two servers today. One is a FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and the other is FreeBSD-7.3 csup'ed and built Jul 16 2010.
>
> The server with the fresher kernel is running nicely and does not show high load anymore. But on the second server it did not help. Moreover, after a few hours of work with vfs.lookup_shared=1 I noticed processes stuck in ufs state. I tried to kill them with no luck. Disabling vfs.lookup_shared froze the whole system.
>
> So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable between 16 Jul and 9 Sep (is it the reason why the first system is still running?), or should I expect that it will freeze in the near future too?
>
> Thanks in advance!

No, 7.3 has a bug that can cause these hangs that is probably made worse by vfs.lookup_shared=1, but can occur even if it is disabled. You want these fixes applied (in order, one of them reverts part of another):

Author: jhb
Date: Fri Jul 16 20:23:24 2010
New Revision: 210173
URL: http://svn.freebsd.org/changeset/base/210173

Log:
  When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case.

  The softupdates code also toggles LK_NOSHARE in one function to close a race with snapshots. Fix this code to grab the interlock while fiddling with lk_flags.

Modified:
  stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
  stable/7/sys/fs/cd9660/cd9660_vfsops.c
  stable/7/sys/fs/udf/udf_vfsops.c
  stable/7/sys/ufs/ffs/ffs_softdep.c
  stable/7/sys/ufs/ffs/ffs_vfsops.c

Author: jhb
Date: Fri Aug 20 20:33:13 2010
New Revision: 211532
URL: http://svn.freebsd.org/changeset/base/211532

Log:
  MFC: Use VN_LOCK_AREC() and VN_LOCK_ASHARE() rather than manipulating lockmgr lock flags directly.

Modified:
  stable/7/sys/fs/nwfs/nwfs_node.c
  stable/7/sys/fs/pseudofs/pseudofs_vncache.c
  stable/7/sys/fs/smbfs/smbfs_node.c
  stable/7/sys/gnu/fs/xfs/FreeBSD/xfs_freebsd_iget.c
  stable/7/sys/kern/vfs_lookup.c

Author: jhb
Date: Fri Aug 20 20:58:57 2010
New Revision: 211533
URL: http://svn.freebsd.org/changeset/base/211533

Log:
  Revert 210173 as it did not properly fix the bug. It assumed that the VI_LOCK() for a given vnode was used as the internal interlock for that vnode's v_lock lockmgr lock. This is not the case. Instead, add dedicated routines to toggle the LK_NOSHARE and LK_CANRECURSE flags. These routines lock the lockmgr lock's internal interlock to synchronize the updates to the flags member with other threads attempting to acquire the lock. The VN_LOCK_A*() macros now invoke these routines, and the softupdates code uses these routines to temporarily enable recursion on buffer locks.

  Reviewed by: kib

Modified:
  stable/7/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
  stable/7/sys/fs/cd9660/cd9660_vfsops.c
  stable/7/sys/fs/udf/udf_vfsops.c
  stable/7/sys/kern/kern_lock.c
  stable/7/sys/sys/lockmgr.h
  stable/7/sys/sys/vnode.h
  stable/7/sys/ufs/ffs/ffs_softdep.c
  stable/7/sys/ufs/ffs/ffs_vfsops.c

-- John Baldwin
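The last commit log describes routines that take the lock's own interlock before touching its flags word. The idea can be sketched in portable C11. This is a user-space model only: `fake_lock`, the `FAKE_LK_*` flag values and the helper names are hypothetical, and a C11 `atomic_flag` spin lock stands in for the lockmgr interlock.

```c
#include <stdatomic.h>

/* Hypothetical flag bits; the real values live in the kernel headers. */
#define FAKE_LK_NOSHARE		0x01u
#define FAKE_LK_CANRECURSE	0x02u
#define FAKE_LK_WAITERS		0x04u	/* stands for bits other threads update */

struct fake_lock {
	atomic_flag interlock;	/* stand-in for the lock's internal interlock */
	unsigned int flags;	/* only modified with the interlock held */
};

static void
fake_interlock_acquire(struct fake_lock *lk)
{
	while (atomic_flag_test_and_set_explicit(&lk->interlock,
	    memory_order_acquire))
		;	/* spin until the interlock is free */
}

static void
fake_interlock_release(struct fake_lock *lk)
{
	atomic_flag_clear_explicit(&lk->interlock, memory_order_release);
}

/*
 * Clear the "no shared locking" flag. The read-modify-write happens
 * under the interlock, so a concurrent update of another bit in the
 * same word cannot be lost.
 */
static void
fake_lock_allow_share(struct fake_lock *lk)
{
	fake_interlock_acquire(lk);
	lk->flags &= ~FAKE_LK_NOSHARE;
	fake_interlock_release(lk);
}

/* Set the "may recurse" flag under the same interlock. */
static void
fake_lock_allow_recurse(struct fake_lock *lk)
{
	fake_interlock_acquire(lk);
	lk->flags |= FAKE_LK_CANRECURSE;
	fake_interlock_release(lk);
}
```

The point of routing both updates through one pair of helpers is that every writer of the flags word agrees on which lock protects it, which is the assumption the first fix got wrong.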
Re: Questions about mutex implementation in kern/kern_mutex.c
> > there is no any memory barrier in mtx_init()? If another thread (on another CPU) finds that mutex is initialized using mtx_initialized() then it can mtx_lock() it and mtx_lock() it second time, as a result mtx_recurse field will be increased, but its value still can be uninitialized on architecture with relaxed memory ordering model.
>
> It seems to me that it's generally a programming error to rely on the return of mtx_initialized(), as there is no serialization with e.g. a thread calling mtx_destroy(). A fully correct serialization model would require that a single thread initialize the mtx and then create any worker threads that will use the mtx.

Yes, it is the caller's job to not expose a mtx until after it has been initialized. A memory barrier in mtx_init() can't solve all those races. If you put an object containing a mutex on a global queue and only invoke mtx_init() after dropping the global lock protecting the global queue, no amount of memory barriers in mtx_init() will save you.

-- John Baldwin
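The rule "initialize first, expose afterwards" can be illustrated with C11 atomics. This is a user-space sketch under stated assumptions: `fake_mtx`, `shared_obj` and the helper names are invented here, and a release store paired with an acquire load stands in for whatever lock or barrier the real code uses when it publishes the object.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mutex with only the fields the discussion mentions. */
struct fake_mtx {
	bool initialized;
	unsigned int recurse;
};

struct shared_obj {
	struct fake_mtx mtx;
	int payload;
};

/* The single place where other threads can find the object. */
static _Atomic(struct shared_obj *) published = NULL;

static void
fake_mtx_init(struct fake_mtx *m)
{
	m->recurse = 0;
	m->initialized = true;
}

/*
 * Correct order: finish all initialization, then publish. The release
 * store orders the initializing writes before the pointer becomes
 * visible to other threads.
 */
static void
publish_object(struct shared_obj *obj, int payload)
{
	fake_mtx_init(&obj->mtx);
	obj->payload = payload;
	atomic_store_explicit(&published, obj, memory_order_release);
}

/* A consumer pairs the release store with an acquire load. */
static struct shared_obj *
lookup_object(void)
{
	return (atomic_load_explicit(&published, memory_order_acquire));
}
```

Swapping the two steps in `publish_object()` reproduces the bug described above: the ordering guarantee belongs to the publication step, so nothing placed inside the init routine can repair it.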
Re: is vfs.lookup_shared unsafe in 7.3?
On Thursday, September 16, 2010 3:53:47 am cronfy wrote:
> Hello,
>
> > > Trying to overcome high server load (sudden peaks of 15%us/85%sy, LA 40, very slow lstat() at these moments, looks like some kind of lock contention) I enabled vfs.lookup_shared=1 on two servers today. One is a FreeBSD-7.3 kernel csup'ed and built Sep 9 2010 and the other is FreeBSD-7.3 csup'ed and built Jul 16 2010.
> > >
> > > The server with the fresher kernel is running nicely and does not show high load anymore. But on the second server it did not help. Moreover, after a few hours of work with vfs.lookup_shared=1 I noticed processes stuck in ufs state. I tried to kill them with no luck. Disabling vfs.lookup_shared froze the whole system.
> > >
> > > So, is vfs.lookup_shared=1 unsafe in 7.3? Did it become more stable between 16 Jul and 9 Sep (is it the reason why the first system is still running?), or should I expect that it will freeze in the near future too?
> > >
> > > Thanks in advance!
> >
> > No, 7.3 has a bug that can cause these hangs that is probably made worse by vfs.lookup_shared=1, but can occur even if it is disabled. You want these fixes applied (in order, one of them reverts part of another):
>
> Thank you for the fix and for the explanation, that's exactly what I wanted to know.
>
> Just to be sure: do these patches completely fix the bug with hangs (even without vfs.lookup_shared=1)?

Yes.

-- John Baldwin
Re: odd issues with DDB vs GDB
On Wednesday, September 15, 2010 8:01:19 pm Patrick Mahan wrote:
> All,
>
> I am trying to debug a system hang occurring on my HP Proliant G6 running some of our kernel software. I am seeing that under certain test loads, the system will hang completely: no keyboard, no console, etc. I suspect it is some of the kernel code that I have inherited that contains a lot of locking (lots of data structures, each having their own mutex lock (sleepable)).

You need to use 'kgdb' rather than 'gdb' on kernel.debug.

-- John Baldwin
Re: traling whitespace in CFLAGS if make.conf:CPUTYPE is not defined/empty
On Wednesday, September 15, 2010 9:01:20 pm Alexander Best wrote:
> hi there,
>
> after discovering PR #114082 i noticed that with CPUTYPE not being defined in make.conf, `make -VCFLAGS` reports a trailing whitespace for CFLAGS. the reason for this is that ${_CPUCFLAGS} gets added to CFLAGS even if it's empty. the following patch should take care of the problem. i also added the same logic to COPTFLAGS. although i wasn't able to trigger the trailing whitespace, it should still introduce a cleaner behaviour.

Does the trailing whitespace break anything? In the past we have had a non-empty default CPU CFLAGS (e.g. using '-mtune=pentiumpro' on i386 at one point IIRC) which this change would break. Unless the trailing whitespace is causing non-cosmetic problems I'd probably just leave it as it is.

Also, if we were to go with this approach, I would not have changed kern.pre.mk at all, but set both NO_CPU_CFLAGS and NO_CPU_COPTFLAGS in bsd.cpu.mk when CPUTYPE was empty.

-- John Baldwin
Re: Questions about mutex implementation in kern/kern_mutex.c
On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote: On Wed, Sep 15, 2010 at 08:46:00AM -0700, Matthew Fleming wrote: I'll take a stab at answering these... On Wed, Sep 15, 2010 at 6:44 AM, Andrey Simonenko si...@comsys.ntu-kpi.kiev.ua wrote: Hello, I have questions about mutex implementation in kern/kern_mutex.c and sys/mutex.h files (current versions of these files): 1. Is the following statement correct for a volatile pointer or integer variable: if a volatile variable is updated by the compare-and-set instruction (e.g. atomic_cmpset_ptr(val, ...)), then the current value of such variable can be read without any special instruction (e.g. v = val)? I checked Assembler code for a function with v = val and val = v like statements generated for volatile variable and simple variable and found differences: on ia64 v = val was implemented by ld.acq and val = v was implemented by st.rel; on mips and sparc64 Assembler code can have different order of lines for volatile and simple variable (depends on the code of a function). I think this depends somewhat on the hardware and what you mean by current value. Current value means that the value of a variable read by one thread is equal to the value of this variable successfully updated by another thread by the compare-and-set instruction. As I understand from the kernel source code, atomic_cmpset_ptr() allows to update a variable in a way that all other CPUs will invalidate corresponding cache lines that contain the value of this variable. That is not true. It is likely true on x86, but it is certainly not true on other architectures such as sparc64 where a write may be held in a store buffer for an indeterminate amount of time (and note that some lock releases are simple stores with a rel memory barrier). All that we require is that if the value is stale, the atomic_cmpset() that attempts to set MTX_CONTESTED will fail. 
> The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m->mtx_lock (volatile) with current thread pointer, all other functions that update m->mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used).

No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is older than a write it has performed to that word.

> > If you want a value that is not in-flux, then something like atomic_cmpset_ptr() setting to the current value is needed, so that you force any other atomic_cmpset to fail. However, since there is no explicit lock involved, there is no strong meaning for "current value" and a read that does not rely on a value cached in a register is likely sufficient. While the volatile keyword in C has no explicit hardware meaning, it often means that a load from memory (or, presumably, L1-L3 cache) is required.
>
> The volatile keyword here and all questions are related to the base C compiler, current version and currently supported architectures in FreeBSD. Yes, here under volatile I want to say that the value of a variable is not cached in a register and it is referenced by its address in all commands.
> There are some places in the kernel where a variable is updated in something like
>
>     do { v = value; } while (!atomic_cmpset_int(&value, ...));
>
> and that variable is not volatile, but the compiler generates correct Assembler code. So volatile is not a requirement for all cases.

Hmm, I suspect that many of those places actually do use volatile. The various lock cookies (mtx_lock, etc.) are declared volatile in the structure. Otherwise the compiler would be free to conclude that 'v = value;' is a loop invariant and move it out of the loop which would break. Given that, the construct you referred to does in fact require 'value' to be volatile.

-- 
John Baldwin
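As an illustration of the retry loop discussed above, here is a minimal userland sketch. It assumes C11 <stdatomic.h> instead of the kernel's atomic(9) API, and the names cmpset_int() and refcount_release() are hypothetical; cmpset_int() only imitates the "nonzero on success" convention of atomic_cmpset_int(). The atomic load stands in for the role volatile plays in the kernel structures: it has to be performed again on every pass, so it cannot be treated as a loop invariant.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for atomic_cmpset_int(): returns nonzero if *p
 * equalled 'expect' and was replaced by 'src', zero otherwise.
 */
static int
cmpset_int(atomic_int *p, int expect, int src)
{
	return (atomic_compare_exchange_strong(p, &expect, src));
}

/* Drop one reference and return the count seen just before the drop. */
static int
refcount_release(atomic_int *count)
{
	int old;

	do {
		/* Re-read the shared word on every iteration. */
		old = atomic_load(count);
	} while (!cmpset_int(count, old, old - 1));
	return (old);
}
```

If the load were hoisted out of the loop, a failed compare-and-set would retry forever with the same stale 'old', which is the breakage described in the reply.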
Re: Questions about mutex implementation in kern/kern_mutex.c
On Thursday, September 16, 2010 11:24:29 pm Benjamin Kaduk wrote:
> On Thu, 16 Sep 2010, John Baldwin wrote:
> > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote:
> > > The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m->mtx_lock (volatile) with current thread pointer, all other functions that update m->mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used).
> >
> > No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is older than a write it has performed to that word.
>
> Sorry for the naive question, but would you mind expounding a bit on what keeps the thread from migrating to a different CPU and getting a stale value there? (I can imagine a couple possible mechanisms, but don't know enough to know which one(s) are the real ones.)

The memory barriers in the thread_lock() / thread_unlock() pair of a context switch ensure that any writes posted by the thread before it performs a context switch will be visible on the new CPU before the thread resumes execution.

-- 
John Baldwin
Re: Questions about mutex implementation in kern/kern_mutex.c
On Friday, September 17, 2010 1:42:44 pm Andrey Simonenko wrote:
> On Thu, Sep 16, 2010 at 02:16:05PM -0400, John Baldwin wrote:
> > On Thursday, September 16, 2010 1:33:07 pm Andrey Simonenko wrote:
> > > The mtx_owned(9) macro uses this property, mtx_owned() does not use anything special to compare the value of m->mtx_lock (volatile) with current thread pointer, all other functions that update m->mtx_lock of unowned mutex use compare-and-set instruction. Also I cannot find anything special in generated Assembler code for volatile variables (except for ia64 where acquire loads and release stores are used).
> >
> > No, mtx_owned() is just not harmed by the races it loses. You can certainly read a stale value of mtx_lock in mtx_owned() if some other thread owns the lock or has just released the lock. However, we don't care, because in both of those cases, mtx_owned() returns false. What does matter is that mtx_owned() can only return true if we currently hold the mutex. This works because 1) the same thread cannot call mtx_unlock() and mtx_owned() at the same time, and 2) even CPUs that hold writes in store buffers will snoop their store buffer for local reads on that CPU. That is, a given CPU will never read a stale value of a memory word that is older than a write it has performed to that word.
>
> Looks like I understand the logic why mtx_owned() works correctly when mtx_lock is present in CPU cache or is absent in CPU cache. The mtx_lock value definitely can say whether lock is held by the current thread, but it cannot say whether it is unowned or is owned by another thread.
>
> Let me ask another one question about memory barriers and thread migration.
>
> Let a thread locked a mutex, modified shared data protected by this mutex and was migrated from CPU1 to CPU2 (mutex is still locked).
> In this scenario just migrated thread will not see stale data for a mutex itself (the m->mtx_lock value) and for shared data on CPU2 because when it was migrated from CPU1 there was at least one unlock call for some another mutex that had release semantics and appropriate memory barrier instruction was run implicitly or explicitly. As a result this rel memory barrier made all modifications from CPU1 visible on another CPUs. When CPU2 switched to just migrated thread there was at least one lock call for some another mutex with acquire semantics, so rel/acq memory barriers pair works here together.
>
> (Also I consider case when CPU2 did not work with that mutex, but worked with its memory before. Some thread on CPU2 could allocate some memory, worked with it and freed it. Later the same part of memory was allocated by a thread on CPU1 for mutex).
>
> Is the above written description correct?

Yes.

> > > There are some places in the kernel where a variable is updated in something like
> > >
> > >     do { v = value; } while (!atomic_cmpset_int(&value, ...));
> > >
> > > and that variable is not volatile, but the compiler generates correct Assembler code. So volatile is not a requirement for all cases.
> >
> > Hmm, I suspect that many of those places actually do use volatile. The various lock cookies (mtx_lock, etc.) are declared volatile in the structure. Otherwise the compiler would be free to conclude that 'v = value;' is a loop invariant and move it out of the loop which would break. Given that, the construct you referred to does in fact require 'value' to be volatile.
>
> I checked Assembler code for these functions:
>
>     kern/subr_msgbuf.c:msgbuf_addchar()
>     vm/vm_map.c:vmspace_free()

They may happen to accidentally work because atomic_cmpset() clobbers all of memory, but these should be marked volatile.
Index: vm/vm_map.c
===================================================================
--- vm/vm_map.c	(revision 212801)
+++ vm/vm_map.c	(working copy)
@@ -343,10 +343,7 @@
 	if (vm->vm_refcnt == 0)
 		panic("vmspace_free: attempt to free already freed vmspace");
-	do
-		refcnt = vm->vm_refcnt;
-	while (!atomic_cmpset_int(&vm->vm_refcnt, refcnt, refcnt - 1));
-	if (refcnt == 1)
+	if (atomic_fetchadd_int(&vm->vm_refcnt, -1) == 1)
 		vmspace_dofree(vm);
 }
Index: vm/vm_map.h
===================================================================
--- vm/vm_map.h	(revision 212801)
+++ vm/vm_map.h	(working copy)
@@ -237,7 +237,7 @@
 	caddr_t vm_taddr;	/* (c) user virtual address of text */
 	caddr_t vm_daddr;	/* (c) user virtual address of data */
 	caddr_t vm_maxsaddr;	/* user VA at max stack growth */
-	int vm_refcnt;		/* number of references */
+	volatile int vm_refcnt;	/* number of references */
 	/*
 	 * Keep the PMAP last, so that CPU-specific variations of that
 	 * structure on a single architecture don't result in offset
Index: sys/msgbuf.h
===================================================================
--- sys
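For readers following the vm_map.c hunk, the fetch-and-add form of a reference drop can be sketched in userland as below. This assumes C11 <stdatomic.h> as a stand-in for atomic_fetchadd_int(), and the name ref_drop_is_last() is hypothetical. The point it shows is that fetch-and-add returns the value held before the addition, so no retry loop (and no separate read of the counter) is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference.  Returns true when the caller held the last one,
 * i.e. the counter was 1 before the decrement.
 */
static bool
ref_drop_is_last(atomic_int *refcnt)
{
	/* atomic_fetch_add() yields the pre-addition value. */
	return (atomic_fetch_add(refcnt, -1) == 1);
}
```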
Re: Bumping MAXCPU on amd64?
On Wednesday, September 22, 2010 6:36:56 am Maxim Sobolev wrote:
> Hi,
>
> Is there any reason to keep MAXCPU at 16 in the default kernel config? There are quite a few servers on the market today that have 24 or even 32 physical cores. With hyper-threading this can even go as high as 48 or 64 virtual cpus. People who buy such hardware might get very disappointed finding out that FreeBSD is not going to use such hardware to its full potential.
>
> Does anybody object if I'd bump MAXCPU to 32, which is still low but might be a more reasonable default these days, or at least make it a kernel configuration option documented in the NOTES?

?

% grep MAXCPU ~/work/freebsd/svn/head/sys/amd64/include/param.h
#define MAXCPU 32
#define MAXCPU 1

In fact:

% grep MAXCPU ~/work/freebsd/svn/stable/8/sys/amd64/include/param.h
#define MAXCPU 32
#define MAXCPU 1

Unfortunately this can't be MFC'd to 7 as it would destroy the ABI for existing klds.

-- 
John Baldwin
Re: Bumping MAXCPU on amd64?
On Wednesday, September 22, 2010 1:08:30 pm Curtis Penner wrote:
> MAXCPU at 32 has been good in the 32bit days. Soon there will be (if not already) systems that will have 16cores/socket or more, and motherboards that have 4 sockets or more. Combining this with hyper-threading, you have gone significantly beyond the limits of feasible server.

My point was in response to Maxim's mail about bumping it from 16. Going higher than 32 is a bigger project (but in progress-ish) as it involves transitioning away from a simple int to hold CPU ID bitmasks (cpumask_t) and using cpuset_t instead.

-- 
John Baldwin
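To make the limitation concrete: a mask held in one 32-bit int can only name CPUs 0 through 31, whereas a set built from an array of words grows with the configured limit. The following is a rough sketch of that second shape. All names (sk_cpuset, SK_MAXCPU and the helpers) are hypothetical and the layout is only an assumption for illustration, not the kernel's actual cpuset_t definition.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define SK_MAXCPU	256	/* hypothetical limit, well past 32 */
#define SK_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define SK_NWORDS	((SK_MAXCPU + SK_WORD_BITS - 1) / SK_WORD_BITS)

/* One bit per CPU, spread over as many words as SK_MAXCPU requires. */
struct sk_cpuset {
	unsigned long bits[SK_NWORDS];
};

static void
sk_cpuset_zero(struct sk_cpuset *s)
{
	memset(s, 0, sizeof(*s));
}

/* Caller must pass cpu < SK_MAXCPU. */
static void
sk_cpuset_set(struct sk_cpuset *s, unsigned int cpu)
{
	s->bits[cpu / SK_WORD_BITS] |= 1UL << (cpu % SK_WORD_BITS);
}

/* Caller must pass cpu < SK_MAXCPU. */
static bool
sk_cpuset_isset(const struct sk_cpuset *s, unsigned int cpu)
{
	return (((s->bits[cpu / SK_WORD_BITS] >> (cpu % SK_WORD_BITS)) & 1UL) != 0);
}
```

Every place that used to do plain integer operations on the mask then has to go through accessors like these, which is presumably why the transition is described as a bigger project.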
Re: PATCH: fix bogus error message bus_dmamem_alloc failed to align memory properly
On Friday, September 24, 2010 9:00:44 pm Neel Natu wrote:
> Hi,
>
> This patch fixes the bogus error message from bus_dmamem_alloc() about the buffer not being aligned properly.
>
> The problem is that the check is against a virtual address as opposed to the physical address. contigmalloc() makes guarantees about the alignment of physical addresses but not the virtual address mapping it.
>
> Any objections if I commit this patch?

Hmmm, I guess you are doing super-page alignment rather than sub-page alignment? In general I thought the busdma code only handled sub-page alignment and doesn't fully handle requests for super-page alignment.

For example, since it insists on walking individual pages at a time, if you had an alignment setting of 4 pages and passed in a single, aligned 4-page buffer, bus_dma would actually bounce the last 3 pages so that each individual page is 4-page aligned. At least, I think that is what would happen.

For sub-page alignment, the virtual and physical address alignments should be the same.
> best
> Neel
>
> Index: sys/powerpc/powerpc/busdma_machdep.c
> ===================================================================
> --- sys/powerpc/powerpc/busdma_machdep.c	(revision 213113)
> +++ sys/powerpc/powerpc/busdma_machdep.c	(working copy)
> @@ -529,7 +529,7 @@
>  		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
>  		    __func__, dmat, dmat->flags, ENOMEM);
>  		return (ENOMEM);
> -	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
> +	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
>  		printf("bus_dmamem_alloc failed to align memory properly.\n");
>  	}
>  #ifdef NOTYET
> Index: sys/sparc64/sparc64/bus_machdep.c
> ===================================================================
> --- sys/sparc64/sparc64/bus_machdep.c	(revision 213113)
> +++ sys/sparc64/sparc64/bus_machdep.c	(working copy)
> @@ -652,7 +652,7 @@
>  	}
>  	if (*vaddr == NULL)
>  		return (ENOMEM);
> -	if ((uintptr_t)*vaddr % dmat->dt_alignment)
> +	if (vtophys(*vaddr) % dmat->dt_alignment)
>  		printf("%s: failed to align memory properly.\n", __func__);
>  	return (0);
>  }
> Index: sys/ia64/ia64/busdma_machdep.c
> ===================================================================
> --- sys/ia64/ia64/busdma_machdep.c	(revision 213113)
> +++ sys/ia64/ia64/busdma_machdep.c	(working copy)
> @@ -455,7 +455,7 @@
>  	}
>  	if (*vaddr == NULL)
>  		return (ENOMEM);
> -	else if ((uintptr_t)*vaddr & (dmat->alignment - 1))
> +	else if (vtophys(*vaddr) & (dmat->alignment - 1))
>  		printf("bus_dmamem_alloc failed to align memory properly.\n");
>  	return (0);
>  }
> Index: sys/i386/i386/busdma_machdep.c
> ===================================================================
> --- sys/i386/i386/busdma_machdep.c	(revision 213113)
> +++ sys/i386/i386/busdma_machdep.c	(working copy)
> @@ -540,7 +540,7 @@
>  		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
>  		    __func__, dmat, dmat->flags, ENOMEM);
>  		return (ENOMEM);
> -	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
> +	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
>  		printf("bus_dmamem_alloc failed to align memory properly.\n");
>  	}
>  	if (flags & BUS_DMA_NOCACHE)
> Index: sys/amd64/amd64/busdma_machdep.c
> ===================================================================
> --- sys/amd64/amd64/busdma_machdep.c	(revision 213113)
> +++ sys/amd64/amd64/busdma_machdep.c	(working copy)
> @@ -526,7 +526,7 @@
>  		CTR4(KTR_BUSDMA, "%s: tag %p tag flags 0x%x error %d",
>  		    __func__, dmat, dmat->flags, ENOMEM);
>  		return (ENOMEM);
> -	} else if ((uintptr_t)*vaddr & (dmat->alignment - 1)) {
> +	} else if (vtophys(*vaddr) & (dmat->alignment - 1)) {
>  		printf("bus_dmamem_alloc failed to align memory properly.\n");
>  	}
>  	if (flags & BUS_DMA_NOCACHE)

-- 
John Baldwin
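The difference between checking the virtual and the physical address can be shown with a toy model. The sketch below is not kernel code: toy_vtophys() is a hypothetical translation that swaps the page frame and keeps the offset within the page, and the 4096-byte page size is an assumption. It illustrates why the two checks agree for sub-page alignments and may disagree for alignments larger than a page.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_SIZE	4096u			/* assumed page size */
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)

/* Power-of-two alignment test, the same mask idiom used in the patch. */
static bool
is_aligned(uint64_t addr, uint64_t alignment)
{
	return ((addr & (alignment - 1)) == 0);
}

/*
 * Toy translation: the mapping picks an arbitrary physical frame but
 * preserves the byte offset inside the page.
 */
static uint64_t
toy_vtophys(uint64_t vaddr, uint64_t phys_frame)
{
	return (phys_frame * SK_PAGE_SIZE + (vaddr & SK_PAGE_MASK));
}
```

With a virtual address of 0x5000 mapped to physical frame 8, both addresses pass a 64-byte check, but only the physical address 0x8000 passes a 16 KB check, so a test against the virtual address would print the warning for a correctly aligned buffer.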
Re: PATCH: fix bogus error message bus_dmamem_alloc failed to align memory properly
On Monday, September 27, 2010 5:13:03 pm Neel Natu wrote:
> Hi John,
>
> Thanks for reviewing this.
>
> On Mon, Sep 27, 2010 at 8:04 AM, John Baldwin j...@freebsd.org wrote:
> > On Friday, September 24, 2010 9:00:44 pm Neel Natu wrote:
> > > Hi,
> > >
> > > This patch fixes the bogus error message from bus_dmamem_alloc() about the buffer not being aligned properly.
> > >
> > > The problem is that the check is against a virtual address as opposed to the physical address. contigmalloc() makes guarantees about the alignment of physical addresses but not the virtual address mapping it.
> > >
> > > Any objections if I commit this patch?
> >
> > Hmmm, I guess you are doing super-page alignment rather than sub-page alignment? In general I thought the busdma code only handled sub-page alignment and doesn't fully handle requests for super-page alignment.
>
> Yes, this is for allocations with sizes greater than PAGE_SIZE and alignment requirements also greater than a PAGE_SIZE.
>
> > For example, since it insists on walking individual pages at a time, if you had an alignment setting of 4 pages and passed in a single, aligned 4-page buffer, bus_dma would actually bounce the last 3 pages so that each individual page is 4-page aligned. At least, I think that is what would happen.
>
> I think you are referring to bus_dmamap_load() operation that would follow the bus_dmamem_alloc(), right? The memory allocated by bus_dmamem_alloc() does not need to be bounced. In fact, the dmamap pointer returned by bus_dmamem_alloc() is NULL.
>
> At least for the amd64 implementation there is code in _bus_dmamap_load_buffer() which will coalesce individual dma segments if they satisfy 'boundary' and 'segsize' constraints.

So the problem is earlier in the routine where it does this:

	/*
	 * Get the physical address for this segment.
	 */
	if (pmap)
		curaddr = pmap_extract(pmap, vaddr);
	else
		curaddr = pmap_kextract(vaddr);

	/*
	 * Compute the segment size, and adjust counts.
	 */
	max_sgsize = MIN(buflen, dmat->maxsegsz);
	sgsize = PAGE_SIZE - ((vm_offset_t)curaddr & PAGE_MASK);
	if (map->pagesneeded != 0 && run_filter(dmat, curaddr)) {
		sgsize = roundup2(sgsize, dmat->alignment);
		sgsize = MIN(sgsize, max_sgsize);
		curaddr = add_bounce_page(dmat, map, vaddr, sgsize);
	} else {
		sgsize = MIN(sgsize, max_sgsize);
	}

If you have a map that does need bouncing, then it will split up the pages. It happens to work for bus_dmamem_alloc() because that returns a NULL map which doesn't bounce. But if you had a PCI device which supported only 32-bit addresses on a 64-bit machine with an aligned, 4 page buffer above 4GB and did a bus_dmamap_load() on that buffer, it would get split up into 4 separate 4 page-aligned pages.

-- 
John Baldwin
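The page-at-a-time walk can be sketched outside the kernel to see how a buffer gets cut up. The function below is a hypothetical illustration that follows only the non-bouncing arithmetic of the quoted code (segment size limited by the end of the current page and by the maximum segment size); it does no coalescing and no bounce handling, and the 4096-byte page size and all names are assumptions.

```c
#include <stdint.h>

#define SK_PAGE_SIZE	4096u			/* assumed page size */
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)
#define SK_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Count the pieces produced when [addr, addr + buflen) is walked one
 * page at a time.  maxsegsz must be greater than zero.
 */
static unsigned int
count_page_segments(uint64_t addr, uint64_t buflen, uint64_t maxsegsz)
{
	unsigned int nsegs = 0;

	while (buflen > 0) {
		uint64_t max_sgsize = SK_MIN(buflen, maxsegsz);
		/* Bytes left before the next page boundary. */
		uint64_t sgsize = SK_PAGE_SIZE - (addr & SK_PAGE_MASK);

		sgsize = SK_MIN(sgsize, max_sgsize);
		addr += sgsize;
		buflen -= sgsize;
		nsegs++;
	}
	return (nsegs);
}
```

An aligned 4-page buffer yields four page-sized pieces here; in the real routine it is the later coalescing step that can merge them again, and that step is what does not happen once each page is bounced separately.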
Re: PATCH: fix bogus error message bus_dmamem_alloc failed to align memory properly
On Tuesday, September 28, 2010 4:02:08 pm Neel Natu wrote:
> Hi John,
>
> On Tue, Sep 28, 2010 at 6:36 AM, John Baldwin j...@freebsd.org wrote:
> > On Monday, September 27, 2010 5:13:03 pm Neel Natu wrote:
> > > Hi John,
> > >
> > > Thanks for reviewing this.
> > >
> > > On Mon, Sep 27, 2010 at 8:04 AM, John Baldwin j...@freebsd.org wrote:
> > > > On Friday, September 24, 2010 9:00:44 pm Neel Natu wrote:
> > > > > Hi,
> > > > >
> > > > > This patch fixes the bogus error message from bus_dmamem_alloc() about the buffer not being aligned properly.
> > > > >
> > > > > The problem is that the check is against a virtual address as opposed to the physical address. contigmalloc() makes guarantees about the alignment of physical addresses but not the virtual address mapping it.
> > > > >
> > > > > Any objections if I commit this patch?
> > > >
> > > > Hmmm, I guess you are doing super-page alignment rather than sub-page alignment? In general I thought the busdma code only handled sub-page alignment and doesn't fully handle requests for super-page alignment.
> > >
> > > Yes, this is for allocations with sizes greater than PAGE_SIZE and alignment requirements also greater than a PAGE_SIZE.
> > >
> > > > For example, since it insists on walking individual pages at a time, if you had an alignment setting of 4 pages and passed in a single, aligned 4-page buffer, bus_dma would actually bounce the last 3 pages so that each individual page is 4-page aligned. At least, I think that is what would happen.
> > >
> > > I think you are referring to bus_dmamap_load() operation that would follow the bus_dmamem_alloc(), right? The memory allocated by bus_dmamem_alloc() does not need to be bounced. In fact, the dmamap pointer returned by bus_dmamem_alloc() is NULL.
> > >
> > > At least for the amd64 implementation there is code in _bus_dmamap_load_buffer() which will coalesce individual dma segments if they satisfy 'boundary' and 'segsize' constraints.
> >
> > So the problem is earlier in the routine where it does this:
> >
> > 	/*
> > 	 * Get the physical address for this segment.
> > 	 */
> > 	if (pmap)
> > 		curaddr = pmap_extract(pmap, vaddr);
> > 	else
> > 		curaddr = pmap_kextract(vaddr);
> >
> > 	/*
> > 	 * Compute the segment size, and adjust counts.
> > 	 */
> > 	max_sgsize = MIN(buflen, dmat->maxsegsz);
> > 	sgsize = PAGE_SIZE - ((vm_offset_t)curaddr & PAGE_MASK);
> > 	if (map->pagesneeded != 0 && run_filter(dmat, curaddr)) {
> > 		sgsize = roundup2(sgsize, dmat->alignment);
> > 		sgsize = MIN(sgsize, max_sgsize);
> > 		curaddr = add_bounce_page(dmat, map, vaddr, sgsize);
> > 	} else {
> > 		sgsize = MIN(sgsize, max_sgsize);
> > 	}
> >
> > If you have a map that does need bouncing, then it will split up the pages. It happens to work for bus_dmamem_alloc() because that returns a NULL map which doesn't bounce. But if you had a PCI device which supported only 32-bit addresses on a 64-bit machine with an aligned, 4 page buffer above 4GB and did a bus_dmamap_load() on that buffer, it would get split up into 4 separate 4 page-aligned pages.
>
> You are right.
>
> I assume that you are ok with the patch and the discussion above was an FYI, right?

I think the patch is ok, but my point is that super-page alignment isn't really part of the design of the current bus_dma and only works for bus_dmamem_alloc() by accident.

-- 
John Baldwin
Re: Fix mfiutil compile with -DDEBUG
On Sunday, October 03, 2010 10:33:17 pm Garrett Cooper wrote:
> make -DDEBUG is broken in mfiutil:
>
> $ make -DDEBUG
> cc -O2 -pipe -fno-strict-aliasing -pipe -O2 -march=nocona -fno-builtin-strftime -DDEBUG -Wall -Wcast-align -Woverflow -Wsign-compare -Wunused -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wno-uninitialized -Wno-pointer-sign -c /usr/src/usr.sbin/mfiutil/mfi_config.c
> /usr/src/usr.sbin/mfiutil/mfi_config.c: In function 'dump_config':
> /usr/src/usr.sbin/mfiutil/mfi_config.c:1027: error: 'union mfi_pd_ref' has no member named 'device_id'
> /usr/src/usr.sbin/mfiutil/mfi_config.c:1083: error: 'union mfi_pd_ref' has no member named 'device_id'
> *** Error code 1
>
> Stop in /usr/src/usr.sbin/mfiutil.
> $
>
> device_id is a field in the v field in the mfi_pd_ref union (/sys/dev/mfi/mfireg.h):
>
> union mfi_pd_ref {
> 	struct {
> 		uint16_t	device_id;
> 		uint16_t	seq_num;
> 	} v;
> 	uint32_t	ref;
> } __packed;

Yes, there were different versions of these definitions in mfireg.h at one point. Your patch is fine.

-- 
John Baldwin
Re: panic_cpu should be volatile
On Thursday, October 07, 2010 1:40:49 pm Andriy Gapon wrote:
> panic_cpu variable in kern_shutdown.c should be volatile otherwise it's cached in a register in the innermost while-loop in this code (observed on amd64 with base gcc and -O2):
>
> 	if (panic_cpu != PCPU_GET(cpuid))
> 		while (atomic_cmpset_int(&panic_cpu, NOCPU,
> 		    PCPU_GET(cpuid)) == 0)
> 			while (panic_cpu != NOCPU)
> 				; /* nothing */
>
> The patch is here: http://people.freebsd.org/~avg/panic_cpu.diff
>
> I also took a liberty to move the variable into the scope of panic() functions as it doesn't seem to be useful outside of it. But this is not necessary, of course.

Looks fine to me.

-- 
John Baldwin
Re: Make mfiutil(8) more robust
On Sunday, October 03, 2010 11:32:01 pm Garrett Cooper wrote:
> On Sun, Oct 3, 2010 at 8:30 PM, Garrett Cooper yaneg...@gmail.com wrote:
> > As discussed offlist with some of the Yahoo! FreeBSD folks, mfiutil catches errors, but doesn't communicate it back up to the executing process. Examples follow...
> >
> > Before:

I think these are both fine.

-- 
John Baldwin
Re: generic_stop_cpus: prevent parallel execution
On Thursday, October 07, 2010 1:53:46 pm Andriy Gapon wrote:
> Here is a patch that applies the technique from panic() to generic_stop_cpus() to prevent its parallel execution on multiple CPUs: http://people.freebsd.org/~avg/generic_stop_cpus.diff
>
> In theory this could lead to two CPUs stopping each other and everyone else, and thus a total system halt.
>
> Also, in theory, we should have some smarter locking here, because two (or more CPUs) could be stopping unrelated sets of CPUs. But in practice, it seems, this function is only used to stop all other CPUs. Unless I overlooked other usages, that is.
>
> Additionally, I took this opportunity to make amd64-specific suspend_cpus() function use generic_stop_cpus() instead of rolling out essentially duplicate code. I couldn't see any reason not to consolidate, but perhaps I missed something.
>
> Big thanks to Matthew and his employer for the idea and example.

One note. Use 'cpu_spinwait()' in the inner loop waiting for 'stopping_cpu' to change.

-- 
John Baldwin
Re: anyone got advice on sendmail and TLS on 8.1?
On Sunday, October 10, 2010 5:22:01 pm Julian Elischer wrote:
> When I last did sendmail there wasn't any TLS/SSL stuff.
>
> has anyone got an exact howto as to how to enable a simple sendmail server?
>
> all I want is:
>
> TLS and authenticated email submission by me and my family, able to forward the email anywhere (maybe just to my ISP but who knows) (outgoing)
>
> non TLS submission from outside to reject all mail not to elischer.{org,com} and deliver our mail to mailboxes or gmail (or where-ever /etc/aliases says.).
>
> This is probably ALMOST a default configuration but I can't be sure what is needed.. there are several howtos on the net but they are generally old and differ in details.

Your best bet is probably to look at the docs on sendmail.org. You need to recompile the sendmail in base against SASL and need to install cyrus-sasl2 from ports to manage your authentication database.

-- 
John Baldwin
Re: [PATCH] Fix /bin/sh compilation with CFLAGS += -DDEBUG 1
On Tuesday, October 12, 2010 6:47:49 am Garrett Cooper wrote:
> Hi,
>
> It looks like the format strings are broken on 64-bit archs in /bin/sh's TRACE functionality (can be enabled by uncommenting -DDEBUG 1 in bin/sh/Makefile). The attached patch fixes this functionality again so one can trace sh's calls with TRACE, which may or may not be helpful to those debugging /bin/sh.
>
> Tested build and execution on amd64; tested build on i386.
>
> Thanks!
> -Garrett

I don't think the Makefile bits are needed, you can just use 'make DEBUG_FLAGS="-g -DDEBUG=2"' instead. Also, if you plan on using -g you should generally set DEBUG_FLAGS anyway so binaries are not stripped.

The use of things like PRIoMAX is not done in FreeBSD as it is ugly. You can use things like '%t' to print ptrdiff_t types instead. So for example, for the first hunk, I would change the type of 'startloc' to ptrdiff_t and use this:

	TRACE(("evalbackq: size=%td: \"%.*s\"\n",
	    (dest - stackblock()) - startloc,
	    (int)((dest - stackblock()) - startloc),
	    stackblock() + startloc));

Also, in your change here, you used %j to print a size_t. That will break on i386. You should use %z to print size_t's, but even better is to just use %t to print a ptrdiff_t (which is the type that holds the difference of two pointers).

The various changes in jobs.c should use '%td' as well rather than (int) casts.

-- 
John Baldwin
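The advice about length modifiers can be tried in a few lines of standalone C. This is only a sketch with a hypothetical helper name, describe_span(); it shows %td paired with a ptrdiff_t and %zu paired with a size_t, which is the pairing standard C99 printf defines, so the same format string is correct on both i386 and amd64 without casts.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the length of 's' (taken as a pointer difference) and the
 * capacity of the output buffer into 'out'.  Returns snprintf()'s result.
 */
static int
describe_span(char *out, size_t outlen, const char *s)
{
	const char *end = s + strlen(s);
	ptrdiff_t len = end - s;	/* difference of two pointers */

	/* %td matches ptrdiff_t; %zu matches size_t. */
	return (snprintf(out, outlen, "size=%td cap=%zu", len, outlen));
}
```

By contrast, %jd expects an intmax_t argument, so passing a size_t or ptrdiff_t to it without a cast is the kind of mismatch that breaks on i386, where those types are narrower than intmax_t.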
Re: [PATCH] Fix /bin/sh compilation with CFLAGS += -DDEBUG 1
On Tuesday, October 12, 2010 2:31:36 pm Garrett Cooper wrote:
> On Tue, Oct 12, 2010 at 5:30 AM, John Baldwin j...@freebsd.org wrote:
> > You should use %z to print size_t's, but even better is to just use %t to print a ptrdiff_t (which is the type that holds the difference of two pointers).
>
> Ok. The overall temperature of using PRI* from POSIX seems like it's undesirable; is it just POSIX cruft that FreeBSD conforms to in theory only and doesn't really use in practice, or is there an example of real practical application where it's used in the sourcebase?

PRI* are ugly. FreeBSD provides it so that we are compliant and so that portable code can use it, but we do not use it in our source tree because it is unreadable.

> > The various changes in jobs.c should use '%td' as well rather than (int) casts.
>
> Ok.
>
> Tested build and runtime on amd64 and tested build-only with i386.

Hmm, jobs.c shouldn't need any of the (ptrdiff_t) casts as the expression being printed is already a ptrdiff_t. See this non-debug code in jobs.c for example:

int
bgcmd(int argc, char **argv)
{
	char s[64];
	struct job *jp;
	...
	do {
		...
		fmtstr(s, 64, "[%td] ", jp - jobtab + 1);

-- 
John Baldwin
Re: [PATCH] Bug with powerof2 macro in sys/param.h
On Thursday, October 14, 2010 7:58:32 am Andriy Gapon wrote:
> on 14/10/2010 00:30 Garrett Cooper said the following:
> > I was talking to someone today about this macro, and he noted that the algorithm is incorrect -- it fails the base case with ((x) == 0) -- which makes sense because 2^(x) cannot equal 0 (mathematically impossible, unless you consider the limit as x goes to negative infinity as log (0) / log(2) is undefined). I tested out his claim and he was right:
>
> That's kind of obvious given the code. I think that this might be an intentional optimization. I guess that it doesn't really make sense to apply powerof2 to zero and the users of the macro should do the check on their own if they expect zero as input (many places in the do not allow that).

I agree, the current macro is this way on purpose (and straight out of "Hacker's Delight").

Of the existing calls you weren't sure of:

    sys/dev/cxgb/cxgb_sge.c: while (!powerof2(fl_q_size))
    sys/dev/cxgb/cxgb_sge.c: while (!powerof2(jumbo_q_size))

These are fine, will not be zero.

    sys/x86/x86/local_apic.c: KASSERT(powerof2(count), ("bad count"));
    sys/x86/x86/local_apic.c: KASSERT(powerof2(align), ("bad align"));

These are fine. No code allocates zero IDT vectors. We never allocate IDT vectors for unallocated MSI or MSI-X IRQs.

    sys/net/flowtable.c: ft->ft_lock_count = 2*(powerof2(mp_maxid + 1) ? (mp_maxid + 1):

Clearly, 'mp_maxid + 1' will not be zero (barring a bizarre overflow case which will not happen until we support 2^32 CPUs), so this is fine.

    sys/i386/pci/pci_pir.c: if (error && !powerof2(pci_link->pl_irqmask)) {

This is fine. Earlier in the function if pl_irqmask is zero, then all of the pci_pir_choose_irq() calls will fail, so this is only invoked if pl_irqmask is non-zero. In practice pl_irqmask is never zero anyway.

I suspect the GEOM ones are also generally safe.

-- 
John Baldwin
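For reference, the behaviour under discussion is easy to reproduce in userland. The sketch below defines a macro with the same shape as the powerof2() test (named sk_powerof2 here so it cannot clash with the system header) together with a hypothetical wrapper, powerof2_nonzero(), for callers that may legitimately see zero. It is meant only to illustrate the "check zero yourself" position taken in the thread.

```c
#include <stdbool.h>

/* Same bit trick as powerof2(): clears the lowest set bit and tests for 0. */
#define sk_powerof2(x)	((((x) - 1) & (x)) == 0)

/* Hypothetical stricter variant for inputs that can be zero. */
static bool
powerof2_nonzero(unsigned long x)
{
	return (x != 0 && sk_powerof2(x));
}
```

The macro reports zero as a power of two because 0 & anything is 0; callers such as the ones listed above rely on their inputs never being zero rather than on the macro rejecting it.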
Re: [PATCH] Bug with powerof2 macro in sys/param.h
On Thursday, October 14, 2010 11:49:23 pm Garrett Cooper wrote: On Thu, Oct 14, 2010 at 6:37 AM, John Baldwin j...@freebsd.org wrote: On Thursday, October 14, 2010 7:58:32 am Andriy Gapon wrote: on 14/10/2010 00:30 Garrett Cooper said the following: I was talking to someone today about this macro, and he noted that the algorithm is incorrect -- it fails the base case with ((x) == 0 -- which makes sense because 2^(x) cannot equal 0 (mathematically impossible, unless you consider the limit as x goes to negative infinity as log (0) / log(2) is undefined). I tested out his claim and he was right: That's kind of obvious given the code. I think that this might be an intentional optimization. I guess that it doesn't really make sense to apply powerof2 to zero and the users of the macro should do the check on their own if they expect zero as input (many places in the do not allow that). But the point is that this could be micro-optimizing things incorrectly. I'm running simple iteration tests to see what the performance is like, but the runtime is going to take a while to produce stable results. Mathematically there is a conflict with the definition of the macro, so it might confuse folks who pay attention to the math as opposed to the details (if you want I'll gladly add a comment around the macro in a patch to note the caveats of using powerof2). We aren't dealing with mathematicians, but programmers. I agree, the current macro is this way on purpose (and straight out of Hacker's Delight). And this book trumps you on that case. Using the powerof2() macro as it currently stands is a widely-used practice among folks who write systems-level code. If you were writing a powerof2() function for a higher level language where performance doesn't matter and bit twiddling isn't common, then a super-safe variant of powerof2() might be appropriate. However, this is C, and C programmers are expected to know how this stuff works. 
>> sys/net/flowtable.c: ft->ft_lock_count = 2*(powerof2(mp_maxid + 1) ? (mp_maxid + 1):
>>
>> Clearly, 'mp_maxid + 1' will not be zero (barring a bizarre overflow case which will not happen until we support 2^32 CPUs), so this is fine.
>
> But that should be caught by the mp_machdep code, correct?

Yes, hence bizarre. It is also way unrealistic and not worth excessive pessimizations scattered throughout the tree.

> What about the other patches? The mfiutil and mptutil ones at least get the two aforementioned tools in sync with sys/param.h, so I see some degree of value in the patches (even if they're just cleanup).

No, powerof2() should not change. It would most likely be a POLA violation to change how it works given 1) its historical behavior, and 2) its underlying idiom's common (and well-understood) use in the software world.

-- John Baldwin
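For readers who have not seen the idiom, here is a minimal sketch of the bit trick being discussed and of the zero case that started the thread. The macro body is the usual "clear the lowest set bit" form; powerof2_nonzero() is a hypothetical helper, not something proposed in the thread, showing how a caller that can see zero might guard the check itself.

```c
#include <stdbool.h>

/* (x - 1) & x clears the lowest set bit; the result is 0 when at most
 * one bit was set, which includes x == 0. */
#define	powerof2(x)	((((x) - 1) & (x)) == 0)

/* Hypothetical strict variant for callers that may be handed zero. */
static inline bool
powerof2_nonzero(unsigned long x)
{
	return (x != 0 && powerof2(x));
}
```

With this, powerof2(0) and powerof2(8) are both true, powerof2(12) is false, and only the guarded variant rejects zero. That matches the position taken above: the macro stays cheap, and the few callers that care about zero check for it explicitly.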
Re: SCSI_DELAY cleanup
On Tuesday, October 19, 2010 10:31:10 am Alexander Best wrote:
> On Tue Oct 19 10, Matthew Jacob wrote:
>> It would be an effective behavioral change for those of us who remove that line. Personally, I think 5 seconds is too long- even 2 seconds is more than adequate even for moderately old 'other' hardware like scanners. For -current, why don't you simply remove all of the config lines and leave the default at 2000ms?
>
> hmmm...i can only test the delay value on amd64. i was under the impression that archs like arm and mips need the longer delay. also at some locations in the code SCSI_DELAY is being set to 15000. i believe this is the case when certain drivers (cam, ahb, aha) get loaded as a kernel module, but i'm not sure. it looks like this:
>
>     .if !defined(KERNBUILDDIR)
>     opt_scsi.h:
>             echo "#define SCSI_DELAY 15000" > ${.TARGET}
>     .endif

I believe this is all old history. SCSI_DELAY used to be set to 15000 in GENERIC many years ago and was lowered to 5000. Most likely these Makefiles were simply not updated at the time.

-- John Baldwin
Re: SCSI_DELAY cleanup
On Tuesday, October 19, 2010 3:14:46 pm Alexander Best wrote:
> On Tue Oct 19 10, John Baldwin wrote:
>> On Tuesday, October 19, 2010 10:31:10 am Alexander Best wrote:
>>> On Tue Oct 19 10, Matthew Jacob wrote:
>>>> It would be an effective behavioral change for those of us who remove that line. Personally, I think 5 seconds is too long- even 2 seconds is more than adequate even for moderately old 'other' hardware like scanners. For -current, why don't you simply remove all of the config lines and leave the default at 2000ms?
>>>
>>> hmmm...i can only test the delay value on amd64. i was under the impression that archs like arm and mips need the longer delay. also at some locations in the code SCSI_DELAY is being set to 15000. i believe this is the case when certain drivers (cam, ahb, aha) get loaded as a kernel module, but i'm not sure. it looks like this:
>>>
>>>     .if !defined(KERNBUILDDIR)
>>>     opt_scsi.h:
>>>             echo "#define SCSI_DELAY 15000" > ${.TARGET}
>>>     .endif
>>
>> I believe this is all old history. SCSI_DELAY used to be set to 15000 in GENERIC many years ago and was lowered to 5000. Most likely these Makefiles were simply not updated at the time.
>
> oh i see. maybe this revised patch would be better suited then.

I think so, but you should post this to scsi@ for the best review.

-- John Baldwin
Re: fix pnpinfo on arch=amd64
On Saturday, October 23, 2010 8:22:48 pm Alexander Best wrote:
> this tiny patch will fix pnpinfo so it doesn't core dump (bus error) any longer on arch=amd64.

This utility isn't really useful on amd64 though. No amd64 machines have ISA slots in which to place an ISA PnP adapter.

-- John Baldwin
Re: fix pnpinfo on arch=amd64
On Monday, October 25, 2010 9:34:37 am Erik Trulsson wrote:
> On Mon, Oct 25, 2010 at 08:45:47AM -0400, John Baldwin wrote:
>> On Saturday, October 23, 2010 8:22:48 pm Alexander Best wrote:
>>> this tiny patch will fix pnpinfo so it doesn't core dump (bus error) any longer on arch=amd64.
>>
>> This utility isn't really useful on amd64 though. No amd64 machines have ISA slots in which to place an ISA PnP adapter.
>
> Are you really sure about that? See http://www.ibase.com.tw/2009/mb945.htmL or http://www.adek.com/ATX-motherboards.html for what certainly looks like counter-examples.

Hmm, well, I suspect in this case these boards exist to support really ancient custom hardware. If you are stuck with one of these, then manually needing to fix up pnpinfo.c is probably the least of your problems. However, I strongly doubt that FreeBSD users are lining up to buy these motherboards so they can use an ISA SB16 adapter with FreeBSD/amd64.

I was not aware of these boards previously, but I still doubt that pnpinfo is relevant to any FreeBSD/amd64 users.

-- John Baldwin
Re: SYSCALL_MODULE() macro and modfind() issues
On Tuesday, October 26, 2010 3:28:10 am Andriy Gapon wrote:
> on 26/10/2010 01:01 Selphie Keller said the following:
>> hi fbsd-hackers,
>>
>> Noticed an issue in 8.1-release, 8.1p1-release and 8.1-stable amd64/i386, where modfind() will no longer find pmap_helper for the /usr/ports/sysutils/pmap port, or other syscall modules using the SYSCALL_MODULE() macro. The issue is that modfind() no longer finds any modules that use the SYSCALL_MODULE() macro to register the kernel module, making it difficult for userland apps to call the syscall provided. modfind() always returns -1, which prevents modstat() from getting the required information to perform the syscall. Also tested, the demo syscall module:
>
> After commit r205320 and, apparently, its MFC you need to prefix the module name with "sys/". For example:
>
>     modstat(modfind("sys/syscall"), &stat);
>
> P.S. Perhaps a KPI breakage in a stable branch?

Ugh, it was a breakage, though it's too late to back it out at this point.

-- John Baldwin
Re: SYSCALL_MODULE() macro and modfind() issues
On Tuesday, October 26, 2010 4:00:14 am Selphie Keller wrote:
> Thanks Andriy,
>
> Took a look at the change to src/sys/sys/sysent.h
>
>     @@ -149,7 +149,7 @@ static struct syscall_module_data name##
>      };                                      \
>                                              \
>      static moduledata_t name##_mod = {      \
>     -       #name,                           \
>     +       "sys/" #name,                    \
>             syscall_module_handler,          \
>             &name##_syscall_mod              \
>      };                                      \
>
> applied the MFC prefix to pmap port:
>
>     --- /usr/ports/sysutils/pmap/work/pmap/pmap/pmap.c.orig 2010-10-26 00:55:32.0 -0700
>     +++ /usr/ports/sysutils/pmap/work/pmap/pmap/pmap.c 2010-10-26 00:56:10.0 -0700
>     @@ -86,12 +86,12 @@ main(int argc, char **argv)
>             struct kinfo_proc *kp;
>             int pmap_helper_syscall;
>     -       if ((modid = modfind("pmap_helper")) == -1) {
>     +       if ((modid = modfind("sys/pmap_helper")) == -1) {
>                     /* module not found, try to load */
>                     modid = kldload("pmap_helper.ko");
>                     if (modid == -1)
>                             err(1, "unable to load pmap_helper module");
>     -               modid = modfind("pmap_helper");
>     +               modid = modfind("sys/pmap_helper");
>                     if (modid == -1)
>                             err(1, "pmap_helper module loaded but not found");
>             }
>
> which restored functionality on freebsd 8.1.

The best approach might be to have something like this:

    static int
    pmap_find(void)
    {
            int modid;

            modid = modfind("pmap_helper");
            if (modid == -1)
                    modid = modfind("sys/pmap_helper");
            return (modid);
    }

then in the original main() routine use this:

    if ((modid = pmap_find()) == -1) {
            /* module not found, try to load */
            modid = kldload("pmap_helper.ko");
            if (modid == -1)
                    err(1, "unable to load pmap_helper module");
            modid = pmap_find();
            if (modid == -1)
                    err(1, "pmap_helper module loaded but not found");
    }

This would make the code work for both old and new versions.

> -Estella Mystagic (Selphie)
>
> On Tue, Oct 26, 2010 at 12:28 AM, Andriy Gapon a...@icyb.net.ua wrote:
>> on 26/10/2010 01:01 Selphie Keller said the following:
>>> hi fbsd-hackers,
>>>
>>> Noticed an issue in 8.1-release, 8.1p1-release and 8.1-stable amd64/i386, where modfind() will no longer find pmap_helper for the /usr/ports/sysutils/pmap port, or other syscall modules using the SYSCALL_MODULE() macro. The issue is that modfind() no longer finds any modules that use the SYSCALL_MODULE() macro to register the kernel module, making it difficult for userland apps to call the syscall provided. modfind() always returns -1, which prevents modstat() from getting the required information to perform the syscall. Also tested, the demo syscall module:
>>
>> After commit r205320 and, apparently, its MFC you need to prefix the module name with "sys/". For example:
>>
>>     modstat(modfind("sys/syscall"), &stat);
>>
>> P.S. Perhaps a KPI breakage in a stable branch?
>>
>> -- Andriy Gapon

-- John Baldwin
Re: stock gdb bug: DWARF2 with DWARF_OFFSET_SIZE == 8
On Monday, October 25, 2010 7:39:17 pm Oleksandr Tymoshenko wrote:
> gdb on MIPS64 does not read DWARF2 line information correctly if gcc was configured with DWARF_OFFSET_SIZE == 8. .debug_line starts with a total length field which can be either 12 bytes long or 4 bytes long. If it starts with 0x - it's 12 bytes long. Depending on its size, one of the following fields is either 8 bytes or 4 bytes.
>
> This one-line patch fixes this issue for MIPS64 but I'm not 100% sure that it doesn't break something else. So I'd appreciate input from someone with a better grip on ELF/DWARF stuff than me.
>
> Patch: http://people.freebsd.org/~gonzo/patches/mips64gdb.diff

I looked at GDB 6.6's source and it does pass in cu->header instead of NULL at the same place, so I think your fix is correct.

-- John Baldwin
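To make the 4-byte versus 12-byte distinction concrete, here is a rough sketch of decoding such a length field. It assumes the escape convention from the DWARF specification's 64-bit format (a leading 32-bit value of 0xffffffff followed by an 8-byte length) and host byte order; decode_initial_length() is a hypothetical name for illustration, not the gdb function touched by the patch, and it ignores any older vendor-specific encodings gdb may also accept.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Decode a DWARF-style initial length field at buf (host byte order).
 * Returns the unit length; *offset_size is set to the width of the
 * offset-sized fields that follow (4 or 8), and *consumed to the number
 * of bytes the length field itself occupied (4 or 12).
 */
static uint64_t
decode_initial_length(const unsigned char *buf, unsigned *offset_size,
    size_t *consumed)
{
	uint32_t len32;
	uint64_t len64;

	memcpy(&len32, buf, sizeof(len32));
	if (len32 != 0xffffffffU) {
		/* 32-bit format: the 4 bytes are the length itself. */
		*offset_size = 4;
		*consumed = 4;
		return (len32);
	}
	/* 64-bit format: escape word, then the real 8-byte length. */
	memcpy(&len64, buf + 4, sizeof(len64));
	*offset_size = 8;
	*consumed = 12;
	return (len64);
}
```

The point relevant to the patch is the second output: whoever parses the rest of the header needs to remember the offset size discovered here, which is presumably why passing the compilation unit header instead of NULL matters.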
Re: [PATCH] mfiutil(8) - capture errors and percolate up to caller
On Tuesday, October 26, 2010 2:09:53 pm Garrett Cooper wrote:
> Because a number of places in the mfiutil(8) code immediately call warn(3) after an API error occurred, and because warn(3) employs printf, et al. (multiple times) in libc, there's an off-chance that the errno value can get stomped on by the warn(3) calls, which could lead to confusing results for anyone depending on the value being returned from the mfiutil APIs.
>
> Thus, the attached patch I'm providing fixes those cases, as well as converts an existing internal API (display_pending_firmware) to a non-void return mechanism. I also made a few stack variable alignment changes to match style(9) as well as got rid of the ad hoc powerof2 call in favor of the value in sys/param.h.
>
> I've run a small number of unit tests on my desktop at home with my mfi(4) card, but will test out other failing cases with equipment I have access to at work.

Just a few nits:

1) The include of sys/param.h should replace sys/types.h (there's a note about these two headers in style(9), FYI).

2) patrol_get_props() should return 'error' on failure rather than 'errno'.

3) mfi_get_time() failing isn't fatal. The code already handles this case by not printing out a 'next run time' if at is zero. I think you can remove the check for at == 0. If all the other commands work and just that command fails I don't think it should be fatal.

-- John Baldwin
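The pattern behind the patch and behind nit 2 can be sketched as follows: copy errno into a local before calling anything that might change it, and return the local. open_ctrl() is a hypothetical helper written for this illustration; it is not taken from mfiutil.

```c
#include <err.h>
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open a control device, return 0 or an errno value. */
static int
open_ctrl(const char *path, int *fdp)
{
	int error, fd;

	fd = open(path, O_RDWR);
	if (fd < 0) {
		error = errno;		/* save before warn(3) can change it */
		warn("open(%s)", path);
		return (error);		/* return the saved copy, not errno */
	}
	*fdp = fd;
	return (0);
}
```

A caller can then rely on the return value alone, for example treating ENOENT differently from EACCES, without caring what the diagnostic output did to errno in between.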
Re: [PATCH] Fix 'implicit declaration' warning and update vgone(9)
On Wednesday, October 27, 2010 7:33:13 am Sergey Kandaurov wrote:
> On 27 October 2010 10:23, Lars Hartmann l...@chaotika.org wrote:
>> The vgonel function isn't declared in any header, and the vgonel prototype in vgone(9) isn't correct - found by Ben Kaduk ka...@mit.edu
>
> Hi. I'm afraid it's just an overlooked man page after many VFS changes in 5.x. As vgonel() is a static (i.e. private and not visible from outside) function, IMO it should be removed from the vgone(9) man page.

Agreed. It certainly should not be added to vnode.h. I'm curious how the reporter is getting a warning since there is a static prototype for vgonel() in vfs_subr.c.

-- John Baldwin
Re: [PATCH] hwpmc(4) syscall arguments fix
On Friday, October 29, 2010 8:12:06 pm Oleksandr Tymoshenko wrote:
> I ran into problems trying to get hwpmc to work on a 64-bit MIPS system with big endian byte order. It turned out the hwpmc syscall handler is byte-order and register_t size agnostic, unlike the rest of the syscalls. The best solution I have so far is a "copy sys/sysproto.h" approach: http://people.freebsd.org/~gonzo/patches/hwpmc-syscall.diff
>
> Any other ideas how to get it fixed in a cleaner way?

Yes, a better way would be to add pmc_syscall() to sys/kern/syscalls.master as a NOSTD system call. Then its arguments would be included in sysproto.h directly.

-- John Baldwin
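As a rough illustration of why byte order and register width matter here: each syscall argument occupies one register-sized slot, and a narrower value sits at a different offset inside that slot depending on endianness. The sketch below only models that arithmetic; reg64_t, arg_pad() and arg_offset() are hypothetical names, and the generated argument structures in sysproto.h express the same idea, as I understand them, with left and right padding members rather than functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t reg64_t;	/* stand-in for a 64-bit register_t */

/* Padding needed to widen a sz-byte argument to one register slot. */
static size_t
arg_pad(size_t sz)
{
	return (sz >= sizeof(reg64_t) ? 0 : sizeof(reg64_t) - sz);
}

/*
 * Offset of the argument's bytes within its slot: little-endian keeps
 * the value at the start (padding after it), big-endian puts the padding
 * first.
 */
static size_t
arg_offset(size_t sz, int big_endian)
{
	return (big_endian ? arg_pad(sz) : 0);
}
```

A handler that reads a 32-bit argument from offset 0 of every slot works on little-endian machines and on 32-bit big-endian ones, but picks up the padding on a 64-bit big-endian system, which is consistent with the symptom described above.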
Re: Fileops in file.h
On Sunday, November 07, 2010 10:08:08 am Fernando Apesteguía wrote:
> Hi, I'm trying to understand some pieces of the FreeBSD kernel. Having a look at struct fileops in file.h, I was wondering why other file related functions don't have an entry in the function vector. I was thinking of mmap, fsync or sendfile. Can anyone tell me the reason?

Mostly that it hasn't been done yet. If there was a clean way to do an f_mmap() and get some of the type-specific knowledge out of vm_mmap.c I'd really like it.

-- John Baldwin
Re: [PATCH] mptutil(8) - capture errors and percolate up to caller
On Saturday, November 06, 2010 4:13:23 am Garrett Cooper wrote:
> Similar to r214396, this patch deals with properly capturing error and passing it up to the caller in mptutil just in case the errno value gets stomped on by warn*(3); this patch deals with an improper use of warn(3), and also some malloc(3) errors, as well as shrinking down some static buffers to fit the data being output.
>
> If someone could review and help me commit this patch it would be much appreciated; all I could do is run negative tests on my local box and minor positive tests on my vmware fusion instance because it doesn't fully emulate a fully working mpt(4) device (the vmware instance consistently crashed with a warning about the mpt controller's unimplemented features after I poked at it enough).
>
> I'll submit another patch to fix up style(9) in this app if requested. Thanks!

The explicit 'return (ENOMEM)' calls are fine as-is. I do not think they need changing.

Having static char arrays of '15' rather than '16' is probably pointless. The stack is already at least 4-byte aligned on all the architectures we support, so a 15-byte char array will actually be 16 bytes. It was chosen to be a good enough value, not an exact fit. An exact fit is not important here.

Moving the 'buf' in mpt_raid_level() is a style bug. It should stay where it is. Same with 'buf' in mpt_volstate() and mpt_pdstate().

IOC_STATUS_SUCCESS() returns a boolean, so it is appropriate to test it with ! rather than == 0. It is also easier for a person to read the code that way.

-- John Baldwin
Re: libkvm: consumers of kvm_getprocs for non-live kernels?
On Wednesday, November 10, 2010 3:41:52 pm Ulrich Spörlein wrote:
> Hi, I have this cleanup of libkvm sitting in my tree and it needs a little bit of testing, especially the function kvm_proclist, which is only called from kvm_deadprocs which is only called from kvm_getprocs when kd is not ALIVE. The only consumer in our tree that I can make out is *probably* kgdb, as ps(1), top(1), w(1), pkill(1), fstat(1), systat(1), pmcstat(8) and bsnmpd don't really work on coredumps

ps and fstat certainly work fine on crashdumps. w did before devfs (it doesn't have a good way to map the device entries from the crashed kernel to the entries in wtmp IIRC). kvm_getprocs() is certainly actively used by various programs on crashdumps and works.

-- John Baldwin
Re: Managing userland data pointers in kqueue/kevent
On Friday, November 12, 2010 1:40:00 pm Paul LeoNerd Evans wrote:
> I'm trying to build a high-level language wrapper around kqueue/kevent, specifically, a Perl wrapper. (In fact I am trying to fix this bug: http://rt.cpan.org/Public/Bug/Display.html?id=61481 )
>
> My plan is to use the void *udata field of a kevent watcher to store a pointer to some user-provided Perl data structure (an SV*), to associate with the event. Typically this could be a code reference for an event callback or similar, but the exact nature doesn't matter. It's a pointer to a reference-counted data structure. SvREFCNT_dec(sv) is the function used to decrement the reference counter.
>
> To account for the fact that the kernel stores a pointer here, I'm artificially increasing the reference count on the object, so that it still remains alive even if the rest of the Perl code drops it, to rely on getting it back out of the kernel in an individual kevent. At some point when the kernel has finished looking after the event, this count needs to be decreased again, so the structure can be freed. I am having trouble trying to work out how to do this, or rather, when.
>
> I have the following problems:
>
> * If the event was registered using EV_ONESHOT, when it gets fired the flags that come back in the event structure do not include EV_ONESHOT.
> * Some events can only happen once, such as watching for EVFILT_PROC NOTE_EXIT events.
> * The kernel can silently drop watches, such as when the process calls close() on a filehandle with an EVFILT_READ or EVFILT_WRITE watch.
> * There doesn't seem to be a way to query that pointer back out of the kernel, in case the user code wants to EV_DELETE the watch.
>
> These problems all mean that I never quite know when I ought to call SvREFCNT_dec() on that pointer.
>
> My current best-attack plan looks like the following:
>
> a) Store a structure in the void *udata that contains the actual SV* pointer and a flag to remember if the event had been installed as EV_ONESHOT (or remember if it was one of the event types that is oneshot anyway)
> b) Store an entire mapping in userland from filter+identity to pointer, so that if userland wants to EV_DELETE the watch early, it has the pointer to be able to drop it.
>
> I can't think of a solution to the close() problem at all, though.
>
> Part a of my solution seems OK (though I'd wonder why the flags back from the kernel don't contain EV_ONESHOT), but part b confuses me. I had thought the point of kqueue/kevent is the O(1) nature of it, which is among the reasons why the kernel is storing that void *udata pointer in the first place. If I have to store a mapping from every filter+identity back to my data pointer, why does the kernel store one at all? I could just ignore the udata field and use my mapping for my own purposes.
>
> Have I missed something here, then? I was hoping there'd be a nice way for the kernel to give me back those pointers so I can just decrement a refcount on it, and have it reclaimed.

I think the assumption is that userland actually maintains a reference on the specified object (e.g. a file descriptor) and will know to drop the associated data when the file descriptor is closed. That is, think of the kevent as a member of an eventable object rather than a separate object that has a reference to the eventable object. When the eventable object's reference count drops to zero in userland, then the kevent should be deleted, either via EV_DELETE, or implicitly (e.g. by closing the associated file descriptor).

I think in your case you should not give the kevent a reference to your object, but instead remove the associated event for a given object when an object's refcount drops to zero.

-- John Baldwin
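One way to picture the ownership model suggested in the reply is the following userland-only sketch, in which the registration is treated as a member of the object and goes away when the last reference is dropped. Everything here is hypothetical illustration: struct watched and its functions are invented names, and stub_unregister() merely stands in for the place where a real program would issue kevent(2) with EV_DELETE or close the descriptor.

```c
#include <stdlib.h>

static int unregister_calls;	/* counts stub invocations, for illustration */

/* Stub standing in for a real EV_DELETE (or close()) on ident. */
static void
stub_unregister(int ident)
{
	(void)ident;
	unregister_calls++;
}

struct watched {
	int	refs;			/* userland references only */
	int	ident;			/* e.g. a file descriptor */
	int	registered;		/* nonzero while an event exists */
	void	(*unregister)(int);	/* how to remove the event */
};

static struct watched *
watched_new(int ident, void (*unregister)(int))
{
	struct watched *w = calloc(1, sizeof(*w));

	if (w == NULL)
		return (NULL);
	w->refs = 1;
	w->ident = ident;
	w->registered = 1;	/* assume the event was registered here */
	w->unregister = unregister;
	return (w);
}

static void
watched_hold(struct watched *w)
{
	w->refs++;
}

/* Drop one reference; returns 1 if the object (and its event) are gone. */
static int
watched_release(struct watched *w)
{
	if (--w->refs > 0)
		return (0);
	if (w->registered && w->unregister != NULL)
		w->unregister(w->ident);
	free(w);
	return (1);
}
```

In this arrangement the kernel's udata can still point at the object for O(1) dispatch, but it never counts as a reference, so there is no need to learn from the kernel when a watch has disappeared.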
Re: Phantom sysctl
On Monday, November 15, 2010 12:53:57 pm Garrett Cooper wrote:
> According to SYSCTL_INT(9):
>
>     The SYSCTL kernel interfaces allow code to statically declare sysctl(8) MIB entries, which will be initialized when the kernel module containing the declaration is initialized. When the module is unloaded, the sysctl will be automatically destroyed.
>
> The sysctl should be reaped when the module is unloaded. My dumb test kernel module [1] doesn't seem to do that though (please note that the OID test_int_sysctl is created, and not reaped... FWIW it's kind of bizarre that test_int_sysctl is created in the first place, given what I've seen when SYSCTL_* gets executed):

I believe I have seen this work properly before. Look for 'sysctl' in sys/kern/kern_linker.c to see the sysctl hooks invoked on kldload and kldunload to manage these sysctls. You will probably want to start your debugging in the unload hook as it sounds like the node is not being fully deregistered.

-- John Baldwin
Re: Managing userland data pointers in kqueue/kevent
On Monday, November 15, 2010 1:12:11 pm Paul LeoNerd Evans wrote:
> On Mon, Nov 15, 2010 at 11:25:42AM -0500, John Baldwin wrote:
>> I think the assumption is that userland actually maintains a reference on the specified object (e.g. a file descriptor) and will know to drop the associated data when the file descriptor is closed. That is, think of the kevent as a member of an eventable object rather than a separate object that has a reference to the eventable object. When the eventable object's reference count drops to zero in userland, then the kevent should be deleted, either via EV_DELETE, or implicitly (e.g. by closing the associated file descriptor).
>
> Ah. Well, that could be considered a bit more awkward for the use case I wanted to apply. The idea was that the udata would refer effectively to a closure, to invoke when the event happens. The idea being you can just add an event watcher by, say:
>
>     $ev->EV_SET( $pid, EVFILT_PROC, 0, NOTE_EXIT, 0, sub { print STDERR "The child process $pid has now exited\n"; } );
>
> So, the kernel's udata pointer effectively holds the only reference to this anonymous closure. It's much more flexible this way, especially for oneshot events like that. The beauty is also that the kevents() loop can simply know that the udata is always a code reference so just has to invoke it to do whatever the original caller wanted to do.
>
> Keep in mind my use-case here; I'm not trying to be one specific application, it's a general-purpose kevent-wrapping library.

So is GCD (Apple's libdispatch). It also implements closures on top of kevent. However, the way it works is that it doesn't expose kevent() directly; instead it uses kevent to implement asynchronous I/O on a socket, for example, and since it is logically managing the life cycle of a socket, it knows when the socket is closed and cleans up then.

>> I think in your case you should not give the kevent a reference to your object, but instead remove the associated event for a given object when an object's refcount drops to zero.
>
> Well that's certainly doable in longrunning watches, but I don't think it sounds very convenient for a oneshot event; see the above example for justification.

For the above case, if you know an event is one shot, you should either use EV_ONESHOT, or use a wrapper around the closure that clears the event after the closure runs (or possibly before the closure runs?)

> Also it again begs my question, worth repeating here:
>
>> On Friday, November 12, 2010 1:40:00 pm Paul LeoNerd Evans wrote:
>> I had thought the point of kqueue/kevent is the O(1) nature of it, which is among the reasons why the kernel is storing that void *udata pointer in the first place. If I have to store a mapping from every filter+identity back to my data pointer, why does the kernel store one at all? I could just ignore the udata field and use my mapping for my own purposes.
>
> If you're saying that in my not-so-rare use case, I don't want to be using udata, and instead keeping my own mapping, why does the kernel provide this udata field at all?

Your use case is rare. Almost all consumers of kevent() that I've seen use kevent() as one part of a system that maintains the lifecycle of objects. Those objects are only accessed within the system, so the system knows when an object is closed and can release the resources at the same time.

-- John Baldwin
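The "wrapper around the closure" suggestion for one-shot events might look roughly like this in C. It is a userland-only sketch under stated assumptions: struct handler, handler_new() and dispatch() are hypothetical names, the udata of each returned event is assumed to point at a struct handler, and on_event()/on_release() are sample callbacks that only count calls. No real kevent(2) call appears; where a wrapper is marked one-shot, the event is assumed to have been registered with EV_ONESHOT (or to be inherently one-shot, like NOTE_EXIT).

```c
#include <stdlib.h>

struct counters {
	int	calls;
	int	drops;
};

struct handler {
	void	(*fn)(void *);		/* the "closure" body */
	void	(*release)(void *);	/* drops the reference on arg */
	void	*arg;
	int	oneshot;		/* remembered at registration time */
};

/* Sample callbacks for illustration only. */
static void
on_event(void *arg)
{
	((struct counters *)arg)->calls++;
}

static void
on_release(void *arg)
{
	((struct counters *)arg)->drops++;
}

static struct handler *
handler_new(void (*fn)(void *), void (*release)(void *), void *arg,
    int oneshot)
{
	struct handler *h = malloc(sizeof(*h));

	if (h == NULL)
		return (NULL);
	h->fn = fn;
	h->release = release;
	h->arg = arg;
	h->oneshot = oneshot;
	return (h);
}

/* Run the handler; returns 1 if it was one-shot and has been freed. */
static int
dispatch(struct handler *h)
{
	h->fn(h->arg);
	if (!h->oneshot)
		return (0);
	if (h->release != NULL)
		h->release(h->arg);
	free(h);
	return (1);
}
```

Because the wrapper itself records that the event is one-shot, the loop does not depend on the kernel echoing EV_ONESHOT back in the returned flags. It does not address watches that vanish silently on close(), which is where the lifecycle argument above comes in.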
Re: breaking the crunchgen logic into a share/mk file
On Tuesday, November 16, 2010 8:01:43 am Andrey V. Elsukov wrote:
> On 08.11.2010 15:31, Adrian Chadd wrote:
>> I've broken out the crunchgen logic from src/rescue/rescue into a share/mk file - that way it can be reused in other areas. The diff is here: http://people.freebsd.org/~adrian/crunchgen-mk.diff
>>
>> This bsd.crunchgen.mk file is generic enough to use in my busybox-style thing as well as for src/rescue/rescue/. Comments, feedback, etc welcome!
>
> It seems this broke usage of livefs from sysinstall. sysinstall does check for /rescue/ldconfig and can not find it there. I think the attached patch can fix this issue (not tested).

Err, are there no longer hard links to all of the frontends for a given crunch? If so, that is a problem as it will make rescue much harder to use.

-- John Baldwin
Re: Software interrupts; where to start
On Tuesday, November 16, 2010 12:08:51 pm Nathan Vidican wrote:
> What I would like to do is replace the above scenario with one wherein the program writing to the serial port is always connected and running, but not polling; ideally having some sort of interrupt or signal triggered from within memcached when a value is altered. Sort of a 're-sync' request asserting that the program sending data out the serial port should 'loop once'. I'd like to continue with the use of memcached as it provides a simple way for multiple systems to query the values in the array as well (i.e. some devices need not change the data, but only view it; given the latency requirements memcached operates ideally). This trigger should be asynchronous in that it should be fired and forgotten by memcached (by nature of the hardware designed, no error-checking nor receipt would be needed).
>
> I'm just not sure where to start? Could someone send me the right RTFM link to start from, or perhaps suggest a better way to look at solving this problem? Ideally any example code to look at with a simple signal or interrupt type of handler would be great. What I'm leaning towards is modifying the memcached daemon to send a signal or trigger an interrupt of some sort to tell the other program communicating with the device to re-poll once. It would also be nice to have a way to trigger from other programs too.

A simple solution would be to create a pipe shared between memcached and the process that writes to the serial port. memcached would write a dummy byte to the pipe each time it updates the values. Your app could either use select/poll/kqueue or a blocking read on the pipe to sleep until memcached does an update. That requires modifying memcached though. I'm not familiar enough with memcached to know if it already has some sort of signalling facility that you could use directly.

-- John Baldwin
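The pipe approach described above can be sketched with plain POSIX calls. notify_update() and wait_for_update() are hypothetical names for the two halves: the first is what a modified daemon would call after changing a value, the second is what the serial-port program would call instead of polling on a timer. For two unrelated processes the descriptors would have to come from something like a named pipe rather than pipe(2), which is an assumption beyond what the message specifies.

```c
#include <poll.h>
#include <unistd.h>

/* Writer side: poke the reader after an update. Returns 0 on success. */
static int
notify_update(int wfd)
{
	char b = 1;

	return (write(wfd, &b, 1) == 1 ? 0 : -1);
}

/*
 * Reader side: sleep until an update arrives or timeout_ms expires
 * (-1 waits forever). Returns 1 on update, 0 on timeout, -1 on error.
 */
static int
wait_for_update(int rfd, int timeout_ms)
{
	struct pollfd pfd = { .fd = rfd, .events = POLLIN };
	char buf[64];
	int n;

	n = poll(&pfd, 1, timeout_ms);
	if (n <= 0)
		return (n);
	/* Drain whatever is queued; several updates may have coalesced. */
	if (read(rfd, buf, sizeof(buf)) <= 0)
		return (-1);
	return (1);
}
```

Since the reader only cares that something changed, coalescing several dummy bytes into one wakeup is harmless: it re-reads the current values once and goes back to sleep.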
Re: breaking the crunchgen logic into a share/mk file
On Tuesday, November 16, 2010 8:45:08 am Andrey V. Elsukov wrote:
> On 16.11.2010 16:29, John Baldwin wrote:
>> Err, are there no longer hard links to all of the frontends for a given crunch? If so, that is a problem as it will make rescue much harder to use.
>
> Yes, probably this patch is not needed and it should be fixed somewhere in makefiles. But currently rescue does not have any hardlinks: http://pub.allbsd.org/FreeBSD-snapshots/i386-i386/9.0-HEAD-20101116-JPSNAP/cdrom/livefs/rescue/
>
> And what it was before: http://pub.allbsd.org/FreeBSD-snapshots/i386-i386/9.0-HEAD-20101112-JPSNAP/cdrom/livefs/rescue/

That definitely needs to be fixed.

-- John Baldwin
Re: new cpuid bits
On Friday, November 19, 2010 10:39:53 am Andriy Gapon wrote:
> Guys, I would like to add definitions for a couple more useful CPUID bits, but I am greatly confused about how to name them. I failed to deduce the naming convention from the existing definitions and I am not sure how to make the names proper and descriptive.
>
> The bits in question are returned by CPUID.6 in EAX and ECX. The CPUID.6 block is described by both AMD and Intel as Thermal and Power Management (Leaf). Bits in EAX are defined only for Intel at present, the bit in ECX is defined for both.
>
> Description/naming of the bits from the specifications:
>
> EAX[0]: Digital temperature sensor is supported if set
> EAX[1]: Intel Turbo Boost Technology Available
> EAX[2]: ARAT. APIC-Timer-always-running feature is supported if set.
> ECX[0]: Intel: Hardware Coordination Feedback Capability (Presence of Bits MCNT and ACNT MSRs). AMD: EffFreq: effective frequency interface.
>
> How does the following look to you? I will appreciate suggestions/comments.

Looks fine to me.

-- John Baldwin
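Since the proposed definitions themselves did not survive in this archive, here is a sketch of what decoding the listed bits could look like. The macro and function names below are placeholders invented for this illustration and should not be read as the names that were proposed or committed; only the bit positions come from the message above.

```c
#include <stdint.h>

/* Placeholder names; bit positions as listed for CPUID leaf 6. */
#define	CPUID6_EAX_DTS		0x00000001	/* EAX[0]: digital temp sensor */
#define	CPUID6_EAX_TURBO	0x00000002	/* EAX[1]: Turbo Boost */
#define	CPUID6_EAX_ARAT		0x00000004	/* EAX[2]: APIC timer always runs */
#define	CPUID6_ECX_EFFFREQ	0x00000001	/* ECX[0]: MCNT/ACNT, EffFreq */

struct tpm_features {
	int	dts;
	int	turbo;
	int	arat;
	int	efffreq;
};

/* Decode register values already obtained from CPUID leaf 6. */
static struct tpm_features
decode_cpuid6(uint32_t eax, uint32_t ecx)
{
	struct tpm_features f;

	f.dts = (eax & CPUID6_EAX_DTS) != 0;
	f.turbo = (eax & CPUID6_EAX_TURBO) != 0;
	f.arat = (eax & CPUID6_EAX_ARAT) != 0;
	f.efffreq = (ecx & CPUID6_ECX_EFFFREQ) != 0;
	return (f);
}
```

Keeping the decode separate from the instruction that fetches the registers makes it easy to check against values captured from real hardware.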
Re: Quick i386 question...
On Saturday, November 20, 2010 3:38:58 pm Sergio Andrés Gómez del Real wrote:
> If an interrupt is received while in protected mode with paging enabled, is the linear address of the IDT stored in the idtr translated using the paging-hierarchy structures? I have looked at the interrupt/exception chapter in the corresponding Intel manual but can't find the answer. Maybe I overlooked it.

Yes. A linear address is the flat virtual address after segments are taken into account. It is the address used as an input to the paging support in the MMU.

-- John Baldwin
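As a small worked example of what "input to the paging support" means, the sketch below computes the linear address of an IDT entry and then splits a linear address the way classic 32-bit (non-PAE, 4 KB page) paging does. The function and struct names are hypothetical, and the 8-byte gate size and the 10/10/12 bit split are assumptions that hold for 32-bit protected mode only; PAE and long mode use different layouts.

```c
#include <stdint.h>

struct la_parts {
	unsigned	pde;	/* page directory index, bits 31..22 */
	unsigned	pte;	/* page table index, bits 21..12 */
	unsigned	off;	/* byte offset in the page, bits 11..0 */
};

/* Linear address of the gate for `vector`, given the base held in IDTR. */
static uint32_t
idt_entry_linear(uint32_t idt_base, unsigned vector)
{
	return (idt_base + vector * 8);	/* 8-byte gates in 32-bit mode */
}

/* Split a linear address for 32-bit non-PAE paging with 4 KB pages. */
static struct la_parts
split_linear(uint32_t la)
{
	struct la_parts p;

	p.pde = la >> 22;
	p.pte = (la >> 12) & 0x3ff;
	p.off = la & 0xfff;
	return (p);
}
```

For instance, with an IDT base of 0xC0100000, vector 14 lands at linear address 0xC0100070, and that address is then walked through the page tables like any other, so the IDT has to be mapped at the linear address loaded into IDTR.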
Re: Best way to determine if an IRQ is present
On Saturday, November 20, 2010 4:58:02 pm Garrett Cooper wrote: Trying to do a complete solution for kern/145385, Andriy has raised concerns about IRQ mapping to CPUs; while I've put together more pieces of the puzzle, I'm a bit confused about how to determine whether or not an IRQ is available for use. Sure, I could linearly probe a series of IRQs, but that would probably be expensive, and different architectures treat IRQs differently, so building assumptions based on the IRQ hierarchy being done in a particular order is probably not the best thing to do. I've poked around kern/kern_cpuset.c and kern/kern_intr.c a bit but I may have missed something important... Well, the real solution is actually larger than described in the PR. What you really want to do is take the logical CPUs offline when they are halted. Taking a CPU offline should trigger an EVENTHANDLER that various bits of code could invoke. In the case of platforms that support binding interrupts to CPUs (x86 and sparc64 at least), they would install an event handler that searches the MD interrupt tables (e.g. the interrupt_sources[] array on x86) and moves bound interrupts to other CPUs. However, I think all the interrupt bits will be MD, not MI. -- John Baldwin
Re: Building my own release ISOs
On Sunday, November 21, 2010 8:31:22 pm Sean Bruno wrote: Does this look about right to build from a test branch? sudo make release SVNROOT=ssh+svn://svn.freebsd.org/base SVNBRANCH=projects/sbruno_64cpus MAKE_ISOS=y MAKE_DVD=y NO_FLOPPIES=y NODOC=y NOPORTSATALL=y WORLD_FLAGS=-j32 KERNEL_FLAGS=-j32 BUILDNAME=sbruno CHROOTDIR=/new_release Sure. Note, though, that you don't have to create a branch just to build a release with a patch. You can always use LOCAL_PATCHES to apply patches to the source tree you build a release against. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Best way to determine if an IRQ is present
Andriy Gapon wrote: on 22/11/2010 16:24 John Baldwin said the following: Well, the real solution is actually larger than described in the PR. What you really want to do is take the logical CPUs offline when they are halted. Taking a CPU offline should trigger an EVENTHANDLER that various bits of code could invoke. In the case of platforms that support binding interrupts to CPUs (x86 and sparc64 at least), they would install an event handler that searches the MD interrupt tables (e.g. the interrupt_sources[] array on x86) and move bound interrupts to other CPUs. However, I think all the interrupt bits will be MD, not MI. That's a good idea and a comprehensive approach. One minor technical detail - should an offlined CPU be removed from all_cpus mask/set? That's tricky. In other e-mails I've had on this topic, the idea has been to have a new online_cpus mask and maybe a CPU_ONLINE() test macro similar to CPU_ABSENT(). In that case, an offline CPU should still be in all_cpus, but many places that use all_cpus would need to use online_cpus instead. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
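A rough userland sketch of the online_cpus idea discussed above, for illustration only. Everything here is hypothetical: the mask type, the variable names, and the helper functions are invented for the example and do not correspond to existing kernel definitions. The point is just that taking a CPU offline clears it from the online mask while leaving it in the all-CPUs mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-CPU bitmask type; the kernel's real types differ. */
typedef uint64_t sketch_cpumask_t;

static sketch_cpumask_t sketch_all_cpus;	/* every CPU that exists */
static sketch_cpumask_t sketch_online_cpus;	/* CPUs currently usable */

/* Analogue of a CPU_ABSENT()-style test: the CPU does not exist at all. */
static bool
sketch_cpu_absent(unsigned int cpu)
{
	return ((sketch_all_cpus & ((sketch_cpumask_t)1 << cpu)) == 0);
}

/* Analogue of the proposed CPU_ONLINE() test. */
static bool
sketch_cpu_online(unsigned int cpu)
{
	return ((sketch_online_cpus & ((sketch_cpumask_t)1 << cpu)) != 0);
}

/* Taking a CPU offline only touches the online mask. */
static void
sketch_cpu_set_offline(unsigned int cpu)
{
	sketch_online_cpus &= ~((sketch_cpumask_t)1 << cpu);
}
```

Code that currently iterates over all_cpus to schedule work or bind interrupts would, under this scheme, test the online mask instead, which is the audit the reply above alludes to.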
Re: How to debug BTX loader?
On Monday, November 29, 2010 1:01:27 pm Darmawan Salihun wrote: Hi guys, I'm currently working on a BIOS for a custom Single Board Computer (SBC). I have the required BIOS source code and tools at hand. However, the boot process always gets stuck in the BTX loader (the infamous ACPI autoload failed) when I boot from a USB stick (with the FreeBSD 8.1 USB stick image). I could get the system to boot into FreeBSD 8.1 (by keeping the CDROM tray open and closing it when the board looks for a boot device, otherwise BTX will reboot instantly). Are you getting an actual BTX error message or a freeze? BTX is just a minikernel written entirely in assembly. It doesn't handle loading the kernel, etc. All that work is done by the /boot/loader program (which is written in C). You can find all the source to the boot code in src/sys/boot. The BTX kernel is in src/sys/boot/i386/btx/btx/. However, to debug this further we would need more info such as what exactly you are seeing (a hang, a BTX fault with a register dump, etc.). -- John Baldwin
Re: 8.1-RELEASE hangs on reboot
On Tuesday, November 30, 2010 8:23:19 pm Ondřej Majerech wrote:

Hello, my 8.1-R system has just started hanging on reboot, specifically after I svn up'd my source and updated from 8.1-R-p1 to -p2. Some kind of hang occurs on every reboot attempt. Usually it hangs at the Rebooting... message, but sometimes the thing just locks up before it even syncs disks. shutdown -p now seems to shut down the system successfully each time.

So I booted into single-user mode, executed reboot and during the Syncing disks I pressed Ctrl-Alt-Escape to break into the debugger. There I single-stepped with the s command until the thing simply stopped doing anything. (Even if I pressed NumLock, the LED on the keyboard wouldn't turn off.) The screen content at the moment of the hang is (dutifully typed over as the thing is dead and I don't have a serial cable):

	[thread pid 12 tid 100017 ]
	Stopped at sckbdevent+0x5f: call _mtx_unlock_flags
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags: pushq %rbp
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0x1: movq %rsp,%rbp
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0x4: subq $0x20,%rsp
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0x8: movq %rbx,(%rsp)
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0xc: movq %r12,0x8(%rsp)
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0x11: movq %rdi,%rbx
	db>
	[thread pid 12 tid 100017 ]
	Stopped at _mtx_unlock_flags+0x14: movq %r13,0x10(%rsp)
	db> E

Including that E at the end.

No good ideas here, though I think we just turned off PSL_T by accident so it ran for a while before hanging after this. 'E' must be the start of a message on the console.

As I said, it's 8.1-RELEASE-p2; it's on AMD64. I'm using a custom kernel which only differs from GENERIC by the addition of the debugging options:

	options INVARIANTS
	options INVARIANT_SUPPORT
	options WITNESS
	options DEBUG_LOCKS
	options DEBUG_VFS_LOCKS
	options DIAGNOSTIC

I tried rebooting with ACPI disabled, but the thing panicked on boot with:

	panic: Duplicate free of item 0xff00025e from zone 0xff00bfdcc2a0(1024)
	cpuid = 0
	KDB: enter: panic
	[thread pid 0 tid 10 ]
	Stopped at kdb_enter+0x3d: movq $0, 0x6b2d20(%rip)
	db> bt
	Tracing pid 0 tid 10 td 0x80c63fc0
	kdb_enter() at kdb_enter+0x3d
	panic() at panic+0x17b
	uma_dbg_free() at uma_dbg_free+0x171
	uma_zfree_arg() at uma_zfree_arg+0x68
	free() at free+0xcd
	device_set_driver() at device_set_driver+0x7c
	device_attach() at device_attach+0x19b
	bus_generic_attach() at bus_generic_attach+0x1a
	pci_attach() at pci_attach+0xf1

The free() should be the free to free the softc, but that implies it had a previous driver and softc. Maybe add some debug info to devclass_set_driver() to print out the previous driver's name (and maybe the value of the pointer) before free'ing the softc. You could use gdb on the kernel.debug and the pointer value to figure out exactly which driver was the previous one and look to see if its probe routine does something funky with the softc pointer.

-- John Baldwin
Re: How to debug BTX loader?
On Wednesday, December 01, 2010 4:09:42 pm Darmawan Salihun wrote: Hi John, --- On Tue, 11/30/10, John Baldwin j...@freebsd.org wrote: However, to debug this further we would need more info such as what exactly you are seeing (a hang, a BTX fault with a register dump, etc.). One of the BTX fault shows the register dump in the attachment. I hope this could help. Anyway, If I were to try to interpret such register dump, where should I start? I understand x86/x86_64 assembly pretty much, but I'm not quite well versed with the FreeBSD code using it. Looks like the mailing list stripped the attachment. Can you post the attachment at a URL? -- John Baldwin
Re: How to debug BTX loader?
On Thursday, December 02, 2010 2:12:04 pm Darmawan Salihun wrote: Hi John, --- On Thu, 12/2/10, John Baldwin j...@freebsd.org wrote:
Looks like the mailing list stripped the attachment. Can you post the attachment at a URL? The BTX crash message is in the attachment. Ok, so clearly the instruction pointer has jumped off into the weeds given that the instruction stream is all 0xff. The instruction pointer value (0xc09d3600) implies that this is in the kernel already during early kernel startup (before the kernel installs its own IDT with its own fault and exception handlers). It might be helpful to pull up gdb on your kernel.debug file and do 'l *0xc09d3600' to see what you get. Looking at the stack '0xc1830188' might be another address in the kernel. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: coretemp(4)/amdtemp(4) and sysctl nodes
On Friday, December 03, 2010 1:05:02 pm m...@freebsd.org wrote: There are very few uses in FreeBSD mainline code of sysctl_remove_oid(), and I was looking at potentially removing them. However, the use in coretemp/amdtemp has me slightly stumped. Each device provides a device_get_sysctl_ctx sysctl_ctx that is automatically cleaned up when the device goes away. Yet the sysctl nodes for both amdtemp and coretemp use the context of other devices, rather than their own. I can't quite figure out why, though the two are different enough that they may have different reasons. For coretemp(4) I don't see how the parent device can be removed first, since we are a child device. So from my understanding it makes no sense to have an explicit sysctl_remove_oid() and attach in the parent's sysctl_ctx. Well, you would want 'kldunload coretemp.ko' to remove the sysctl node even though the parent device is still around. I suspect the same case is true for amdtemp. Probably these drivers should use a separate sysctl context. I'm not sure how the sysctl code handles removing a node that has an active context though. -- John Baldwin
Re: small dtrace patch for review
On Friday, December 03, 2010 11:57:42 am Andriy Gapon wrote: The patch is not about DTrace functionality, but about infrastructure use in one particular place. http://people.freebsd.org/~avg/dtrace_gethrtime_init.diff I believe that sched_pin() is needed there to make sure that the host/base CPU stays the same for all calls to smp_rendezvous_cpus(). The pc_cpumask should just be a cosmetic change. Looks good to me. -- John Baldwin
Re: atomic_set_xxx(x, 0)
On Tuesday, December 07, 2010 12:58:43 pm Andriy Gapon wrote:

$ glimpse atomic_set_ | fgrep -w 0
/usr/src/sys/dev/arcmsr/arcmsr.c: atomic_set_int(&acb->srboutstandingcount, 0);
/usr/src/sys/dev/arcmsr/arcmsr.c: atomic_set_int(&acb->srboutstandingcount, 0);
/usr/src/sys/dev/jme/if_jme.c: atomic_set_int(&sc->jme_morework, 0);
/usr/src/sys/dev/jme/if_jme.c: atomic_set_int(&sc->jme_morework, 0);
/usr/src/sys/dev/ale/if_ale.c: atomic_set_int(&sc->ale_morework, 0);
/usr/src/sys/mips/rmi/dev/xlr/rge.c: atomic_set_int(&(priv->frin_to_be_sent[i]), 0);
/usr/src/sys/dev/drm/drm_irq.c: atomic_set_rel_32(&dev->vblank[i].count, 0);
/usr/src/sys/dev/cxgb/ulp/tom/cxgb_tom.c: atomic_set_int(&t->tids_in_use, 0);

I wonder if these are all bugs and atomic_store_xxx() was actually intended?

They are most likely bugs. You can probably ask yongari@ about jme(4) and ale(4) and np@ about cxgb(4). drm_irq looks to want to be an atomic_store_rel(). Not sure who to ask about arcmsr(4). I'm not sure arcmsr(4) really needs the atomic ops at all, but it should be using atomic_fetchadd() and atomic_readandclear() instead of some of the current atomic ops.

-- John Baldwin
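For anyone following along who wonders why atomic_set_xxx(p, 0) is suspicious: per atomic(9), the atomic_set family ORs bits into the target, so a value of 0 changes nothing. Below is a small userland sketch of the difference using C11 <stdatomic.h>. The sketch_ functions are stand-ins invented for this example; they mimic the documented semantics and are not the kernel implementations.

```c
#include <stdatomic.h>

/*
 * Stand-in for atomic_set_int(p, v): atomically sets the bits in v,
 * i.e. *p |= v. With v == 0 this leaves *p unchanged.
 */
static void
sketch_atomic_set_int(atomic_uint *p, unsigned int v)
{
	atomic_fetch_or(p, v);
}

/* Stand-in for atomic_store_int-style behaviour: *p = v. */
static void
sketch_atomic_store_int(atomic_uint *p, unsigned int v)
{
	atomic_store(p, v);
}

/* Stand-in for atomic_readandclear_int(p): returns old value, stores 0. */
static unsigned int
sketch_atomic_readandclear_int(atomic_uint *p)
{
	return (atomic_exchange(p, 0));
}
```

So a counter holding 5 still holds 5 after the "set to 0" call, which is presumably not what the drivers listed above intended; a store or a read-and-clear actually resets it.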
Re: getting rid of some -mno-* flags under sys/boot
On Sunday, December 19, 2010 12:42:01 pm Garrett Cooper wrote: On Sun, Dec 19, 2010 at 3:23 AM, Alexander Best arun...@freebsd.org wrote:

hi there, i think some of the -mno-* flags in sys/boot/* can be scrubbed, since they're already being included from ../Makefile.inc.

Looks good.

also TARGET cleandir leaves some files behind in i386/gptboot which should be fixed by this patch.

AHA. This might fix the issue I've seen rebuilding stuff with gptzfsboot for a good while now where I have to (on mostly rare occasions with -j24, etc typically after updating my source tree) rebuild it manually. gptzfsboot and zfsboot also need the fix, BTW. The only thing is that these files live under the common directory, so shouldn't common clean them up (I see that common doesn't have a Makefile though, only a Makefile.inc -- ouch)? FWIW though, wouldn't it be better to avoid this accidental bug and unnecessary duplication by doing something like the following?

	# ...
	OBJS=zfsboot.o sio.o gpt.o drv.o cons.o util.o
	CLEANFILES+= gptzfsboot.out ${OBJS}
	gptzfsboot.out: ${BTXCRT} ${OBJS}
	# ...

Yes, an OBJS would be good. Also, gptboot.c was recently changed to not #include ufsread.c, so that explicit dependency can be removed, as can the GPTBOOT_UFS variable. Similar fixes probably apply to gptzfsboot. BTW, the code in common/ is not built into a library, but specific boot programs (typically /boot/loader on different platforms) include specific objects.

-- John Baldwin
Re: PCI IDE Controller Base Address Register setting
On Monday, December 27, 2010 6:07:35 am Darmawan Salihun wrote:

Hi, I'm trying to install FreeBSD 8.0 on AMD Geode LX800 (CS5536 southbridge). However, it cannot detect the IDE controller (in the CS5536) correctly. It says something similar to this: IDE controller not present

Hmm, I can't find a message like that anywhere. Can you get the exact message you are seeing?

I did lspci in Linux (BackTrack 3) and I saw that the IDE controller Base Address Registers (BARs) are all disabled (contain only zeros), except for one of them (BAR4). BAR4 decodes 16 bytes of I/O ports (FFF0h-FFFFh). The decoded ports seem to conform to the PCI IDE specification for a native-PCI IDE controller (relocatable within the 16-bit I/O address space). I did cat /proc/ioports and I found that the following I/O port address ranges are decoded correctly to the IDE controller in the CS5536 southbridge: 1F0h-1F7h, 3F6h, 170h-177h, FFF0h-FFFFh. My question: Does FreeBSD require the IDE controller BARs to be programmed to also decode the legacy I/O port ranges (1F0h-1F7h, 3F6h and 170h-177h)?

No. We hardcode the ISA ranges for BARs 0 through 3 if a PCI IDE controller does not have the corresponding Primary or Secondary native-mode bit set in its programming interface register, and in that case we don't even look at those BARs. We do always examine BARs 4 and 5 using the normal probing scheme of writing all 1's, etc. The code in question looks like this:

/*
 * For ATA devices we need to decide early what addressing mode to use.
 * Legacy demands that the primary and secondary ATA ports sits on the
 * same addresses that old ISA hardware did. This dictates that we use
 * those addresses and ignore the BAR's if we cannot set PCI native
 * addressing mode.
 */
static void
pci_ata_maps(device_t bus, device_t dev, struct resource_list *rl, int force,
    uint32_t prefetchmask)
{
	struct resource *r;
	int rid, type, progif;
#if 0
	/* if this device supports PCI native addressing use it */
	progif = pci_read_config(dev, PCIR_PROGIF, 1);
	if ((progif & 0x8a) == 0x8a) {
		if (pci_mapbase(pci_read_config(dev, PCIR_BAR(0), 4)) &&
		    pci_mapbase(pci_read_config(dev, PCIR_BAR(2), 4))) {
			printf("Trying ATA native PCI addressing mode\n");
			pci_write_config(dev, PCIR_PROGIF, progif | 0x05, 1);
		}
	}
#endif
	progif = pci_read_config(dev, PCIR_PROGIF, 1);
	type = SYS_RES_IOPORT;
	if (progif & PCIP_STORAGE_IDE_MODEPRIM) {
		pci_add_map(bus, dev, PCIR_BAR(0), rl, force,
		    prefetchmask & (1 << 0));
		pci_add_map(bus, dev, PCIR_BAR(1), rl, force,
		    prefetchmask & (1 << 1));
	} else {
		rid = PCIR_BAR(0);
		resource_list_add(rl, type, rid, 0x1f0, 0x1f7, 8);
		r = resource_list_reserve(rl, bus, dev, type, &rid, 0x1f0,
		    0x1f7, 8, 0);
		rid = PCIR_BAR(1);
		resource_list_add(rl, type, rid, 0x3f6, 0x3f6, 1);
		r = resource_list_reserve(rl, bus, dev, type, &rid, 0x3f6,
		    0x3f6, 1, 0);
	}
	if (progif & PCIP_STORAGE_IDE_MODESEC) {
		pci_add_map(bus, dev, PCIR_BAR(2), rl, force,
		    prefetchmask & (1 << 2));
		pci_add_map(bus, dev, PCIR_BAR(3), rl, force,
		    prefetchmask & (1 << 3));
	} else {
		rid = PCIR_BAR(2);
		resource_list_add(rl, type, rid, 0x170, 0x177, 8);
		r = resource_list_reserve(rl, bus, dev, type, &rid, 0x170,
		    0x177, 8, 0);
		rid = PCIR_BAR(3);
		resource_list_add(rl, type, rid, 0x376, 0x376, 1);
		r = resource_list_reserve(rl, bus, dev, type, &rid, 0x376,
		    0x376, 1, 0);
	}
	pci_add_map(bus, dev, PCIR_BAR(4), rl, force,
	    prefetchmask & (1 << 4));
	pci_add_map(bus, dev, PCIR_BAR(5), rl, force,
	    prefetchmask & (1 << 5));
}

-- John Baldwin
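The decision made by pci_ata_maps() above can be boiled down to a small standalone sketch, which may help when checking what a given programming interface value implies. It assumes the usual PCI IDE encoding (bit 0 = primary channel in native mode, bit 2 = secondary channel in native mode); the SK_ constants, the struct, and the function are invented for this example and only mirror the logic quoted above, they are not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative copies of the native-mode bits in the progif byte. */
#define	SK_IDE_MODEPRIM	0x01	/* primary channel in native PCI mode */
#define	SK_IDE_MODESEC	0x04	/* secondary channel in native PCI mode */

struct sk_ide_ports {
	bool from_bars;		/* true: ports come from the BARs */
	uint16_t cmd_start;	/* legacy command block, first port */
	uint16_t cmd_end;	/* legacy command block, last port */
	uint16_t ctl_port;	/* legacy control port */
};

/*
 * channel 0 = primary, channel 1 = secondary. If the native-mode bit for
 * the channel is clear, the fixed ISA ranges are used and the BARs for
 * that channel are ignored.
 */
static struct sk_ide_ports
sk_ide_channel_ports(uint8_t progif, int channel)
{
	struct sk_ide_ports p = { false, 0, 0, 0 };
	uint8_t native_bit;

	native_bit = (channel == 0) ? SK_IDE_MODEPRIM : SK_IDE_MODESEC;
	if (progif & native_bit) {
		p.from_bars = true;
		return (p);
	}
	if (channel == 0) {
		p.cmd_start = 0x1f0;
		p.cmd_end = 0x1f7;
		p.ctl_port = 0x3f6;
	} else {
		p.cmd_start = 0x170;
		p.cmd_end = 0x177;
		p.ctl_port = 0x376;
	}
	return (p);
}
```

For example, a progif of 0x80 has neither native-mode bit set, so both channels end up on the legacy ISA ports regardless of what BARs 0 through 3 contain.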
Re: PCI IDE Controller Base Address Register setting
On Tuesday, December 28, 2010 1:38:05 pm Darmawan Salihun wrote: Hi, --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: freebsd-hackers@freebsd.org Cc: Darmawan Salihun darmawan_sali...@yahoo.com Date: Tuesday, December 28, 2010, 10:20 AM On Monday, December 27, 2010 6:07:35 am Darmawan Salihun wrote: Hi, I'm trying to install FreeBSD 8.0 on AMD Geode LX800 (CS5536 southbridge). However, it cannot detect the IDE controller (in the CS5536) correctly. It says something similar to this: IDE controller not present Hmm, I can't find a message like that anywhere. Can you get the exact message you are seeing? It says: No disks found! Please verify that your disk controller is being properly probed at boot time. Oh, so this is a message from the installer. Can you capture a verbose dmesg via a serial console perhaps? Or at least the kernel probe messages for your ATA controller? -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: PCI IDE Controller Base Address Register setting
On Tuesday, December 28, 2010 2:10:59 pm Darmawan Salihun wrote: Hi, --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: Darmawan Salihun darmawan_sali...@yahoo.com Cc: freebsd-hackers@freebsd.org Date: Tuesday, December 28, 2010, 1:52 PM On Tuesday, December 28, 2010 1:38:05 pm Darmawan Salihun wrote: Hi, --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: freebsd-hackers@freebsd.org Cc: Darmawan Salihun darmawan_sali...@yahoo.com Date: Tuesday, December 28, 2010, 10:20 AM On Monday, December 27, 2010 6:07:35 am Darmawan Salihun wrote: Hi, I'm trying to install FreeBSD 8.0 on AMD Geode LX800 (CS5536 southbridge). However, it cannot detect the IDE controller (in the CS5536) correctly. It says something similar to this: IDE controller not present Hmm, I can't find a message like that anywhere. Can you get the exact message you are seeing? It says: No disks found! Please verify that your disk controller is being properly probed at boot time. Oh, so this is a message from the installer. Can you capture a verbose dmesg via a serial console perhaps? I'm not sure if I can do this because I've tried a couple of times but nothing comes out of the serial console. Perhaps a wrong baud rate setting? I set it to 96bps and 8-N-1 back then. Is that correct? Yes, that should be correct. You have to turn the console on however (it is not enabled by default). The simplest way to do this is probably to hit the key option to break into the loader prompt when you see the boot menu (I think it is option '6'). Then enter 'boot -D' at the 'OK' prompt. This should boot with both the video and serial consoles enabled with the video console as the primary console. 
For a verbose boot, use 'boot -Dv' If you want to test out the serial console before you boot, you can instead enter 'set console=vidconsole,comconsole' at the prompt. You should then see an OK prompt on both the screen and the serial port. Note that the serial console is hardcoded to use the default I/O ports for COM1. Or at least the kernel probe messages for your ATA controller? I recall that pressing Alt+F2 during the installation would open-up another console, full with log messages. Would that be enough? Actually, the kernel probe messages are on the main console, but you can hit scroll lock to freeze the console and then use page up to go back in history and find the messages. -- John Baldwin ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: PCI IDE Controller Base Address Register setting
On Saturday, January 01, 2011 2:58:12 pm Darmawan Salihun wrote: --- On Thu, 12/30/10, Darmawan Salihun darmawan_sali...@yahoo.com wrote: From: Darmawan Salihun darmawan_sali...@yahoo.com Subject: Re: PCI IDE Controller Base Address Register setting To: John Baldwin j...@freebsd.org Cc: freebsd-hackers@freebsd.org Date: Thursday, December 30, 2010, 3:28 PM --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: Darmawan Salihun darmawan_sali...@yahoo.com Cc: freebsd-hackers@freebsd.org Date: Tuesday, December 28, 2010, 2:22 PM On Tuesday, December 28, 2010 2:10:59 pm Darmawan Salihun wrote: Hi, --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: Darmawan Salihun darmawan_sali...@yahoo.com Cc: freebsd-hackers@freebsd.org Date: Tuesday, December 28, 2010, 1:52 PM On Tuesday, December 28, 2010 1:38:05 pm Darmawan Salihun wrote: Hi, --- On Tue, 12/28/10, John Baldwin j...@freebsd.org wrote: From: John Baldwin j...@freebsd.org Subject: Re: PCI IDE Controller Base Address Register setting To: freebsd-hackers@freebsd.org Cc: Darmawan Salihun darmawan_sali...@yahoo.com Date: Tuesday, December 28, 2010, 10:20 AM On Monday, December 27, 2010 6:07:35 am Darmawan Salihun wrote: Hi, I'm trying to install FreeBSD 8.0 on AMD Geode LX800 (CS5536 southbridge). However, it cannot detect the IDE controller (in the CS5536) correctly. It says something similar to this: IDE controller not present Hmm, I can't find a message like that anywhere. Can you get the exact message you are seeing? It says: No disks found! Please verify that your disk controller is being properly probed at boot time. Oh, so this is a message from the installer. Can you capture a verbose dmesg via a serial console perhaps? 
I'm not sure if I can do this because I've tried a couple of times but nothing comes out of the serial console. Perhaps a wrong baud rate setting? I set it to 96bps and 8-N-1 back then. Is that correct? Yes, that should be correct. You have to turn the console on however (it is not enabled by default). The simplest way to do this is probably to hit the key option to break into the loader prompt when you see the boot menu (I think it is option '6'). Then enter 'boot -D' at the 'OK' prompt. This should boot with both the video and serial consoles enabled with the video console as the primary console. For a verbose boot, use 'boot -Dv' Thanks, I tested this option and it worked. I could see the debugging messages. FreeBSD cannot detect the disk in all of the IDE interfaces. (The AMDCS5536 only implemented the primary channel) Anyway, I manage to change the mapping in BAR4 of the IDE controller. However, I'm confused as to how to force FreeBSD to recognize the IDE controller to work only in compatibility mode. Because, I'm not sure if the physical IDE controller chip supports Native-PCI mode correctly at all. If I set BAR4 to disabled(i.e. not decoding any I/O addresses at all), would FreeBSD use compatibility mode? or would it consider the IDE controller not present? Here's some notes about the IDE controller PCI configuration registers: 1. The Programming Interface register contains 80h (which means _only_ compatibility mode supported). I have yet to be able to write new values into this register. That's the state of the register right now. I noticed in your previous reply that for FreeBSD to be forced to use compatibility mode, the programming interface register bits in the PCI configuration register must be set accordingly (I suppose the bits in the lower nibble). 2. BAR0-BAR3 cannot be changed and contains 00h. I have yet to experiment with BAR5.The default value is 00h Silly me that I didn't know about the SFF-8038i standard (PCI IDE Bus mastering). 
So, I found out that it seems the allocation of I/O ports for the IDE controller is just fine. However, the primary IDE channel is shared between an IDE interface and a CF card. Moreover, Linux detects DMA bug, because all drives connected to the interface would be in PIO mode :-/ If all drives on the primary channel are forced to PIO mode, then shouldn't the IDE PCI bus master register (offset 20h per SFF-8038i) along with the command register (offset 4h), are set
Re: PANIC: thread_exit: Last thread exiting on its own.
On Friday, December 31, 2010 4:22:36 am Lev Serebryakov wrote: Hello, Giovanni. You wrote on December 31, 2010, 1:56:20: I've got this panic on reboot from geom_raid5. Could you please provide some backtrace? Have you got a core? The backtrace was very simple (I've reproduced it from memory, but it really was that simple): all debugger-related stuff panic() thread_exit() kthread_exit() g_raid5_worker() fork_trampoline() ... No core, because I didn't have dumpdev configured :( Which revision of -STABLE are you running (or when was the last src update)? uname shows: FreeBSD 8.2-PRERELEASE #2: Tue Dec 21 01:17:16 MSK 2010 I rebuilt the kernel RIGHT after `csup', so the difference is no more than several hours. Looks like 204087 needs to be MFC'd. -- John Baldwin
Re: [patch] have rtprio check that arguments are numeric; change atoi to strtol
On Tuesday, January 04, 2011 6:25:02 am Kostik Belousov wrote: On Tue, Jan 04, 2011 at 11:40:45AM +0100, Giorgos Keramidas wrote:

@@ -123,12 +121,28 @@ main(argc, argv)
 	}
 	exit(0);
 }
-	exit (1);
+	exit(1);
+}
+
+static int
+parseint(const char *str, const char *errname)
+{
+	char *endp;
+	long res;
+
+	errno = 0;
+	res = strtol(str, &endp, 10);
+	if (errno != 0 || endp == str || *endp != '\0')
+		err(1, "%s shall be a number", errname);

Small nit, maybe use 'must' instead of 'shall'.

-- John Baldwin
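For reference, the strtol() checks in the quoted hunk can be tried out in isolation with something like the sketch below. It is not the rtprio patch itself: the function name and the return-code interface are made up here so the checks can be exercised without exiting the process, and the explicit int range check is an addition the quoted hunk does not show.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal integer. Returns 0 and stores the value in *out on
 * success, or -1 if str is empty, has trailing junk, or is out of range.
 */
static int
sketch_parse_int(const char *str, int *out)
{
	char *endp;
	long res;

	errno = 0;
	res = strtol(str, &endp, 10);
	/* Same three checks as the quoted patch. */
	if (errno != 0 || endp == str || *endp != '\0')
		return (-1);
	/* strtol() yields a long; make sure the value fits in an int. */
	if (res < INT_MIN || res > INT_MAX)
		return (-1);
	*out = (int)res;
	return (0);
}
```

Compared with atoi(), this rejects inputs such as "10x" or an empty string instead of silently returning a partial or zero result, which is the point of the change being reviewed.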
Re: Building third-party modules for kernel with debug options?
On Friday, January 07, 2011 7:15:59 am Lev Serebryakov wrote: Hello, Freebsd-hackers. I've found that struct bio depends on the state of the DIAGNOSTIC flag (options DIAGNOSTIC in the kernel config). But when I build a third-party GEOM (or any other) module using bsd.kmod.mk, there is no access to these options. So a module built from ports can fail on the user's kernel, even if it is built with the proper kernel sources in /usr/src/sys. Is there any solution for this problem? P.S. NB: the GEOM module is only an example; the question is about modules and kernel options in general, so I put this message on the Hackers list. In general we try to avoid having public kernel data structures change size when various kernel options are in use. Some noticeable exceptions to this rule are PAE (i386-only) and LOCK_PROFILING (considered to be something users would not typically use). DIAGNOSTIC might arguably be considered the same as LOCK_PROFILING, but I am surprised it affects bio. In this case, however, it should only affect a GEOM module that uses bio_pblockno, since you should be using kernel routines to allocate bio structures rather than malloc'ing one directly. Perhaps phk@ would ok moving bio_pblockno up above the optional diagnostic fields. -- John Baldwin
Re: [rfc] allow to boot with >= 256GB physmem
On Friday, January 21, 2011 11:09:10 am Sergey Kandaurov wrote: Hello. Some time ago I ran into a problem booting with 400GB physmem. The problem is that the type of vm.max_proc_mmap overflows with such a high value, and that results in a broken mmap() syscall. The max_proc_mmap value is a signed int, roughly calculated in vmmapentry_rsrc_init() as a quotient of the u_long vm_kmem_size: vm_kmem_size / sizeof(struct vm_map_entry) / 100. Although the value was quite low when it was introduced in svn r57263 (e.g. the related commit log states: "The value defaults to around 9000 for a 128MB machine."), the problem is observed on amd64, where KVA space after r212784 is effectively bounded only by the physical memory size. With INT_MAX being 0x7fffffff here and sizeof(struct vm_map_entry) being 120, slightly less than 256GB is enough to reproduce the problem. I rewrote vmmapentry_rsrc_init() to set a large enough limit for max_proc_mmap, just to protect against integer overflow. As it's also possible to tune this value live, I also added a simple anti-foot-shooting constraint to its sysctl handler. I'm not sure, though, whether the second part is worth committing. As this patch may cause some bikeshedding, I'd like to hear your comments before I commit it. http://plukky.net/~pluknet/patches/max_proc_mmap.diff Is there any reason we can't just make this variable and sysctl a long? -- John Baldwin
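The arithmetic behind the "slightly less than 256GB" figure can be checked with a small userland sketch. It assumes that the quotient vm_kmem_size / sizeof(struct vm_map_entry) is stored in the int before the division by 100 takes place (the final value alone would still fit in an int), and it takes the entry size of 120 bytes from the message rather than from kernel headers. The helper names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Assumed value, taken from the message above, not from kernel headers. */
#define MAP_ENTRY_SIZE	120ULL

/* Returns 1 if kmem_size / MAP_ENTRY_SIZE does not fit in a signed int. */
static int
entry_count_overflows(uint64_t kmem_size)
{
	return (kmem_size / MAP_ENTRY_SIZE > (uint64_t)INT_MAX);
}

/* Largest kmem size in bytes whose entry count still fits in an int. */
static uint64_t
max_safe_kmem_size(void)
{
	return ((uint64_t)INT_MAX * MAP_ENTRY_SIZE + (MAP_ENTRY_SIZE - 1));
}
```

With a 32-bit int the threshold works out to just under 240 GiB, which matches the report; widening the variable to a long, as suggested in the reply, removes the limit for any realistic memory size.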
Re: pci_suspend/pci_resume of custom pcie board
On Tuesday, January 25, 2011 9:47:35 am Philip Soeberg wrote: Hi, I have a particular problem where I need to put my custom pcie adapter into d3hot power-mode and a couple of seconds later reset it back to d0. The board has an FPGA directly attached to the pcie interface, and as I need to re-configure the FPGA on the fly, I have to ensure the datalink layer between the upstream bridge and my device is idle to prevent any hiccups. On linux I simply do a pci_save_state(device) followed by pci_set_power_state(device, d3hot), then after my magic on my board, I do the reverse: pci_set_power_state(device, d0) followed by pci_restore_state(device). On FreeBSD, say 8, I've found the pci_set_powerstate function, which is documented in PCI(9), but that function neither saves nor restores the config space. I've tried, just for the fun of it, to go via pci_cfg_save(device, dinfo, 0) with dinfo being device_get_ivars(device) and then subsequently restoring the config space via pci_cfg_restore(), but since both those functions are declared in dev/pci/pci_private.h I'm not sure if I'm supposed to use them directly or not. Besides, I'm not really having any luck with that approach. Reading high and low on the net suggests that not too many driver devs are concerned with suspend/resume operation of their device, and those who are leave it to user-space to decide when to suspend/resume a device. I would like to be able to save off my device's config space, put it to sleep (d3hot), wake it back up (d0) and restore the device's config space directly from the device's own driver. Anyone who can help me with this? Use this: pci_cfg_save(dev, dinfo, 0); pci_set_powerstate(dev, PCI_POWERSTATE_D3); /* do stuff */ /* Will set state to D0. */ pci_cfg_restore(dev, dinfo); We probably should create some wrapper routines (pci_save_state() and pci_restore_state() would be fine) that hide the 'dinfo' detail as that isn't something device drivers should have to know.
-- John Baldwin
Re: rtld optimizations
On Wednesday, January 26, 2011 10:25:27 am Mark Felder wrote: On Tue, 25 Jan 2011 22:49:11 -0600, Alexander Kabaev kab...@gmail.com wrote: The only extra quirk that said commit does is an optimization of a dlsym() call, which is hardly ever in critical performance path. It's really not my place to say, but it seems strange that if an optimization is available people would ignore it because they don't think it's important enough. I don't understand this mentality; if it's not going to break anything and it obviously can improve performance in certain use cases, why not merge it and make FreeBSD even better? Many things that seem obvious aren't actually true, hence the need for actual testing and benchmarks. -- John Baldwin
Re: Namecache lock contention?
On Friday, January 28, 2011 8:46:07 am Ivan Voras wrote: I have this situation on a PHP server:
36623 www 1 76 0 237M 30600K *Name 6 0:14 47.27% php-cgi
36638 www 1 76 0 237M 30600K *Name 3 0:14 46.97% php-cgi
36628 www 1 105 0 237M 30600K *Name 2 0:14 46.88% php-cgi
36627 www 1 105 0 237M 30600K *Name 0 0:14 46.78% php-cgi
36639 www 1 105 0 237M 30600K *Name 5 0:14 46.58% php-cgi
36643 www 1 105 0 237M 30600K *Name 7 0:14 46.39% php-cgi
36629 www 1 76 0 237M 30600K *Name 1 0:14 46.39% php-cgi
36642 www 1 105 0 237M 30600K *Name 2 0:14 46.39% php-cgi
36626 www 1 105 0 237M 30600K *Name 5 0:14 46.19% php-cgi
36654 www 1 105 0 237M 30600K *Name 7 0:13 46.19% php-cgi
36645 www 1 105 0 237M 30600K *Name 1 0:14 45.75% php-cgi
36625 www 1 105 0 237M 30600K *Name 0 0:14 45.56% php-cgi
36624 www 1 105 0 237M 30600K *Name 6 0:14 45.56% php-cgi
36630 www 1 76 0 237M 30600K *Name 7 0:14 45.17% php-cgi
36631 www 1 105 0 237M 30600K RUN 4 0:14 45.17% php-cgi
36636 www 1 105 0 237M 30600K *Name 3 0:14 44.87% php-cgi
It looks like periodically most or all of the php-cgi processes are blocked in *Name for long enough that top notices, then continue, probably in a thundering herd way. From grepping inside /sys the most likely suspect seems to be something in the namecache, but I can't find exactly a symbol named Name or string beginning with Name that would be connected to a lock. In vfs_cache.c: static struct rwlock cache_lock; RW_SYSINIT(vfscache, &cache_lock, "Name Cache"); What are the php scripts doing? Do they all try to create and delete files at the same time (or do renames)? -- John Baldwin
Re: Divide-by-zero in loader
On Friday, January 28, 2011 12:41:08 pm Matthew Fleming wrote: I spent a few days chasing down a bug and I'm wondering if a loader change would be appropriate. So we have these new front-panel LCDs, and like everything these days it's a SoC. Normally it presents to FreeBSD as a USB communications device (ucom), but when the SoC is sitting in its own boot loader, it presents as storage (umass). If the box is rebooted in this state, the reboot gets into /boot/loader and then reboots itself. (It took a few days just to figure out I was getting into /boot/loader, since the only prompt I could definitively stop at was boot2). Anyways, I eventually debugged it to the device somehow presenting itself to /boot/loader with a geometry of 1024/256/0, and since od_sec is 0 that causes a divide-by-zero error in bd_io() while the loader is trying to figure out if this is GPT or MBR formatted. We're still trying to figure out why the loader sees this incorrect geometry. But meanwhile, this patch fixes the issue, and I wonder if it would be a useful safety-belt for other devices where an incorrect geometry can be seen? That's probably fine. A sector count of zero is invalid for CHS. However, probably we should not even be using C/H/S at all if the device claims to support EDD. We already use raw LBAs if it supports EDD, and we should probably just ignore C/H/S altogether if it supports EDD. -- John Baldwin
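To make the failure mode concrete, here is a small userland sketch of an LBA-to-C/H/S conversion that refuses a geometry with zero heads or zero sectors per track instead of dividing by it. This is not the loader's bd_io() code or the patch under discussion; the struct and function names are hypothetical, and only the 1024/256/0 geometry comes from the report.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not the loader's own structures. */
struct chs_geom {
	unsigned int cyl;	/* cylinders */
	unsigned int hd;	/* heads */
	unsigned int sec;	/* sectors per track */
};

struct chs_addr {
	unsigned int c, h, s;
};

/*
 * Convert an LBA to a C/H/S address.  Returns -1 without dividing when
 * the geometry is unusable (zero heads or zero sectors per track).
 */
static int
lba_to_chs(const struct chs_geom *g, uint32_t lba, struct chs_addr *out)
{
	if (g->hd == 0 || g->sec == 0)
		return (-1);
	out->s = lba % g->sec + 1;		/* sectors are 1-based */
	out->h = (lba / g->sec) % g->hd;
	out->c = lba / (g->sec * g->hd);
	return (0);
}
```

With the reported geometry the function returns -1 up front, which is the same kind of safety belt the message asks about; a caller could then fall back to EDD or skip the device.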
Re: Divide-by-zero in loader
On Friday, January 28, 2011 2:14:45 pm Matthew Fleming wrote: On Fri, Jan 28, 2011 at 11:00 AM, John Baldwin j...@freebsd.org wrote: On Friday, January 28, 2011 12:41:08 pm Matthew Fleming wrote: I spent a few days chasing down a bug and I'm wondering if a loader change would be appropriate. So we have these new front-panel LCDs, and like everything these days it's a SoC. Normally it presents to FreeBSD as a USB communications device (ucom), but when the SoC is sitting in its own boot loader, it presents as storage (umass). If the box is rebooted in this state, the reboot gets into /boot/loader and then reboots itself. (It took a few days just to figure out I was getting into /boot/loader, since the only prompt I could definitively stop at was boot2). Anyways, I eventually debugged it to the device somehow presenting itself to /boot/loader with a geometry of 1024/256/0, and since od_sec is 0 that causes a divide-by-zero error in bd_io() while the loader is trying to figure out if this is GPT or MBR formatted. We're still trying to figure out why the loader sees this incorrect geometry. But meanwhile, this patch fixes the issue, and I wonder if it would be a useful safety-belt for other devices where an incorrect geometry can be seen? That's probably fine. A sector count of zero is invalid for CHS. However, probably we should not even be using C/H/S at all if the device claims to support EDD. We already use raw LBAs if it supports EDD, and we should probably just ignore C/H/S altogether if it supports EDD. This is all almost entirely outside my knowledge, but at the moment bd_eddprobe() requires a geometry of 1023/255/63 before it attempts to check if EDD can be used. Is that check incorrect? Well, it is very conservative in that it only uses EDD if it thinks it can't use C/H/S. It would be interesting to see if simply checking for a sector count of 0 there would avoid the divide-by-zero and let your device work.
However, it might actually be useful to always use EDD if possible, esp. EDD3 since that lets you not use bounce buffers below 1MB. In my specific case I know there's no bootable stuff on this disk; the earlier layers bypassed it correctly without a problem. Thanks, matthew -- John Baldwin
Re: NVIDIA (port) driver fails to create /dev/nvidactl; 8.2Prerelease
On Friday, January 28, 2011 3:43:12 pm Duane H. Hesser wrote: I am attempting to replace the 'nv' X11 driver with the official nvidia driver from the x11/nvidia-driver port, in order to handle the AVCHD video files from my Canon HF S20. I have been trying for several days now, having read the nvidia README file in /usr/local/share and everything Google has to offer. Unfortunately devilfs is smarter and meaner than I. The 'xorg.conf' file is created by nvidia-xconfig. The console output when calling 'startx' to begin the frustration is =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= X.Org X Server 1.7.5 Release Date: 2010-02-16 X Protocol Version 11, Revision 0 Build Operating System: FreeBSD 8.1-RELEASE i386 Current Operating System: FreeBSD belinda.androcles.org 8.2-PRERELEASE FreeBSD 8.2-PRERELEASE #3: Thu Jan 27 13:45:06 PST 2011 r...@belinda.androcles.org:/usr/obj/usr/src/sys/BELINDA i386 Build Date: 08 January 2011 05:52:50PM Current version of pixman: 0.18.4 Before reporting problems, check http://wiki.x.org to make sure that you have the latest version. Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown. (==) Log file: /var/log/Xorg.0.log, Time: Fri Jan 28 11:32:46 2011 (==) Using config file: /etc/X11/xorg.conf NVIDIA: could not open the device file /dev/nvidiactl (No such file or directory). (EE) Jan 28 11:32:46 NVIDIA(0): Failed to initialize the NVIDIA kernel module. Please see the (EE) Jan 28 11:32:46 NVIDIA(0): system's kernel log for additional error messages and (EE) Jan 28 11:32:46 NVIDIA(0): consult the NVIDIA README for details. (EE) NVIDIA(0): *** Aborting *** (EE) Screen(s) found, but none have a usable configuration. Fatal server error: no screens found You don't have an nvidia0 device attached to vgapci0.
I would suggest adding printfs to the nvidia driver's probe routine to find out why it failed to probe. -- John Baldwin
Re: weird characters in top(1) output
On Tuesday, February 01, 2011 8:11:54 am Alexander Best wrote: On Tue Feb 1 11, Sergey Kandaurov wrote: On 1 February 2011 15:24, Alexander Best arun...@freebsd.org wrote: hi there, i was doing the following: top inf ~/output when i noticed that this was missing the overall statistics line. so i went ahead and did: top -d2 inf ~/output funny thing is that for the second output some weird characters seem to get spammed into the overall statistics line: last pid: 14320; load averages: 0.42, 0.44, 0.37 up 1+14:02:02 13:21:05 249 processes: 1 running, 248 sleeping CPU: ^[[3;6H 7.8% user, 0.0% nice, 10.6% system, 0.6% interrupt, 81.0% idle Mem: 1271M Active, 205M Inact, 402M Wired, 67M Cache, 212M Buf, 18M Free Swap: 18G Total, 782M Used, 17G Free, 4% Inuse this only seems to happen when i redirect the top(1) output to a file. if i do: top -d2 inf ...everything works fine. i verified the issue under zsh(1) and sh(1). My quick check shows that this is a regression between 7.2 and 7.3. Reverting r196382 fixes this bug for me. thanks for the help. indeed reverting r196382 fixes the issue. Hmm, you need more than 10 CPUs to understand the reason for that fix. Without it all of the updated per-CPU states are off by one column so you get weird screen effects. The garbage characters are actually just a terminal sequence to move the cursor. top uses these things a _lot_ to move the cursor around. 
You can try this instead though, it figures out the appropriate number of spaces rather than using Move_to() for these two routines: Index: display.c === --- display.c (revision 218032) +++ display.c (working copy) @@ -447,12 +447,14 @@ /* print tag and bump lastline */ if (num_cpus == 1) printf("\nCPU: "); -else - printf("\nCPU %d: ", cpu); +else { + value = printf("\nCPU %d: ", cpu); + while (value++ <= cpustates_column) + printf(" "); +} lastline++; /* now walk thru the names and print the line */ -Move_to(cpustates_column, y_cpustates + cpu); while ((thisname = *names++) != NULL) { if (*thisname != '\0') @@ -532,7 +534,7 @@ register char **names; register char *thisname; register int *lp; -int cpu; +int cpu, value; for (cpu = 0; cpu < num_cpus; cpu++) { names = cpustate_names; @@ -540,11 +542,13 @@ /* show tag and bump lastline */ if (num_cpus == 1) printf("\nCPU: "); -else - printf("\nCPU %d: ", cpu); +else { + value = printf("\nCPU %d: ", cpu); + while (value++ <= cpustates_column) + printf(" "); +} lastline++; -Move_to(cpustates_column, y_cpustates + cpu); while ((thisname = *names++) != NULL) { if (*thisname != '\0') -- John Baldwin
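The idea in the patch, using the character count returned by printf() to pad with plain spaces instead of emitting a cursor-movement escape, can be sketched in isolation as follows. This is only an illustration, not top's display.c: format_cpu_tag() is a hypothetical helper, it writes into a buffer with snprintf() so the result can be inspected, and it pads to an exact column rather than reproducing the patch's loop bounds.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "CPU <n>: " into buf and pad it with spaces
 * so that the text is at least `column` characters wide.  Returns the
 * resulting length, or -1 if the tag itself does not fit in buf.
 */
static int
format_cpu_tag(char *buf, size_t len, int cpu, int column)
{
	int n;

	n = snprintf(buf, len, "CPU %d: ", cpu);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	/* Pad with ordinary spaces; no terminal escape sequence is needed. */
	while (n < column && (size_t)n + 1 < len)
		buf[n++] = ' ';
	buf[n] = '\0';
	return (n);
}
```

Because only printable characters are produced, output redirected to a file stays clean, and tags for single-digit and two-digit CPU numbers end at the same column.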
Re: Strange problems in the old libc malloc routines
On Wednesday, February 02, 2011 01:04:15 pm Andrew Duane wrote: We are still using the FreeBSD 6 malloc routines, and are rather suddenly having a large number of problems with one or two of our programs. Before I dig into the 100+ crash dumps I have, I thought I'd see if anyone else has ever encountered this. The problems all seem to stem from some case of malloc returning the pointer 1 instead of either NULL or a valid pointer. Always exactly 1. Where this goes bad depends on where it happens (in the program or inside malloc itself), but that pointer value of 1 is always involved. Some of the structures like page_dir look corrupted too. It seems as if maybe the 1 is coming from sbrk(0) which is just returning the value of curbrk (which is correct, and not even close to 1). Could it be related to calls to malloc(0) perhaps? phkmalloc uses a constant for those that defaults to the last byte in a page (e.g. 4095 on x86). I'm not sure what platform you are using malloc on, but is it possible that you have ZEROSIZEPTR set to 1 somehow? Even so, if that is true free() should just ignore that pointer and not corrupt its internal state. -- John Baldwin
Re: Analyzing wired memory?
() ff80798ea000 - ff8079949000 kmem_alloc_nofault() (kstack/mapdev) ff8079949000 - ff807994a000 kmem_alloc() / contigmalloc() ff807994a000 - ff807994b000 object 0xff0060568af8 ff807994b000 - ff8079969000 kmem_alloc_nofault() (kstack/mapdev) ff8079969000 - ff807996b000 ff807996b000 - ff80799b kmem_alloc() / contigmalloc() ff80799b - ff80799b1000 object 0xff00606caca8 ff80799b1000 - ff80799b2000 object 0xff00606caca8 ff80799b2000 - ff80799b6000 kmem_alloc() / contigmalloc() ff80799b6000 - ff80799b7000 object 0xff0060568af8 ff80799b7000 - ff80799b8000 object 0xff0060568af8 ff80799b8000 - ff8079cbc000 kmem_alloc() / contigmalloc() ff8079cbc000 - ff807aa0e000 kmem_alloc_nofault() (kstack/mapdev) ff807aa0e000 - 8000 8000 - 808164e8 text/data/bss 808164e8 - 81822000 bootstrap data (The various objects inserted directly into the kernel_map are likely from the nvidia driver.) The 'kvm' command in my gdb script is mostly MI, but some bits are MD such as the code to handle the 'AP stacks' region. -- John Baldwin
Re: CFR: FEATURE macros for AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...
On Friday, February 11, 2011 4:30:28 am Alexander Leidinger wrote: Hi, during the last GSoC various FEATURE macros were added to the system. Before committing them, I would like to get some review (e.g. whether each macro is in the correct file, and, for those FEATUREs whose description was not taken from NOTES, whether the description is OK). If nobody complains, I would like to commit this in 1-2 weeks. If you need more time to review, just tell me. Here is the list of affected files (for the impatient ones who do not want to look at the attached patch before noticing that they are not interested in it): Hmm, so what is the rationale for adding FEATURE() macros? Do we just want to add them for everything or do we want to add them on-demand as use cases for each knob arrive? Some features can already be inferred (e.g. if KTR is compiled in, then the debug.ktr.mask sysctl will exist). Also, in the case of KTR, I'm not sure that any userland programs need to alter their behavior based on whether or not that feature was present. -- John Baldwin