Re: a strange and terrible saga of the cursed iSCSI ZFS SAN
"Eugene M. Zheganin" wrote:

> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> >   pool: userdata
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption. Applications may be affected.
> > action: Restore the file in question if possible. Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K
>
> That would be funny, if not that sad, but while writing this message,
> the pool started to look like below (I just asked zpool status twice in
> a row, comparing to what it was):
>
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  728K
>           mirror-0         ONLINE       0     0 1,42M
>             gpt/userdata0  ONLINE       0     0 1,42M
>             gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  730K
>           mirror-0         ONLINE       0     0 1,43M
>             gpt/userdata0  ONLINE       0     0 1,43M
>             gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
>
> So, you see, the error rate is like speed of light.
> And I'm not sure if the data access rate is that enormous, looks like they
> are increasing on their own.
> So may be someone have an idea on what this really means.

Quoting a comment from sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:

/*
 * If destroy encounters an EIO while reading metadata (e.g. indirect
 * blocks), space referenced by the missing metadata can not be freed.
 * Normally this causes the background destroy to become "stalled", as
 * it is unable to make forward progress. While in this stalled state,
 * all remaining space to free from the error-encountering filesystem is
 * "temporarily leaked". Set this flag to cause it to ignore the EIO,
 * permanently leak the space from indirect blocks that can not be read,
 * and continue to free everything else that it can.
 *
 * The default, "stalling" behavior is useful if the storage partially
 * fails (i.e. some but not all i/os fail), and then later recovers. In
 * this case, we will be able to continue pool operations while it is
 * partially failed, and when it recovers, we can continue to free the
 * space, with no leaks. However, note that this case is actually
 * fairly rare.
 *
 * Typically pools either (a) fail completely (but perhaps temporarily,
 * e.g. a top-level vdev going offline), or (b) have localized,
 * permanent errors (e.g. disk returns the wrong data due to bit flip or
 * firmware bug). In case (a), this setting does not matter because the
 * pool will be suspended and the sync thread will not be able to make
 * forward progress regardless. In case (b), because the error is
 * permanent, the best we can do is leak the minimum amount of space,
 * which is what setting this flag will do. Therefore, it is reasonable
 * for this flag to normally be set, but we chose the more conservative
 * approach of not setting it, so that there is no possibility of
 * leaking space in the "partial temporary" failure case.
 */

In FreeBSD the "flag" currently isn't easily reachable due to the lack of a powerful kernel debugger (like mdb on Solaris descendants), but it can be made reachable with a sysctl using the patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954

Fabian
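To make the trade-off in that comment concrete, here is a toy Python model (not ZFS code) of a background destroy walking a list of blocks, some of which return EIO. With the flag clear (the default), the destroy stalls at the first unreadable block and everything not yet freed stays "temporarily leaked"; with the flag set, only the space behind unreadable blocks is leaked permanently. The block list, sizes and the `readable` field are invented for illustration; in the illumos sources the variable following this comment is `zfs_free_leak_on_eio`.

```python
"""Toy model of the two destroy behaviours described in spa_misc.c.

NOT ZFS code: blocks are hypothetical (size, readable) pairs, and
readable=False stands in for an EIO while reading metadata.
"""

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DestroyResult:
    freed: int          # space actually returned to the pool
    temp_leaked: int    # space stuck behind a stalled destroy
    perm_leaked: int    # space given up for good
    stalled: bool


def background_destroy(blocks: List[Tuple[int, bool]],
                       leak_on_eio: bool) -> DestroyResult:
    freed = perm_leaked = 0
    for i, (size, readable) in enumerate(blocks):
        if readable:
            freed += size
        elif leak_on_eio:
            # Flag set: ignore the EIO and leak only what this block references.
            perm_leaked += size
        else:
            # Default: stall. Everything not yet freed is "temporarily leaked"
            # and could still be freed if the storage recovers later.
            remaining = sum(s for s, _ in blocks[i:])
            return DestroyResult(freed, remaining, 0, True)
    return DestroyResult(freed, 0, perm_leaked, False)


if __name__ == "__main__":
    # Hypothetical dataset: the second block hits a permanent EIO.
    blocks = [(100, True), (10, False), (200, True), (50, True)]
    print(background_destroy(blocks, leak_on_eio=False))
    print(background_destroy(blocks, leak_on_eio=True))
```

With a permanent, localized error (case (b) in the comment), the default leaves everything after the bad block stuck indefinitely, while setting the flag sacrifices only the space behind that one block.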
Re: zpool imported twice with different names (was Re: Fwd: ZFS)
Nikos Vassiliadis wrote:

> On 05/15/2017 08:09 PM, Nikos Vassiliadis wrote:
> > Hi everybody,
> >
> > While trying to rename a zpool from zroot to vega,
> > I ended up in this strange situation:
> > nik@vega:~ % zfs list -t all
> > NAME                 USED  AVAIL  REFER  MOUNTPOINT
> > vega                1.83G  34.7G    96K  /zroot
> > vega/ROOT           1.24G  34.7G    96K  none
> > vega/ROOT/default   1.24G  34.7G  1.24G  /
> > vega/tmp             120K  34.7G   120K  /tmp
> > vega/usr             608M  34.7G    96K  /usr
> > vega/usr/home        136K  34.7G   136K  /usr/home
> > vega/usr/ports        96K  34.7G    96K  /usr/ports
> > vega/usr/src         607M  34.7G   607M  /usr/src
> > vega/var             720K  34.7G    96K  /var
> > vega/var/audit        96K  34.7G    96K  /var/audit
> > vega/var/crash        96K  34.7G    96K  /var/crash
> > vega/var/log         236K  34.7G   236K  /var/log
> > vega/var/mail        100K  34.7G   100K  /var/mail
> > vega/var/tmp          96K  34.7G    96K  /var/tmp
> > zroot               1.83G  34.7G    96K  /zroot
> > zroot/ROOT          1.24G  34.7G    96K  none
> > zroot/ROOT/default  1.24G  34.7G  1.24G  /
> > zroot/tmp            120K  34.7G   120K  /tmp
> > zroot/usr            608M  34.7G    96K  /usr
> > zroot/usr/home       136K  34.7G   136K  /usr/home
> > zroot/usr/ports       96K  34.7G    96K  /usr/ports
> > zroot/usr/src        607M  34.7G   607M  /usr/src
> > zroot/var            724K  34.7G    96K  /var
> > zroot/var/audit       96K  34.7G    96K  /var/audit
> > zroot/var/crash       96K  34.7G    96K  /var/crash
> > zroot/var/log        240K  34.7G   240K  /var/log
> > zroot/var/mail       100K  34.7G   100K  /var/mail
> > zroot/var/tmp         96K  34.7G    96K  /var/tmp
> > nik@vega:~ % zpool status
> >   pool: vega
> >  state: ONLINE
> >   scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 15 01:28:48 2017
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         vega        ONLINE       0     0     0
> >           vtbd0p3   ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> >   pool: zroot
> >  state: ONLINE
> >   scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 15 01:28:48 2017
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         zroot       ONLINE       0     0     0
> >           vtbd0p3   ONLINE       0     0     0
> >
> > errors: No known data errors
> > nik@vega:~ %
> > ---
> >
> > It seems like there are two pools, sharing the same vdev...
> >
> > After running a few commands in this state, like doing a scrub,
> > the pool was (most probably) destroyed. It couldn't boot anymore
> > and I didn't research further. Is this a known bug?
> >
> > Steps to reproduce:
> >   install FreeBSD-11.0 in a pool named zroot
> >   reboot into a live-CD
> >   zpool import -f zroot vega

Why did you use the -f flag? Unless you can reproduce the problem without it, it's not obvious to me that this is a bug.

Fabian
Re: mountroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0
Pete French wrote:

> I have a number of machines in Azure, all booting from ZFS and, until
> the weekend, running 10.3 perfectly happily.
>
> I started upgrading these to 11. The first went fine, the second would
> not boot. Looking at the boot diagnistics it is having problems finding
> the root pool to mount. I see this is the diagnostic output:
>
> storvsc0: on vmbus0
> Solaris: NOTICE: Cannot find the pool label for 'rpool'
> Mounting from zfs:rpool/ROOT/default failed with error 5.
> Root mount waiting for: storvsc
> (probe0:blkvsc0:0:storvsc1: 0: Interface>0): on vmbus0 storvsc scsi_status = 2
> (da0:blkvsc0:0:0:0): UNMAPPED
> (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
> hvheartbeat0: on vmbus0
> da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
>
> As you can see, the drive da0 only appears after it has tried, and
> failed, to mount the root pool.
>
> Normally I would just stick in a big 'vfs.mountroot.timeout' but that
> variable doesnt not appear to exist under 11 - or at least it doesnt
> show up in sysctl.

The variable still exists but is ignored when using ZFS. It's a known issue. You could try this patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882#c3

Manually specifying the root pool should work around the issue.

sysctl(8) does not show the variable as it's only a tunable. This is unrelated to the update.

Fabian
Re: Swapping from a zvol results in a deadman panic
"Matthew X. Economou" wrote:

> My FreeBSD 10.3-RELEASE-p16 server crashes in the middle of a Poudriere
> bulk run (see below). This crash happens even if I lower
> vfs.zfs.arc_max or tweak vm.v_free_min/target/reserved/severe. I'm
> looking for configuration advice in case I missed something obvious,
> since this seems to work on Illumos- and Linux-derived O/Ses, but
> failing that, I'd like to get some advice as to how to go about
> debugging this. I doubt the deadman timer causes the system to stop
> responding. It's more likely a race condition elsewhere.
>
> The pool itself uses 4k sectors and is geli-encrypted. I configured the
> swap zvol based on root-on-ZFS install instructions found in the FreeBSD
> wiki:

Paging on geli-encrypted devices is known to cause deadlocks on FreeBSD, even if ZFS isn't involved directly:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759

Adding ZFS to the mix is unlikely to help ...

> zfs create -V 6G -o org.freebsd:swap=on -o checksum=off -o
> compression=off -o dedup=off -o sync=disabled -o primarycache=none
> zroot/swap
>
> The ZoL wiki recommends a slightly different zvol configuration:
>
> zfs create -V 4G -b $(getconf PAGESIZE) -o logbias=throughput -o
> sync=always -o primarycache=metadata -o com.sun:auto-snapshot=false
> rpool/swap
>
> I'm not sure how much of this applies to FreeBSD due to differences in
> kernel design/implementation. Does anyone have an idea of what might be
> going on and how I might get this working?

You could try the patch from the PR and enable the kern.geom.eli.use_uma_for_all_writes sysctl.

If you have a core dump, you may want to confirm that the g_eli_worker is waiting for memory first.

Fabian
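For comparing the two swap zvol recipes quoted above side by side, a small Python sketch that only builds (and never runs) the two `zfs create` argument lists may help. The property values are copied from the quoted commands; the function names are hypothetical, and `os.sysconf("SC_PAGE_SIZE")` stands in for `$(getconf PAGESIZE)`. Nothing here suggests that either recipe avoids the geli paging deadlock mentioned above.

```python
import os
from typing import Dict, List, Optional, Tuple

# Properties from the FreeBSD wiki recipe quoted above (zroot/swap, 6G).
FREEBSD_WIKI_PROPS: Dict[str, str] = {
    "org.freebsd:swap": "on",
    "checksum": "off",
    "compression": "off",
    "dedup": "off",
    "sync": "disabled",
    "primarycache": "none",
}

# Properties from the ZoL wiki recipe quoted above (rpool/swap, 4G).
ZOL_WIKI_PROPS: Dict[str, str] = {
    "logbias": "throughput",
    "sync": "always",
    "primarycache": "metadata",
    "com.sun:auto-snapshot": "false",
}


def zvol_create_argv(name: str, size: str, props: Dict[str, str],
                     volblocksize: int = 0) -> List[str]:
    """Build a 'zfs create -V ...' argument list without executing it."""
    argv = ["zfs", "create", "-V", size]
    if volblocksize:
        # Equivalent of the ZoL recipe's '-b $(getconf PAGESIZE)'.
        argv += ["-b", str(volblocksize)]
    for key, value in props.items():
        argv += ["-o", f"{key}={value}"]
    argv.append(name)
    return argv


def differing_props(a: Dict[str, str],
                    b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Properties set differently or only in one recipe (None means unset)."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    print(" ".join(zvol_create_argv("zroot/swap", "6G", FREEBSD_WIKI_PROPS)))
    print(" ".join(zvol_create_argv("rpool/swap", "4G", ZOL_WIKI_PROPS, page_size)))
    for prop, (freebsd, zol) in differing_props(FREEBSD_WIKI_PROPS, ZOL_WIKI_PROPS).items():
        print(f"{prop}: FreeBSD wiki={freebsd} ZoL wiki={zol}")
```

The most visible disagreement is `sync` (disabled versus always), together with `primarycache` (none versus metadata).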
Poor ZFS ARC metadata hit/miss stats after recent ZFS updates
After rebasing some of my systems from r305866 to r307312 (plus local patches) I noticed that most of the ARC accesses are counted as misses now.

Example:

[fk@elektrobier2 ~]$ uptime
 2:03PM  up 1 day, 18:36, 7 users, load averages: 0.29, 0.36, 0.30
[fk@elektrobier2 ~]$ zfs-stats -E

ZFS Subsystem Report                            Mon Oct 17 14:03:58 2016

ARC Efficiency:                                 3.38m
        Cache Hit Ratio:                12.87%  435.23k
        Cache Miss Ratio:               87.13%  2.95m
        Actual Hit Ratio:               9.55%   323.15k

        Data Demand Efficiency:         6.61%   863.01k

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           18.97%  82.54k
          Most Frequently Used:         55.28%  240.60k
          Most Recently Used Ghost:     8.88%   38.63k
          Most Frequently Used Ghost:   24.84%  108.12k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  13.10%  57.03k
          Prefetch Data:                0.00%   0
          Demand Metadata:              32.94%  143.36k
          Prefetch Metadata:            53.96%  234.85k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  27.35%  805.98k
          Prefetch Data:                0.00%   0
          Demand Metadata:              71.21%  2.10m
          Prefetch Metadata:            1.44%   42.48k

I suspect that this is caused by r307265 ("MFC r305323: MFV r302991: 6950 ARC should cache compressed data"), which removed an ARCSTAT_CONDSTAT() call, but I haven't confirmed this yet.

The system performance doesn't actually seem to be negatively affected, and repeated metadata accesses that are counted as misses are still served from memory.
On my freshly booted laptop I get:

fk@t520 /usr/ports $ for i in 1 2 3; do \
  /usr/local/etc/munin/plugins/zfs-absolute-arc-hits-and-misses; \
  time git status > /dev/null; \
  done; \
  /usr/local/etc/munin/plugins/zfs-absolute-arc-hits-and-misses;
zfs_arc_hits.value 5758
zfs_arc_misses.value 275416
zfs_arc_demand_metadata_hits.value 4331
zfs_arc_demand_metadata_misses.value 270252
zfs_arc_demand_data_hits.value 304
zfs_arc_demand_data_misses.value 3345
zfs_arc_prefetch_metadata_hits.value 1103
zfs_arc_prefetch_metadata_misses.value 1489
zfs_arc_prefetch_data_hits.value 20
zfs_arc_prefetch_data_misses.value 334

real    1m23.398s
user    0m0.974s
sys     0m12.273s
zfs_arc_hits.value 11346
zfs_arc_misses.value 389748
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 381018
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3412
zfs_arc_prefetch_metadata_hits.value 3202
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

real    0m1.472s
user    0m0.452s
sys     0m1.820s
zfs_arc_hits.value 11348
zfs_arc_misses.value 428536
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 419782
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3436
zfs_arc_prefetch_metadata_hits.value 3204
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

real    0m1.537s
user    0m0.461s
sys     0m1.860s
zfs_arc_hits.value 11352
zfs_arc_misses.value 467334
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 458556
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3460
zfs_arc_prefetch_metadata_hits.value 3208
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

Disabling ARC compression through vfs.zfs.compressed_arc_enabled does not affect the accounting issue.

Can anybody reproduce this?
Fabian
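The munin samples above can be diffed mechanically to show the accounting problem per `git status` run. The sketch below uses only the standard library; the helper names are hypothetical. It parses the second and third samples, i.e. the counters around the second run, which finished in about 1.5 seconds and was therefore evidently served from memory.

```python
from typing import Dict

# Second and third munin samples from the session above: the counters
# before and after the second 'git status' run.
SAMPLE_BEFORE = """\
zfs_arc_hits.value 11346
zfs_arc_misses.value 389748
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 381018
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3412
zfs_arc_prefetch_metadata_hits.value 3202
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437
"""

SAMPLE_AFTER = """\
zfs_arc_hits.value 11348
zfs_arc_misses.value 428536
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 419782
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3436
zfs_arc_prefetch_metadata_hits.value 3204
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437
"""


def parse_munin(text: str) -> Dict[str, int]:
    """Map lines like 'zfs_arc_hits.value 5758' to {'zfs_arc_hits': 5758}."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Ignore anything that is not a munin "<name>.value <int>" line,
        # e.g. the output of time(1).
        if len(fields) == 2 and fields[0].endswith(".value"):
            counters[fields[0][: -len(".value")]] = int(fields[1])
    return counters


def counter_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-counter increase between two samples (counters present in both)."""
    return {name: after[name] - before[name] for name in after if name in before}


def hit_ratio(delta: Dict[str, int]) -> float:
    """Fraction of ARC accesses in the interval that were counted as hits."""
    total = delta["zfs_arc_hits"] + delta["zfs_arc_misses"]
    return delta["zfs_arc_hits"] / total if total else 0.0


if __name__ == "__main__":
    d = counter_delta(parse_munin(SAMPLE_BEFORE), parse_munin(SAMPLE_AFTER))
    for name, value in sorted(d.items()):
        print(f"{name:40} {value:>7}")
    print(f"hit ratio for this run: {hit_ratio(d):.4%}")
```

For this interval 38764 of the 38788 counted misses are demand metadata misses, against only 2 hits, although the run was fast enough that the metadata must have come from memory.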
Re: WLANDEV of vaps
Matthias Meyser wrote:

> ist there a way to get the correspondig wlandev of an existing wlan?
>
> e.g.
>
> I have one urtwn0 an one run0 an one configured wlan0.
>
> How do i know where wlan0 belongs to?

Try:
sysctl net.wlan.0.%parent

Fabian
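If this lookup is needed from a script, a minimal Python sketch could map the vap name to the sysctl OID and query it with `sysctl -n`. It assumes the vap kept its default `wlanN` name, so that the unit number in the name matches the index in `net.wlan.N`; a renamed interface would need a different lookup. The function names are hypothetical.

```python
import re
import subprocess


def parent_oid(wlan_ifname: str) -> str:
    """Map a vap name like 'wlan0' to the sysctl OID 'net.wlan.0.%parent'.

    Assumes the default wlanN naming, so the unit number equals the
    index used under net.wlan.
    """
    m = re.fullmatch(r"wlan(\d+)", wlan_ifname)
    if m is None:
        raise ValueError(f"not a wlanN interface name: {wlan_ifname!r}")
    return f"net.wlan.{m.group(1)}.%parent"


def wlan_parent(wlan_ifname: str) -> str:
    """Return the parent device, e.g. 'urtwn0' (FreeBSD only; runs sysctl -n)."""
    result = subprocess.run(["sysctl", "-n", parent_oid(wlan_ifname)],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(wlan_parent("wlan0"))
```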
Re: Periodic jobs triggering panics in 10.1 and 10.2
Michelle Sullivan wrote:

> ZFS has it's place, it is very good at some things, it brings features
> that people need.
> ZFS does not work (is not stable) on i386 without recompiling the
> kernel, but it is presented as an installation option.
> ZFS is compiled in by default in i386 kernels without the necessary
> option change to make it "stable".
> We have been told the kernel option change will never be put there by
> default.

FYI, the stack overflows should be addressed by:
https://svnweb.freebsd.org/base?view=revision=r286288

Fabian
Re: 10.2-RELEASE-p2 lost ability to bootstrap pkg with signature_type="pubkey"
Marko Cupać wrote:

> I just found out that 10.2-RELEASE-p2 lost ability to bootstrap pkg
> with signature_type="pubkey".
>
> Quick search returns:
> https://github.com/freebsd/pkg/issues/1309
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202622
>
> I guess it is not hard to switch repo to fingerprints, however I would
> not expect to lose this functionality by updating to patchlevel.

The "functionality" pkg(7) "lost" is silently ignoring unsupported signature types, which is dangerous if the network can't be trusted:
https://www.freebsd.org/security/advisories/FreeBSD-EN-15:15.pkg.asc
https://www.fabiankeil.de/gehacktes/hardenedbsd/

If you absolutely want to, you can still bootstrap insecurely by temporarily setting the signature type to none.

Fabian
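The change boils down to fail-open versus fail-closed handling of an unsupported `signature_type`. The following Python sketch only illustrates the two policies; it is not pkg(7)'s code, the function name is hypothetical, and it assumes (as the advisory above implies) that the bootstrapper can verify "fingerprints" but not "pubkey".

```python
def bootstrap_plan(sig_type: str, fail_closed: bool = True) -> str:
    """Decide how a hypothetical bootstrapper treats the configured type.

    Returns 'verify' or 'skip-verification', or raises ValueError.
    Assumption: only 'fingerprints' can be verified during bootstrap.
    """
    sig_type = sig_type.strip().lower()
    if sig_type == "fingerprints":
        return "verify"
    if sig_type == "none":
        # Explicit, visible opt-out chosen by the administrator.
        return "skip-verification"
    if fail_closed:
        # New behaviour: refuse instead of quietly dropping verification.
        raise ValueError(f"unsupported signature_type {sig_type!r}; "
                         "refusing to bootstrap")
    # Old, fail-open behaviour: the unsupported type is silently ignored
    # and the package is installed unverified, which an attacker on an
    # untrusted network can exploit.
    return "skip-verification"


if __name__ == "__main__":
    for configured in ("fingerprints", "none", "pubkey"):
        try:
            print(configured, "->", bootstrap_plan(configured))
        except ValueError as err:
            print(configured, "->", err)
```

The fail-open branch is exactly what made "pubkey" appear to work: the bootstrap succeeded, but nothing was verified.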
Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)
Chris Ross cross+free...@distal.com wrote:

> Yeah, this is the same panic you, I, and others have been seeing on sparc64's with bge's, or at least v240's (and one other IIRC) for many many months. Thanks for grabbing a core!

Does it make a difference if you boot with hw.bge.allow_asf=0?

According to the man page, enabling it is known to cause system lockup problems on a small number of systems. It's not obvious to me why it's enabled by default on FreeBSD, and I disable it on all my systems.

Fabian
Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)
Kurt Lidl l...@pix.net wrote:

> > [-stable@ in CC since these are the first 10.2-PRERELEASE builds available since the code slush went into effect, which marks the start of the release cycle.]
> >
> > New FreeBSD development branch installation ISOs and virtual machine disk images have been uploaded to the FTP mirrors.
> >
> > As with any development branch, the installation snapshots are not intended for use on production systems. We do, however, encourage testing on non-production systems as much as possible.
>
> I was able to download the sparc64 iso image, burn the iso to a cd-rom, and boot a sparc64 V120 from that image. I was also able to perform an install onto a ZFS only setup, and have it work properly.

On i386, the ZFS-only installation itself reproducibly works, but after the first reboot the system panics while importing the root pool.

The problem seems to be that the GENERIC kernel is built with clang, but KSTACK_PAGES has not been adjusted according to UPDATING:

| 20121223:
|       After switching to Clang as the default compiler some users of ZFS
|       on i386 systems started to experience stack overflow kernel panics.
|       Please consider using 'options KSTACK_PAGES=4' in such configurations.

If the issue can't be addressed before the release, it may be worth mentioning it in the release notes.

Fabian
Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)
Glen Barber g...@freebsd.org wrote:

> [-stable@ in CC since these are the first 10.2-PRERELEASE builds available since the code slush went into effect, which marks the start of the release cycle.]
>
> New FreeBSD development branch installation ISOs and virtual machine disk images have been uploaded to the FTP mirrors.
>
> As with any development branch, the installation snapshots are not intended for use on production systems. We do, however, encourage testing on non-production systems as much as possible.

ggatec and ggatel are still broken on i386:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197309
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199559

If the ZFS root pool isn't found right away, the system deadlocks:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198563

Patches are available, so it would be great if these issues could be fixed before the release.

Fabian
Re: patch which implements ZFS LZ4 compression
Jeremy Chadwick j...@koitsu.org wrote:

> On Sat, Feb 09, 2013 at 03:19:18PM +0100, Fabian Keil wrote:
> > Jeremy Chadwick j...@koitsu.org wrote:
> >
> > > If you want a PR for it, I'll file one, but all it's going to contain is the contents of this Email.
> >
> > My impression is that your emails describe symptoms and contain some speculation about what the cause might be. I didn't see any sched traces, so it's unclear (to me) that priorities are actual the problem.
>
> They contain no speculation. Bob Friesenhahn, who has a lot of experience and familiarity with ZFS on Solaris, seemed to know exactly the behaviour I described. Others on FreeBSD have reported the same behaviour as well, just not in that thread circa 2011.

Similar symptoms can have different causes.

> Regarding sched traces, please expand and include instructions.

I'm referring to the stuff that is fed into:
/usr/src/tools/sched/schedgraph.py

It can be created with ktrace and dtrace, and I believe the documentation is buried in the various "the scheduler sucks" threads.

> > It's also unclear to me why the dedup and compression issues should be related. There are lots of dedup performance issues reported for Solaris as well and I doubt that they can be fixed for FreeBSD without significantly deviating from the ZFS upstream.
>
> What part of Bob's statement did you not understand? Here, let me repeat it verbatim:
>
>   Solaris solved the problem by putting the zfs writer threads into a special scheduling class so that they are usually lower priority than normal processing. Before this change, a desktop system would become almost unusable (intermittent loss of keyboard/mouse) while writing lots of data with compression enabled. Some NFS servers encountered severe enough issues that NFS clients reported NFS timeouts.

My impression from reading zfs-discuss@ is that dedup performance and some interactivity issues actually still exist in Illumos and that they are completely unrelated to zfs writer threads.
As I can't use dedup on my systems, I don't really pay attention to them, though.

> > I'm not saying a PR would be useless, but in my experience PRs with insufficient information just stay open and if the problem isn't important enough for you to provide additional information filing a PR is unlikely to have a great impact:
> > http://www.freebsd.org/cgi/query-pr-summary.cgi?category=text=zfs
>
> Then someone in the know needs to explain exactly *what* data would help and (more importantly) *how* to go about providing it (i.e. what to enable in the kernel, what commands to issue, etc.). Eidan has repeatedly insisted that PRs are a Good Thing(tm) because they allow for an official way to track issues vs. mailing list threads that start and turn into tumbleweeds (just like the one I've referenced).

And how many of those PRs are actually solved? This is a rhetorical question and I don't expect you to look it up.

I'm not saying that PRs are a bad thing, but filing PRs is the easy part, and in my experience issues that don't spark developer interest on the mailing lists are usually also ignored when filed as PR, especially when they don't contain 100% of the information that may be relevant.

Even if you provide proof that the priorities are indeed the cause of the problem, there's a fair chance that the PR gets ignored anyway.

I currently have four somewhat ZFS-related PRs open; the first was filed in 2007. I still don't think that the solution is that nobody works on ZFS improvements until my PRs are solved.

I'm looking forward to using LZ4, which promises better compression than lzjb with less interactivity impact than gzip. It might even work for your /dev/random test, as it's supposed to deal better with poorly compressible data.

> Without those necessary instructions, in effect what you're asking me to do is prove that the problem exists, which I have already done so. You just don't like the data I've provided.

I don't expect you to prove that the problem exists.
My impression is that the interactivity issues with gzip have been well known for years and have existed since the ZFS import. I also don't dislike your data; all I'm saying is that there could be other explanations.

> Bottom line: people enable compression on an fs, issue large amounts of write I/O to that fs (say hundreds of megabytes, or gigabytes), and start to see the entire system intermittently stalling hard (for multiple seconds at a time). This affects everything from switching VTs on physical console to packets going across SSH. The stalls vary in duration depending on what compression type is used (lzjb vs. gzip-1 -- I cannot even imagine what gzip-9 would be like). I described it as verbosely as I could, including going back and re-testing because people felt the ZFSv28 import might have addressed it (it did not):
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html

I'm aware that the interactivity issues
Re: patch which implements ZFS LZ4 compression
Jeremy Chadwick j...@koitsu.org wrote:

> On Fri, Feb 08, 2013 at 02:52:57PM -0800, Xin Li wrote:
> > On 02/08/13 14:29, Dan Langille wrote:
> > > Here is a patch against FreeBSD 9.1 STABLE which implements ZFS LZ4 compression.
> > >
> > > https://plus.google.com/106386350930626759085/posts/PLbkNfndPiM
> > > short link: http://bpaste.net/show/76095
> >
> > Please DO NOT use this patch! It will ruin your data silently.
> >
> > As I already posted on Ivan's Google+ post, I'm doing final universe builds to make sure that there is no regression and will merge my changes to -HEAD later today.
>
> Another compression algorithm, this time 50%+ faster than lzjb. Great, fine, wonderful, awesome, kudos, huzzah, blah blah blah.
>
> So when is someone going to step up to the plate and fix how compression (as well as dedup) destroys interactivity on FreeBSD? Do I need to remind folks of this issue once again? Here you have it, dated October 2011, including the root cause and how it was fixed in Solaris et al:
>
> Description:
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012718.html
>
> Explanation and how Solaris et al fixed it, and how on Solaris the problem was major enough that it even caused NFS timeouts (sound familiar to anyone?):
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html
>
> Further testing showing gzip-1 vs. lzjb and interactivity stalls:
> http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html
>
> This is still a problem with base/stable/9. And as I have said elsewhere on lists, do not ask me to run CURRENT -- it will be a cold day in hell before I ever do that. I assume this same problem exists in CURRENT unless I have some key developer/committer say I backported this fix in CURRENT, absolutely 100% sure.
>
> I'm also wondering why iXSystems hasn't stepped up to the plate to contribute to making this happen, given their business focus.
>
> I do not have the knowledge of the kernel (or of threading) to fix this myself, and for that I do apologise.
> But every time I see compression or dedup mentioned, I use the opportunity to bring up this subject. STOP ADDING FEATURES AND FIX STUFF LIKE THIS INSTEAD -- while new algorithms are neat/fun toys, they do not truly fix issues like this. How this problem has continually gotten overlooked is beyond me.

Did you consider that other people may have different priorities than you do?

> If you want a PR for it, I'll file one, but all it's going to contain is the contents of this Email.

My impression is that your emails describe symptoms and contain some speculation about what the cause might be. I didn't see any sched traces, so it's unclear (to me) that priorities are actually the problem.

It's also unclear to me why the dedup and compression issues should be related. There are lots of dedup performance issues reported for Solaris as well, and I doubt that they can be fixed for FreeBSD without significantly deviating from the ZFS upstream.

I'm not saying a PR would be useless, but in my experience PRs with insufficient information just stay open, and if the problem isn't important enough for you to provide additional information, filing a PR is unlikely to have a great impact:
http://www.freebsd.org/cgi/query-pr-summary.cgi?category=text=zfs

Fabian
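One reason the /dev/random write test mentioned in this thread is a worst case for compression: the CPU time is spent even though the data cannot be made smaller, so the block ends up stored uncompressed anyway. The Python sketch below uses zlib (the DEFLATE library behind ZFS's gzip-N levels; lz4 and lzjb are not in the standard library) and assumes a ZFS-style rule of keeping a compressed block only if it saves at least one eighth of its size. It says nothing about the scheduling or interactivity stalls discussed above; the helper names are hypothetical.

```python
import os
import time
import zlib

RECORDSIZE = 128 * 1024  # the default ZFS recordsize, used here as the block size


def worth_storing_compressed(original_len: int, compressed_len: int) -> bool:
    """Assumed ZFS-style rule: keep the compressed copy only if it saves >= 1/8."""
    return compressed_len <= original_len - (original_len >> 3)


def try_block(data: bytes, level: int):
    """Compress one block with zlib; return (size, seconds, stored_compressed)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return (len(compressed), elapsed,
            worth_storing_compressed(len(data), len(compressed)))


if __name__ == "__main__":
    line = b"zfs compression test line\n"
    blocks = {
        "random (like /dev/random)": os.urandom(RECORDSIZE),
        "repetitive text": (line * (RECORDSIZE // len(line) + 1))[:RECORDSIZE],
    }
    for label, data in blocks.items():
        for level in (1, 9):  # roughly gzip-1 and gzip-9
            size, secs, keep = try_block(data, level)
            print(f"{label:28} level {level}: {size:6d} bytes, "
                  f"{secs * 1e3:6.2f} ms, stored compressed: {keep}")
```

For the random block the compressed output is never small enough to be kept, so all the compression work is wasted; an algorithm that gives up quickly on such data would waste less.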
Re: how to destroy zfs parent filesystem without destroying children - corrupted file causing kernel panic
Greg Bonett greg.bon...@gmail.com wrote:

> > My next plan would be reporting the problem with sufficient information so the bug can be fixed. Destroying the dataset or the whole pool seems like papering over the real issue to me and you could still do it if the PR gets ignored for too long or a developer agrees that this is the only option.
>
> ok, that's a good idea - do you know where I should report this problem?

I'd start with freebsd-fs@ and file a proper PR if there's still no response after a few weeks or so.

If you haven't already, you might want to skim through:
http://www.freebsd.org/cgi/query-pr-summary.cgi?text=zfs
first, to see if your problem is already known.

> unfortunately, I don't know how I can provide the problematic file because any read, cp, or mv causes kernel panic.

Additional information about the panic itself will probably do for a start; you can always provide more later if someone asks for it. For details see:
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

Fabian
Re: how to destroy zfs parent filesystem without destroying children - corrupted file causing kernel panic
Greg Bonett greg.bon...@gmail.com wrote:

> Many months ago, I believe some *very bad hardware* caused corruption of a file on one of my zfs file systems. I've isolated the corrupted file and can reliably induce a kernel panic with touch bad.file, rm bad.file, or ls -l in the bad.file's directory (ls in bad.file's dir doesn't cause panic, but ls bad.file does). This is a raidz zpool, but zpool scrub doesn't fix it - it eventually creates a kernel panic.
>
> My next plan is to attempt to get rid of this file by zfs destroy(ing) the entire filesystem. The corrupted file is on /tank, and I've copied all of the good data onto a new zfs file system, /tank/tempfs/.

My next plan would be reporting the problem with sufficient information so the bug can be fixed. Destroying the dataset or the whole pool seems like papering over the real issue to me, and you could still do it if the PR gets ignored for too long or a developer agrees that this is the only option.

Fabian
Re: geom using 100% cpu with failed da5. How to calm it down without cam passdev?
Harald Schmalzbauer h.schmalzba...@omnilan.de wrote:

> I've a failed disk at a remote server, which shouldn't be a problem actually.

Welcome to geom ...

> Just for info, here's the last shout:
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 256 command timeout cm 0xff8001c64800 ccb 0xfe0007329000
> kernel: mps0: mpssas_alloc_tm freezing simq
> kernel: mps0: timedout cm 0xff8001c64800 allocated tm 0xff8001c50148
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 length 0 SMID 256 completed timedout cm 0xff8001c64800 ccb 0xfe0007329000 during recovery ioc 8048 scsi 0 state c xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0 count 1
> kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after aborting TaskMID 256
> kernel: mps0: mpssas_free_tm releasing simq
> kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
> kernel: (da5:mps0:0:5:0): CAM status: Command timeout
> kernel: (da5:mps0:0:5:0): Retrying command
> kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0 SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
> kernel: mps0: mpssas_alloc_tm freezing simq
> kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
> kernel: mps0: mpssas_free_tm releasing simq
> kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding, 2 refs
> kernel: 5:0): passdevgonecb: devfs entry is gone
> kernel: (da5:mps0:0:5:0): oustanding 3
> kernel: (da5:mps0:0:5:0): oustanding 2
> kernel: (da5:mps0:0:5:0): oustanding 1
> kernel: (da5:mps0:0:5:0): oustanding 0
>
> After reboot, 'camcontrol devlist' doesn't show any da5, but 'geom disk list' _does_ show da5!!!
> My problem is that geom is now consuming 100% of one core!
> top -S:
>    13 root        3  -8    -     0K    48K -       1 480:19 100.00% geom
>
> Since there's no /dev/da5 I can't use camcontrol to stop anything, and at the moment nobody can physically remove the failed drive.
> How can I calm geom down?
I reported a similar problem in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

The PR contains a patch that I'm using as a workaround.

> How can I find out what geom is doing/trying to do? I guess it's related to the failed da5, but how can I know?

DTrace might help.

Fabian
Re: geli decrypt only one partition
joerg_surmann joerg_surm...@snafu.de wrote:

> Sorry, i no had enough time for this geli problem.
> I work with a testsystem.
> When start booting in verbose mode the system found the keypaths.
> Preloaded ada0p4:geli_keyfile0 /root/keys/ada0p4.key at 0xc14bf540.
> Preloaded ada1p4:geli_keyfile1 /root/keys/ada1p4.key at 0xc14bf598.
>
> loader.conf
> geom_eli_load=YES
> geli_ada0p4_keyfile0_load=YES
> geli_ada0p4_keyfile0_type=ada0p4:geli_keyfile0
> geli_ada0p4_keyfile0_name=/root/keys/ada0p4.key
> geli_ada1p4_keyfile1_load=YES
> geli_ada1p4_keyfile1_type=ada1p4:geli_keyfile1
> geli_ada1p4_keyfile1_name=/root/keys/ada1p4.key
> zfs_load=YES
> vfs.root.mountfrom=zfs:zroot
>
> on boottime i can decrypt ada0p4. for ada1p4 ... wrong key.
> i can decrypt ada1p4 later by hand with the keyfile like loader.conf.
> same situation. ada0p4 and ada1p4 are a zfs mirror.

Like I already wrote before, the problem is most likely that you named the first keyfile for the second provider keyfile1 instead of keyfile0. The keyfile numbering restarts for each provider, and geli will not use keyfile1 if keyfile0 doesn't exist.

I missed earlier that the "Preloaded ..." messages are a bit misleading here, as they only show that the loader lines are recognized and that the kernel read the files, not that geli does anything useful with them.

If you increase kern.geom.eli.debug you'll probably see that /root/keys/ada0p4.key is used by geli while /root/keys/ada1p4.key isn't.

Fabian
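Since the keyfile index restarts at 0 for every provider, a small Python sketch can generate consistent loader.conf lines for several providers and flag the mistake in the configuration quoted above (a provider whose keyfiles do not start at geli_keyfile0). The helper names are hypothetical; the variable pattern follows the quoted loader.conf, with each provider's first keyfile numbered 0.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Matches *_type lines such as
#   geli_ada1p4_keyfile1_type=ada1p4:geli_keyfile1
# with or without double quotes around the value.
TYPE_RE = re.compile(r'^\s*(\w+)_type="?([\w./-]+):geli_keyfile(\d+)"?\s*$')


def loader_conf_lines(keyfiles: Dict[str, List[str]]) -> List[str]:
    """Emit loader.conf lines; keyfile numbering restarts at 0 per provider."""
    lines = ['geom_eli_load="YES"']
    for provider, paths in keyfiles.items():
        for index, path in enumerate(paths):
            prefix = f"geli_{provider}_keyfile{index}"
            lines += [
                f'{prefix}_load="YES"',
                f'{prefix}_type="{provider}:geli_keyfile{index}"',
                f'{prefix}_name="{path}"',
            ]
    return lines


def misnumbered_providers(conf_text: str) -> List[str]:
    """Providers whose keyfile indices are not exactly 0, 1, 2, ..."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for line in conf_text.splitlines():
        m = TYPE_RE.match(line)
        if m:
            seen[m.group(2)].add(int(m.group(3)))
    return sorted(p for p, idx in seen.items() if idx != set(range(len(idx))))


if __name__ == "__main__":
    conf = loader_conf_lines({"ada0p4": ["/root/keys/ada0p4.key"],
                              "ada1p4": ["/root/keys/ada1p4.key"]})
    print("\n".join(conf))
```

Run against the quoted configuration, `misnumbered_providers()` reports ada1p4, whose only keyfile is numbered 1.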
Re: geli decrypt only one partition
joerg_surmann joerg_surm...@snafu.de wrote:
> i have two partitions: ada0p3.eli and ada1p3.eli
> on bootprocess i must type a passphrase for ada0p3 and have ada0p3.eli.
> next i type the passphrase for ada1p3 and i become: wrong key
> when the bootprocess is finish and i login and type
> geli attach -k /path to keyfile /dev/ada1p3
> and i type the passphrase then i have ada1p3.eli.
> why can i decrypt only one partition on bootprocess?

This is frequently the effect of an incorrectly specified keyfile in
loader.conf. Do you get a boot message like the following for both
keyfiles when booting in verbose mode?

Jun 20 19:49:34 r500 kernel: Preloaded ada0s1d:geli_keyfile0 /boot/ad4s1d.key at 0x813951d0.

Fabian
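One rough way to scan verbose boot output for this mistake is sketched below in Python. It is a hypothetical helper that assumes the "Preloaded <provider>:geli_keyfile<N> <path> at 0x..." format shown above. It collects the preloaded keyfiles per provider and flags providers whose indices don't run 0, 1, 2, ... without gaps. A clean result only shows that the files were preloaded, not that geli actually used them.

```python
import re
from collections import defaultdict

# Matches e.g. "Preloaded ada0s1d:geli_keyfile0 /boot/ad4s1d.key at 0x813951d0."
PRELOAD_RE = re.compile(r"Preloaded (\S+):geli_keyfile(\d+) (\S+) at 0x[0-9a-fA-F]+")


def preloaded_keyfiles(log_text):
    """Map provider -> {keyfile index: path} from verbose boot messages."""
    found = defaultdict(dict)
    for match in PRELOAD_RE.finditer(log_text):
        provider, index, path = match.group(1), int(match.group(2)), match.group(3)
        found[provider][index] = path
    return dict(found)


def numbering_problems(found):
    """Providers whose keyfile indices are not exactly 0 .. n-1."""
    problems = {}
    for provider, by_index in found.items():
        indices = sorted(by_index)
        if indices != list(range(len(indices))):
            problems[provider] = indices
    return problems


if __name__ == "__main__":
    log = (
        "Preloaded ada0p4:geli_keyfile0 /root/keys/ada0p4.key at 0xc14bf540.\n"
        "Preloaded ada1p4:geli_keyfile1 /root/keys/ada1p4.key at 0xc14bf598.\n"
    )
    print(numbering_problems(preloaded_keyfiles(log)))
```

With the two messages from the thread, only ada1p4 is flagged, because its single keyfile has index 1.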
Re: kern/157863: [geli] kbdmux prevents geli passwords from being entered properly on boot
Thomas Steen Rasmussen tho...@gibfest.dk wrote:
> Just to let everyone know that this is still an issue. I am trying to
> install FreeBSD 9.0 amd64 on a Lenovo X121e and I can't get it to
> accept the geli passphrase during boot. I've confirmed using
> kern.geom.eli.visible_passphrase=1 that the passphrase is correct, and
> the same passphrase is accepted when the system is booted up.
> I've tried disabling kbdmux in /boot/device.hints like the PR said,
> but that didn't help. I also tried disabling atkbd and atkbdc without
> any luck, infact I couldn't type anything at all when disabling those.

If disabling kbdmux doesn't help, it sounds like a different issue to me.

> Any hints or suggestions to what I might try ? I have another 9-stable
> laptop that mounts a geli volume at boot, no idea why that one works
> and this new one doesn't.

Are you using the passphrase together with a keyfile? I've misconfigured
the keyfile in loader.conf in the past, which resulted in the valid
passphrase not being accepted. Of course the setup then magically works
later on when the keyfile is specified correctly on the command line.

If you aren't using keyfiles, you could try setting up a USB stick with
geli, to confirm that the same media works on one laptop but doesn't on
the other.

Fabian
Re: FreeBSD root on a geli-encrypted ZFS pool
Matthew X. Economou xenop...@irtnog.org wrote:
> Fabian Keil writes:
>> Anyway, it's a test without file system so the ZFS overhead isn't
>> measured.
>> I wasn't entirely clear about it, but my assumption was that the ZFS
>> overhead might be big enough to make the difference between HMAC/MD5
>> and HMAC/SHA256 a lot less significant.
>
> Got it. That also makes sense. I'll put this on my to-test list.

Great.

>> I'm currently using sector sizes between 512 and 8192 so I'm not
>> actually expecting technical problems, it's just not clear to me how
>> much the sector size matters and if 4096 is actually the best value
>> when using ZFS.
>
> The geli(8) manual page claims that larger sector sizes lower the
> overhead of GEOM_ELI keying initialization and encryption/decryption
> steps by requiring fewer of these compute-intensive setup operations
> per block.

I think the setup operations per sector should stay the same, but the
total number of setup operations decreases if (and only if) increasing
the sector size decreases the number of sectors required to write the
data. That, however, depends on the data, and I don't see why
increasing the sector size should always be an improvement.

Geli can't read or write less than a sector, so if the workload is
randomly reading or writing a few hundred bytes, a sector size of 512
bytes should be superior to a sector size of 4 kB.

A sector size of 4 kB is probably good for some workloads, but it
clearly can't be the best for all of them, and it's not obvious to me
that it's the best for most.

Fabian
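The trade-off can be put into numbers. The Python sketch below is illustrative arithmetic only (it models neither geli's crypto nor ZFS): it counts how many sectors, and therefore per-sector setup operations, a single request touches, and how many bytes must be encrypted or decrypted as a result. It ignores any read-modify-write a consumer needs for partial-sector writes.

```python
def sectors_touched(offset, length, sector_size):
    """Number of whole sectors covering the byte range [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // sector_size
    last = (offset + length - 1) // sector_size
    return last - first + 1


def bytes_processed(offset, length, sector_size):
    """Bytes geli would have to encrypt/decrypt, since it works on whole sectors."""
    return sectors_touched(offset, length, sector_size) * sector_size


if __name__ == "__main__":
    for size in (512, 4096):
        small = (sectors_touched(1000, 300, size), bytes_processed(1000, 300, size))
        large = (sectors_touched(0, 65536, size), bytes_processed(0, 65536, size))
        print(f"sector={size}: 300 B read -> {small}, 64 KiB write -> {large}")
```

For a 300-byte read at offset 1000, 512-byte sectors mean 2 sectors and 1024 bytes processed, while 4096-byte sectors mean 1 sector but 4096 bytes. For an aligned 64 KiB write the byte count is the same either way, but the sector count drops from 128 to 16.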
Re: FreeBSD root on a geli-encrypted ZFS pool
xenophon\\+freebsd xenophon+free...@irtnog.org wrote:
> -----Original Message-----
> From: Fabian Keil [mailto:freebsd-lis...@fabiankeil.de]
> Sent: Wednesday, March 07, 2012 11:49 AM
>
>> It's not clear to me why you enable geli integrity verification.
>> Given that it is single-sector-based it seems inferior to ZFS's
>> integrity checks in every way and could actually prevent ZFS from
>> properly detecting (and depending on the pool layout correcting)
>> checksum errors itself.
>
> My goal in encrypting/authenticating the storage media is to prevent
> unauthorized external data access or tampering. My assumption is that
> ZFS's integrity checks have more to do with maintaining metadata
> integrity in the event of certain hardware or software faults (e.g.,
> operating system crashes, power outages) - that is to say, ZFS cannot
> tell if an attacker boots from a live CD, imports the zpool, fiddles
> with something, and reboots, whereas GEOM_ELI can if integrity
> checking is enabled (even if someone tampers with the encrypted data).

If the ZFS pool is located on GEOM_ELI providers, the attacker shouldn't
be able to import it unless the passphrase and/or keyfile are already
known.

If the attacker tampers with the encrypted data used by the pool, ZFS
should detect it, unless it's a replay attack, in which case enabling
GEOM_ELI's integrity checking wouldn't have helped you either.

If the attacker only replays a couple of blocks, ZFS's integrity
checking is likely to detect it for most blocks, while GEOM_ELI's
integrity checking will not detect it for any block.

In my opinion, protecting ZFS's default checksums (which cover
non-metadata as well) with GEOM_ELI is sufficient. I don't see what
advantage additionally enabling GEOM_ELI's integrity verification
offers.
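The partial-replay argument can be illustrated with a toy model in Python. This is a deliberate simplification, not geli's or ZFS's actual on-disk format: the "geli-like" check is assumed to be a per-sector HMAC bound to the sector's offset, and the "ZFS-like" check is a checksum of the block kept by its parent, which gets updated whenever the block is rewritten. The key is a hypothetical placeholder.

```python
import hashlib
import hmac

KEY = b"hypothetical-toy-key"  # placeholder, not a real geli key


def sector_tag(offset, data):
    """Toy per-sector MAC: binds the data to its offset, but not to a version."""
    return hmac.new(KEY, offset.to_bytes(8, "little") + data, hashlib.sha256).digest()


def write_sector(disk, offset, data):
    disk[offset] = (data, sector_tag(offset, data))


def per_sector_ok(disk, offset):
    """geli-like check: is the stored tag valid for this offset and data?"""
    data, tag = disk[offset]
    return hmac.compare_digest(tag, sector_tag(offset, data))


def parent_checksum_ok(disk, offset, expected):
    """ZFS-like check: does the data match the checksum kept by its parent?"""
    data, _tag = disk[offset]
    return hashlib.sha256(data).digest() == expected


if __name__ == "__main__":
    disk = {}
    write_sector(disk, 4096, b"old contents")
    snapshot = disk[4096]                      # attacker copies the sector
    write_sector(disk, 4096, b"new contents")
    parent_ptr = hashlib.sha256(b"new contents").digest()  # parent updated too
    disk[4096] = snapshot                      # attacker replays the old sector
    print("per-sector check passes:", per_sector_ok(disk, 4096))
    print("parent checksum passes:", parent_checksum_ok(disk, 4096, parent_ptr))
```

In this model the per-sector check accepts the replayed sector while the parent checksum rejects it. If the attacker replays the parent as well (and its parents, up to the root), the ZFS-like check is fooled too, which matches the caveat about full replay attacks.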
> This does raise an interesting question that merits further testing:
> What happens if a physical sector goes bad, whether that's due to a
> system bus or controller I/O error, a physical problem with the media
> itself, or someone actively tampering with the encrypted storage?
> GEOM_ELI would probably return some error back to ZFS for that sector,
> which could cause the entire vdev to go offline but might just require
> scrubbing the zpool to fix.
>
>> I'm also wondering if you actually benchmarked the difference between
>> HMAC/MD5 and HMAC/SHA256. Unless the difference can be easily
>> measured, I'd probably stick with the recommendation.
>
> I based my choice of HMAC algorithm on the following forum post:
> http://forums.freebsd.org/showthread.php?t=12955

I'm wondering if dd's block size is correct; 4096 seems rather small.
Anyway, it's a test without a file system, so the ZFS overhead isn't
measured.

I wasn't entirely clear about it, but my assumption was that the ZFS
overhead might be big enough to make the difference between HMAC/MD5
and HMAC/SHA256 a lot less significant.

> I wouldn't recommend anyone use MD5 in real-world applications,
> either, so I'll update my instructions to use HMAC/SHA256 as
> recommended by geli(8).

It's still not clear to me why you recommend using an HMAC for geli at
all.

>> I would also be interested in benchmarks that show that geli(8)'s
>> recommendation to increase geli's block size to 4096 bytes makes
>> sense for ZFS. Is anyone aware of any?
>
> As far as I know, ZFS on FreeBSD has no issues with 4k-sector drives,
> see Ivan Voras' comments here:
> http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html
> Double-checking my zpool shows the correct value for ashift:
> masip205bsdfile# zdb -C tank | grep ashift
> ashift: 12

I'm currently using sector sizes between 512 and 8192, so I'm not
actually expecting technical problems; it's just not clear to me how
much the sector size matters and whether 4096 is actually the best
value when using ZFS.
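For a first impression of the HMAC/MD5 versus HMAC/SHA256 question across sector sizes, a userland micro-benchmark is easy to write. The Python sketch below times Python's hmac module, computing one HMAC per simulated sector. It measures neither geli's in-kernel crypto path nor any ZFS overhead, so at best it hints at the relative hash cost; the 8 MiB workload and the random 32-byte key are arbitrary assumptions.

```python
import hmac
import os
import time


def hmac_throughput(digest, sector_size, total_bytes=8 * 1024 * 1024):
    """Approximate MB/s when computing one HMAC per sector over total_bytes."""
    key = os.urandom(32)
    sector = os.urandom(sector_size)
    count = max(1, total_bytes // sector_size)
    start = time.perf_counter()
    for _ in range(count):
        hmac.new(key, sector, digest).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return count * sector_size / elapsed / 1e6


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        for digest in ("md5", "sha256"):
            rate = hmac_throughput(digest, size)
            print(f"HMAC/{digest:<6} sector={size:>5}  {rate:8.1f} MB/s")
```

Numbers will vary by CPU and Python build (MD5 may even be unavailable on FIPS-restricted builds), so a dd test on an actual geli provider, ideally with ZFS on top, remains the more meaningful measurement.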
> Benchmarking different geli sector sizes would also be interesting and
> worth incorporating into these instructions. I'll add that to my to-do
> list as well.

Great.

Fabian
Re: FreeBSD root on a geli-encrypted ZFS pool
xenophon\\+freebsd xenophon+free...@irtnog.org wrote:
> I have posted revised instructions for installing FreeBSD to an
> encrypted ZFS pool on my blog:
> https://web.irtnog.org/~xenophon/blog/revised-freebsd-root-zfs-geli
> The entire procedure is documented in a way suitable for scripting.
> I would be very interested in the community's feedback.

It's not clear to me why you enable geli integrity verification.
Given that it is single-sector-based, it seems inferior to ZFS's
integrity checks in every way and could actually prevent ZFS from
properly detecting (and, depending on the pool layout, correcting)
checksum errors itself.

I'm also wondering if you actually benchmarked the difference between
HMAC/MD5 and HMAC/SHA256. Unless the difference can be easily measured,
I'd probably stick with the recommendation.

I would also be interested in benchmarks that show that geli(8)'s
recommendation to increase geli's block size to 4096 bytes makes sense
for ZFS. Is anyone aware of any?

Fabian
Re: sysutils/pftop on 9.x+
Greg Rivers gcr+freebsd-sta...@tharned.org wrote:
> sysutils/pftop was marked broken on 9.x and above last March[1].
> Are there any plans to fix it soon? It's a really handy utility.
>
> [1] http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17

Please have a look at:
http://www.freebsd.org/cgi/query-pr.cgi?pr=155938

Note that the currently working fix is in the audit trail; the original
fix stopped working after the PF update.

Fabian
Re: Setting coredumpsize on a running process?
Ivan Voras ivo...@freebsd.org wrote:
> On 18 October 2011 16:43, Jeremy Chadwick free...@jdc.parodius.com wrote:
>> On Tue, Oct 18, 2011 at 04:32:11PM +0200, Ivan Voras wrote:
>>> I have PHP executing as fastcgi via the mod_fcgid module in Apache.
>>> I suspect there is a bug in PHP or one of its extensions which causes
>>> it to crash with sigsegv, but I cannot get any coredumps. I suspect
>>> something is setting coredumpsize to 0 - either Apache, mod_fcgid or
>>> PHP.
>>>
>>> So the question is: is there a way to set coredumpsize on a running
>>> process, with the intention of getting a core dump when it crashes?
>>>
>>> I already tried setting CoreDumpDirectory in Apache and also
>>> configuring apache22limits_args in /etc/rc.conf but without effect.
>>
>> I ended up solving this on a machine where coredumps with Apache + PHP
>> were highly common by setting sysctl kern.corefile to
>> /var/cores/%P.%N.core, then made sure the /var/cores directory was
>> root:wheel, perms 1777. Otherwise I could not get a coredump.
>> apache22limits_enable did not help either, nor did CoreDumpDirectory.
>>
>> Having fun yet?
>
> Oh, I have years and years of fun debugging PHP, in one way or the
> other :)
>
> Your suggestion for setting core dump directory explicitely helped;
> now it looks like I've hit an infinite recursion / stack eating bug
> somewhere in PCRE...
>
> #1703 0x000805d5c72e in match () from /usr/local/lib/libpcre.so.0
> #1704 0x000805d5b4f0 in match () from /usr/local/lib/libpcre.so.0
> #1705 0x000805d5c72e in match () from /usr/local/lib/libpcre.so.0
> #1706 0x000805d5b4f0 in match () from /usr/local/lib/libpcre.so.0
>
> However, I'm drawing the line at debugging PCRE, this will go into the
> don't do that category.

There's a fair chance that this isn't a bug in PCRE but the result of a
poorly written expression. You may want to have a look at pcrestack(3).

Fabian
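To see what the kern.corefile setting above produces, here is a Python sketch of the template expansion. It is a hypothetical helper that handles only %N (process name), %P (PID) and %U (UID) as described in core(5) and leaves any other specifier untouched; the kernel's own expansion may support more. It also spells out what "perms 1777" means: the sticky bit plus read/write/execute for everyone.

```python
import stat


def expand_corefile(template, pid, name, uid):
    """Expand a kern.corefile-style template (only %N, %P and %U handled here)."""
    values = {"N": name, "P": str(pid), "U": str(uid)}
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            spec = template[i + 1]
            # Unknown specifiers are copied through unchanged.
            out.append(values.get(spec, "%" + spec))
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)


def is_sticky_world_writable(mode):
    """True for a directory mode like 1777: sticky bit plus rwx for everyone."""
    return bool(mode & stat.S_ISVTX) and (mode & 0o777) == 0o777


if __name__ == "__main__":
    # "php-cgi", PID 1234 and UID 80 are made-up example values.
    print(expand_corefile("/var/cores/%P.%N.core", 1234, "php-cgi", 80))
    print(is_sticky_world_writable(0o1777))
```

On a live system the mode to check would come from os.stat("/var/cores").st_mode. The sticky bit lets every crashing process create its core file there without being able to delete other users' cores.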
Re: geli problems after installkernel installworld
Christopher J. Ruwe c...@cruwe.de wrote:
> On Sat, 15 Jan 2011 22:30:56 +0100
> Pawel Jakub Dawidek p...@freebsd.org wrote:
>> On Thu, Jan 13, 2011 at 10:00:19PM +0100, Christopher J. Ruwe wrote:
>>> I use a mostly geli encrypted hd on my Thinkpad R500, with /compat,
>>> /usr, /tmp and /var all on the encrypted geli provider.
>>>
>>> After an upgrade of kernel and world (STABLE), I experience a weird
>>> issue: While booting, I am asked for the geli passphrase as usual.
>>> Completing password authentication for geli returns a success
>>> message,
>>>
>>> cryptosoft0: software crypto on motherboard
>>> GEOM_ELI: Device ada0p3.eli created.
>>> GEOM_ELI: Encryption: AES-CBC 256
>>> GEOM_ELI: Crypto: software
>>>
>>> however, the zpool on geli is unavailable. Logging in a root, I can
>>> attach the geli provider manually as geli itself should do from
>>> /etc/rc.conf. After a successful zfs mount -a, I can resume as usual
>>> after manually starting the /usr/local/rc.d services.
>>>
>>> Neither have I noticed a change in the device names nor any unusual
>>> messages from dmesg. Currently, I am doing a new compile run on world
>>> and kernel to attempt anew tomorrow.
>>>
>>> Am I missing something?
>>
>> Can you show the output of 'geli list' from a running system?
>
> Sure I can ... I'll additionally comment the output with what I do to.
>
> First I boot and my /usr/local/rc.d/ - schripts do not start. Likewise
> does zsh. From doing geli list, I get (on stdout)
>
> Geom name: ada0p3.eli
> State: ACTIVE
> EncryptionAlgorithm: AES-CBC
> KeyLength: 256
> Crypto: software
> UsedKey: 0
> Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT, RW-DETACH
> Providers:
> 1. Name: ada0p3.eli
>    Mediasize: 249656594432 (233G)
>    Sectorsize: 4096
>    Mode: r0w0e0
> Consumers:
> 1. Name: ada0p3
>    Mediasize: 249656596992 (233G)
>    Sectorsize: 512
>    Mode: r1w1e1
>
> Doing a zpool status -v gives on stdout
>
>   pool: ntank
>  state: UNAVAIL
> status: One or more devices could not be opened. There are insufficient
>         replicas for the pool to continue functioning.
> action: Attach the missing device and online it using 'zpool online'.
>    see: http://www.sun.com/msg/ZFS-8000-3C
>  scrub: none requested
> config:
>
>         NAME          STATE     READ WRITE CKSUM
>         ntank         UNAVAIL      0     0     0  insufficient replicas
>           ada0p3.eli  UNAVAIL      0     0     0  cannot open
>
>   pool: rpool
>  state: ONLINE
> status: The pool is formatted using an older on-disk format. The pool
>         can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>         pool will no longer be accessible on older software versions.
>  scrub: none requested
> config:
>
>         NAME                                          STATE     READ WRITE CKSUM
>         rpool                                         ONLINE       0     0     0
>           gptid/3ab00705-d22f-11df-8e1b-002713b40a7b  ONLINE       0     0     0
>
> errors: No known data errors
>
> and on stderr (I noticed the output on stderr as I ran the command, so
> I just typed that)
>
> GEOM_ELI[1]: Device ada0p3.eli is still open, so it cannot be definitely removed.
> GEOM_ELI[1]: Detached ada0p3.eli on last close.
>
> When doing a geli attach -k /pathtomykey/key /dev/ada0p3 directly
> followed by a zfs mount -a, I have my filesystems where I am used to
> finding them. I run my /usr/local/rc.ds from there and am functional
> again.
>
> Then (I post this anwe, I will point out why later on), I get for geli
> list
>
> Geom name: ada0p3.eli
> State: ACTIVE
> EncryptionAlgorithm: AES-CBC
> KeyLength: 256
> Crypto: software
> UsedKey: 0
> Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT
> Providers:
> 1. Name: ada0p3.eli
>    Mediasize: 249656594432 (233G)
>    Sectorsize: 4096
>    Mode: r1w1e1
> Consumers:
> 1. Name: ada0p3
>    Mediasize: 249656596992 (233G)
>    Sectorsize: 512
>    Mode: r1w1e1
>
> I never noticed that before, but, as I did not know which geli output
> you were asking for (the one not working or the one working), I diffed
> the two files and noticed, that directly after booting, the RW-DETACH
> flag is set. I do not know what that means nor do I know whether that
> matters, I find that curious, though.

I'm not sure if it's the cause of your problem, but it certainly does
matter:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/117158

Fabian
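Comparing two `geli list` outputs by eye is error-prone. The Python sketch below is a hypothetical helper that assumes the "Geom name:" / "Flags:" layout shown above. It extracts the flag set per geom so the "after boot" and "attached by hand" states can be diffed. With the outputs from this thread, RW-DETACH is the only difference, which fits the "Detached ada0p3.eli on last close" message seen on stderr.

```python
def geli_flags(geli_list_output):
    """Map each 'Geom name' in `geli list` output to its set of Flags."""
    flags = {}
    current = None
    for raw in geli_list_output.splitlines():
        line = raw.strip()
        if line.startswith("Geom name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Flags:") and current is not None:
            value = line.split(":", 1)[1]
            flags[current] = {f.strip() for f in value.split(",") if f.strip()}
    return flags


if __name__ == "__main__":
    after_boot = (
        "Geom name: ada0p3.eli\nState: ACTIVE\n"
        "Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT, RW-DETACH\n"
        "Providers:\n1. Name: ada0p3.eli\n   Mode: r0w0e0\n"
    )
    attached_by_hand = (
        "Geom name: ada0p3.eli\nState: ACTIVE\n"
        "Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT\n"
        "Providers:\n1. Name: ada0p3.eli\n   Mode: r1w1e1\n"
    )
    a = geli_flags(after_boot)["ada0p3.eli"]
    b = geli_flags(attached_by_hand)["ada0p3.eli"]
    print("only after boot:", a - b, "only when attached by hand:", b - a)
```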
Re: ATA_CAM + ZFS gives short 1-2 seconds system freeze on disk load
Jeremy Chadwick free...@jdc.parodius.com wrote:
> On Mon, Feb 08, 2010 at 03:33:29PM +0100, Guido Falsi wrote:
>> I'm seeing this problem on my machine at work. It's an HP DC 7800,
>> mounts an ich9 chipset(not ahci capable). I'm attaching the dmesg.
>>
>> I noticed this in the past, but it got evident(and very annoying)
>> while recompiling many ports today after the jpeg-8 update.
>>
>> It looks like it freezes the system for the second or two it takes to
>> flush buffers to disk when there are big outputs. This happens when
>> decompressiong big distfiles, mainly. The openoffice port triggers
>> this almost continuosly every few seconds during compilation. I've
>> also seen this when working with big files(for example graphic images
>> in uncompressed formats).
>>
>> It gets very annoying and I don't remember this happening before
>> activating the ATA_CAM flag. There was some slowdown with big disk
>> access, but not a total freeze.
>
> This happens without ATA_CAM (e.g. using ataahci(4) or any other
> controller driver). Indeed.
>
> The behaviour you're describing (bursty heavy disk I/O that stalls the
> subsystem) is pretty much the norm on all FreeBSD systems I've seen
> with ZFS. When it starts happening, it's easy to notice/follow using
> zpool iostat 1 or gstat -I500ms. Lots of I/O will happen (read or
> write) and the ARC is essentially being thrashed -- said utilities
> won't show any I/O counters incrementing until some threshold is
> reached, where you'll see a massive amount of I/O reported, during
> which time the system is sluggish (beyond acceptable levels, IMHO). A
> few seconds later, the I/O counters start reporting 0 as the ARC gets
> used, then a few seconds massive I/O, rinse lather repeat.

I experienced what I think is the same problem. ZFS's bulk disk flushes
caused vlc to occasionally stutter when viewing a DVD rip from disk
while ripping a DVD at the same time.

My workaround is to put vfs.zfs.txg.timeout=3 in /boot/loader.conf.
I think I read about this on zfs-disc...@.
I assume on faster systems one can use a higher value.

I'm currently updating the jpeg dependencies, too:

f...@r500 ~ $zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         176G  52.1G     22     40  1.40M  1.85M
tank         176G  52.1G     73      0  9.24M      0
tank         176G  52.1G     73      0  9.05M      0
tank         176G  52.1G     42    176  5.12M  11.3M
tank         176G  52.1G     68      0  8.62M      0
tank         176G  52.1G     67      0  8.43M      0
tank         176G  52.1G     57    106  7.11M  9.54M
tank         176G  52.1G     75      0  9.50M      0
tank         176G  52.1G     76      0  9.62M      0
tank         176G  52.1G     46    167  5.74M  11.7M
tank         176G  52.1G     79      0  9.99M      0
tank         176G  52.1G     81      0  10.2M      0
tank         176G  52.1G     43    164  5.43M  11.7M
tank         176G  52.1G     71      0  9.00M      0
tank         176G  52.1G     61     39  7.74M  5.00M
tank         176G  52.1G     46    111  5.74M  9.17M
tank         176G  52.1G     71      0  8.99M      0
tank         176G  52.1G     80      0  10.1M      0
tank         176G  52.1G     47    113  5.87M  9.68M
tank         176G  52.1G     70      0  8.87M      0
tank         176G  52.1G     78      0  9.80M      0
tank         176G  52.1G     42    164  5.24M  11.3M
tank         176G  52.1G     76      0  9.62M      0
tank         176G  52.1G     79      0  9.99M      0
tank         176G  52.1G     49    153  6.11M  10.8M
tank         176G  52.1G     72      0  9.12M      0

Fabian
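The write column of the `zpool iostat 1` sample above shows the bursts clearly. The Python sketch below measures the gaps between seconds with write activity, using the write-operation column copied from that output. The first row is dropped because zpool iostat's first line reports averages since boot rather than a one-second interval. With vfs.zfs.txg.timeout=3 one would expect gaps of roughly three seconds; reading the 2 and 1 in the middle as one flush split across two samples is an interpretation, not something the output proves.

```python
import statistics


def write_burst_gaps(write_ops):
    """Seconds between consecutive one-second samples that show any writes."""
    busy = [second for second, ops in enumerate(write_ops) if ops > 0]
    return [later - earlier for earlier, later in zip(busy, busy[1:])]


# Write operations per second from the `zpool iostat 1` sample above,
# without the first (since-boot average) row.
WRITE_OPS = [0, 0, 176, 0, 0, 106, 0, 0, 167, 0, 0, 164, 0, 39, 111,
             0, 0, 113, 0, 0, 164, 0, 0, 153, 0]

if __name__ == "__main__":
    gaps = write_burst_gaps(WRITE_OPS)
    print("gaps between write bursts:", gaps)
    print("median gap:", statistics.median(gaps))
```

For this sample the gaps are [3, 3, 3, 2, 1, 3, 3, 3] with a median of 3 seconds.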
Re: ZFS MFC heads up
Pertti Kosunen pertti.kosu...@pp.nic.fi wrote:
> Kip Macy wrote:
>> I will be MFC'ing the newer ZFS support some time this afternoon.
>> Both world and kernel will need to be re-built. Existing pools will
>> continue to work without upgrade.
>
> Mounting local file systems:.
> internal error: out of memory
> internal error: out of memory
> internal error: out of memory
> internal error: out of memory
>
> I get this in dmesg after make installkernel shutdown -r now, zfs pool
> is not mounted. /usr is on zfs so can't installworld.

IIRC, that's what happens if the ZFS kernel and userland aren't in sync.

You'll either have to install the new kernel and userland together (not
supported, but I do it all the time), or install the userland from a
non-ZFS file system.

Fabian
Re: Panic in radeon_get_vblank_counter()
Robert Noland rnol...@freebsd.org wrote:
> On Fri, 2009-03-13 at 23:33 -0500, Sean C. Farley wrote:
>> On Fri, 13 Mar 2009, Robert Noland wrote:
>> If I start rebooting before it is printed, the system locks up. Of
>> course, this is only after rebooting several times. Here is a
>> successful start and shutdown:
>> http://people.freebsd.org/~scf/drm-dmesg.log
>> http://people.freebsd.org/~scf/Xorg.0.log
>
> Ok, I'll spend some time staring at the current code... Thanks for the
> backtrace too, it's nice to get those...

This seems to be the same panic I mentioned in the "Filesystems being
eaten?" thread on freebsd-current.

I could reliably reproduce this panic on:
FreeBSD 8.0-CURRENT #39: Sat Mar 7 20:37:29 CET 2009
when shutting Xorg down.

I can no longer reproduce it with:
FreeBSD 8.0-CURRENT #42: Sat Mar 14 00:47:09 CET 2009

Fabian
Re: non-root user can not create zfs filesystem?
Pete French [EMAIL PROTECTED] wrote:
>> Yes,that's is what I want to say. In other word is the command zfs
>> allow and zfs unallow
>> I think it is not Support chflags(2) which is described in at the
>> bottom of http://wiki.freebsd.org/ZFS
>
> Sorry, my unclear use of english! I didn't mean the last item, I meant
> that it was near the bottom of the page. Look at the line above the
> 'chflags' one - Delegated Administration is what you are after. Not
> here yet, but hopefully soon...

You can already test it on CURRENT if you apply the patch Pawel posted
on freebsd-fs@ and freebsd-current@ a while ago.

Fabian
Re: constant zfs data corruption
JoaoBR [EMAIL PROTECTED] wrote:
> On Monday 20 October 2008 11:22:08 you wrote:
>> On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
>>> On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
>>>> On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
>>>>> constantly I find data corruption on ZFS volums, ever from rrdtool,
>>>>> this corrupt data happens on SATA disks, never seem on SCSI
>>>>
>>>> Presumably your SATA drives are correctly being reported by ZFS as
>>>> corrupting data, and you should do something like replace cables,
>>>> the drives themselves, perhaps try downgrading to SATA-150 rather
>>>> than -300 if you are using the later. Also consider running a drive
>>>> diagnostic utility from the mfgr (or smartmontools) and doing an
>>>> extended self-test or destructive write surface check.
>>>
>>> well, hardware seems to be ok and not older than 6 month, also
>>> happens not only on one machine ... smartctl do not report any hw
>>> failures on disk
>>>
>>> regarding jumpering the drives to 150 you suspect a driver problem?
>>
>> It's not because of a driver problem. There are known SATA chipsets
>> which do not properly work with SATA300 (particularly VIA and SiS
>> chipsets); they claim to support it, but data is occasionally
>> corrupted. Capping the drive to SATA150 fixes this problem.
>>
>> http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
>>
>> There are also known problems with Silicon Image chipsets (on Linux,
>> Windows, and FreeBSD).
>>
>> Because you didn't provide your smartctl output, I can't really tell
>> if the drives are in good shape or not. :-)
>
> ok then here it comes
>
> smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8

Can you reproduce the problem on an i386 system?

I have a USB HD case that works fine on an i386 system, but writing
from an amd64 system leads to ZFS checksum errors (reading works,
though).

Fabian
Re: constant zfs data corruption
Jeremy Chadwick [EMAIL PROTECTED] wrote:
> On Mon, Oct 20, 2008 at 03:07:30PM -0200, JoaoBR wrote:
>> On Monday 20 October 2008 11:22:08 you wrote:
>>> Also, do you not think it's a little odd that the only data
>>> corruption occurring for you are related to RRDtool?
>>
>> this yes I think is suspitious
>
> Chuck's probably spot-on with regards to explaining why this is.
> Something to keep in mind is that RRDtool has a history of bugs, so I
> wouldn't be surprised if the issue turned out to be there. It's really
> too bad we have no decent, actively-maintained alternatives to
> RRDtool.

Bugs in RRDtool shouldn't cause ZFS data corruption.

Fabian
Re: GELI encrypted ZFS zpool
Steve Bertrand [EMAIL PROTECTED] wrote:
> I have an older storage box that I've upgraded to -stable. It
> currently uses 7 SCSI disks mashed together with gstripe. I've
> recently replaced this box with a new one running a ZFS setup.
>
> I'm now wanting to turn the old one into a storage device running
> ZFS, but I want the entire pool encrypted with GELI. I know I can do
> this, but my requirements are as such:
>
> - use a key on external media to access the GELI encrypted disks
> - not have to type in the passphrase for each physical disk
>
> ...is this possible?

It should be possible if you use keyfiles without a passphrase for the
vdevs and store those keyfiles on a geli-encrypted slice that uses both
a keyfile and a passphrase.

Fabian
Re: possible zfs bug? lost all pools
JoaoBR [EMAIL PROTECTED] wrote:
> man page thar zfs can not be a dump device, not sure if I understand
> it as meant but I can dump to zfs very well and fast as long as
> recordsize=128

I assume you tried dump(8), while the sentence in the man page is about
using a ZFS volume as a dumpon(8) target:

%sudo dumpon -v /dev/zvol/tank/swap
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

Fabian
Re: crash in acd_geom_detach() whilst reading vcd
Peter Jeremy [EMAIL PROTECTED] wrote:
> I was trying to play a VCD (using mplayer) on my 6-STABLE system and
> it runs for a while and then crashes. This is reproducable with the
> same traceback. kgdb reports:
>
> acd0: FAILURE - device detached
>
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0x3c8
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x8:0x801b6489
> stack pointer           = 0x10:0xa3561ba0
> frame pointer           = 0x10:0xa3561bc0
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 2 (g_event)
> trap number             = 12
> panic: page fault
> KDB: stack backtrace:
> panic() at panic+0x1c1
> trap_fatal() at trap_fatal+0x298
> trap_pfault() at trap_pfault+0x243
> trap() at trap+0x298
> calltrap() at calltrap+0x5
> --- trap 0xc, rip = 0x801b6489, rsp = 0xa3561ba0, rbp = 0xa3561bc0 ---
> acd_geom_detach() at acd_geom_detach+0x19
> g_run_events() at g_run_events+0x1b7
> g_event_procbody() at g_event_procbody+0x5a
> fork_exit() at fork_exit+0x87
> fork_trampoline() at fork_trampoline+0xe
>
> A gdb backtrace shows:
> #6 0x803787bb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
> #7 0x801b6489 in acd_geom_detach (arg=0xff7e1100, flag=0x0) at /usr/src/sys/dev/ata/atapi-cd.c:194
> #8 0x8022f267 in g_run_events () at /usr/src/sys/geom/geom_event.c:209
> #9 0x802305ca in g_event_procbody () at /usr/src/sys/geom/geom_kern.c:141
> #10 0x80254f77 in fork_exit (callout=0x80230570 g_event_procbody, arg=0x0, frame=0xff0039dc4770) at /usr/src/sys/kern/kern_fork.c:821
> #11 0x80378b1e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
>
> The argument to acd_geom_detach() does include a NULL ivars:
> (kgdb) p *(device_t)0xff7e1100
> $2 = {
>   ops = 0xff825000,
>   link = {tqe_next = 0xff7c1c00, tqe_prev = 0xff8ea130},
>   devlink = {tqe_next = 0xff7c1c00, tqe_prev = 0xff9f1518},
>   parent = 0xff8ea100,
>   children = {tqh_first = 0x0, tqh_last = 0xff7e1130},
>   driver = 0x80532220,
>   devclass = 0xff7ebe00,
>   unit = 0x0,
>   nameunit = 0xff9d19d0 acd0,
>   desc = 0xff0039bd72a0 TSSTcorpCD/DVDW TS-L532M/HR08,
>   busy = 0x0,
>   state = DS_ATTACHED,
>   devflags = 0x0,
>   flags = 0x5d,
>   order = 0x0,
>   pad = 0x0,
>   ivars = 0x0,
>   softc = 0xffacac00,
>   sysctl_ctx = {tqh_first = 0xff0039bd7120, tqh_last = 0xff0039bd7228},
>   sysctl_tree = 0xffb30600
> }
> (kgdb)
>
> Is this behaviour expected?

I think you're running into the same problem I reported in kern/99017:
[ata] [patch] FreeBSD versions above 5.3 panic if atapi drives become
unresponsive:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/99017

You could try the workaround, but your drive will probably still be
lost until the next reboot.

Fabian
Re: release cycle
Chris [EMAIL PROTECTED] wrote:
> On 29/05/07, Mark Linimon [EMAIL PROTECTED] wrote:
>> On Tue, May 29, 2007 at 09:17:57PM +1000, Peter Jeremy wrote:
>>> Agreed. 6.3-RELEASE would nominally be due around July but the lack
>>> of any schedule on http://www.freebsd.org/releng/ suggests that it
>>> will be later than that. The plans to start the 7.0-RELEASE cycle
>>> will also impact this.
>>
>> At BSDCan, Ken Smith mentioned that 7.0 is due to be branched in July
>> and released in Aug/Sep, with 6.3 quickly following (perhaps even
>> overlapping so as to reuse the same ports freeze). The ports tree is
>> not even close to stable enough to release right now.
>
> Given that Kris repeatedly tells me and others that the ports system
> is only supported on the latest freebsd release (meaning one has to be
> upgrading freebsd on their servers every few months to get this
> support) if 7.0 and 6.3 are released around the same time will the
> ports tree be supported on both?

I believe you misunderstood something. Where do you think Kris said
that?

Fabian
Re: 6.2-RELEASE panic when blanking CD-RW media
Petr Holub [EMAIL PROTECTED] wrote:
> I've encountered a deterministic kernel panic when blanking one
> specific CD-RW media using cdrecord. The kernel panic details follow
> and dmesg is at the end of this email. Though I understand there's
> something wrong with the media, I think it shouldn't panic the kernel
> either.
>
> # kgdb /boot/kernel/kernel vmcore.9
> (kgdb) bt
> #0 0xc067262e in doadump ()
> #1 0xc0672afe in boot ()
> #2 0xc0672d94 in panic ()
> #3 0xc0885a04 in trap_fatal ()
> #4 0xc088576b in trap_pfault ()
> #5 0xc08853a9 in trap ()
> #6 0xc0873a7a in calltrap ()
> #7 0xc04e5e2e in acd_geom_detach ()
> #8 0xc06388f9 in one_event ()
> #9 0xc06389d1 in g_run_events ()
> #10 0xc0639de5 in g_event_procbody ()
> #11 0xc065cd34 in fork_exit ()
> #12 0xc0873adc in fork_trampoline ()
> (kgdb) x 0xc067262e
> 0xc04e5e2e acd_geom_detach+18: 0x03b0b0ff
> (kgdb) q

You could give the following patch a try:
http://www.fabiankeil.de/sourcecode/freebsd/atapi-cd.c.patch

It prevents a panic if a disc drive gets lost without FreeBSD noticing
it (for example because the firmware is buggy). See:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/99017
for details.

Note that the patch simply prevents the panic; your drive will probably
still be lost until you reboot.

Fabian
Re: Can't build threaded perl 5.8 on 6.2-RELEASE and 7-CURRENT
LI Xin [EMAIL PROTECTED] wrote:
> It seems that threaded perl is broken on 6.2-RELEASE and 7-CURRENT. I
> have tried some option combinations with no luck, if WITH_THREADED=yes
> is specified then the build would fail with a coredump. Any hints?

I ran into the same miniperl core dumps a few days ago while trying to
switch back to non-threaded Perl (shortly after updating the system to
a recent RELENG_6).

The only way I found to fix it was to:

- deinstall all Perl ports,
- rebuild Perl,
- reinstall all Perl ports.

I assume miniperl somehow included incompatible local Perl libraries,
but I didn't really look into it.

Fabian
Re: Is there any good reason for get*by*_r()?
Mark Andrews [EMAIL PROTECTED] wrote:
> get*by*_r() are deprecated on most platforms and there use is highly
> non-portable, lots of different API's. Why are we adding compatability
> for deprecated functions?

I was wondering the same thing, especially because it causes a lot of
packages that were compiled on later FreeBSD 6.x versions not to work
on earlier FreeBSD 6.x versions.

Fabian
-- http://www.fabiankeil.de/
Re: 16M RAM enough for FreeBSD 6.1?
Torfinn Ingolfsen [EMAIL PROTECTED] wrote:
> On Sun, 27 Aug 2006 18:13:12 +0200
> Fabian Keil [EMAIL PROTECTED] wrote:
>
> For information: I'm still trying to find a sodimm card for this
> machine, as everything would be easier if it had more memory. We'll
> see how I manage that; here in Norway it is not so easy to find things
> like that, and transport costs from the US are prohibitive for a hobby
> budget.
>
>> I moved the harddisk into a more powerful machine, installed FreeBSD
>> there, build a lighter kernel and put the disk back.
>
> Are there any FAQ's arounf for things I can safely remove from a 6.1
> kernel?

I don't think so, but usually the comments are enough to decide if you
need something or not. The man pages help with the rest.

>> In your case it's probably easier to create a disk image in Qemu,
>> copy it to a CD and then use something that
>
> Hmm, I'm not very familiar with Qemu. A quick web search didn't turn
> up any obvious pointers on how to create a ISO image from a qemu
> image, or how to make an ISO image from the (currently running) Qemu
> image.

You can burn the Qemu image like any other file; you can even burn it
directly without putting it into an ISO file first. You should stop
Qemu first, though, otherwise you might end up with an inconsistent
image.

If you only want to replace a partition, you can attach the image with
mdconfig to extract the partition you need.

Fabian
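mdconfig is the natural tool for this. As a swapped-in alternative for a machine where only a raw image file is at hand, the slice can also be copied out with a few lines of Python. This is a sketch under assumptions: a raw (not qcow2) image, a classic MBR with 512-byte sectors, and only the four primary slices; BSD labels inside the slice are not interpreted. The demo image is synthetic.

```python
import io
import struct

SECTOR = 512


def mbr_partitions(f):
    """Return (number, type, first_lba, sector_count) for used primary MBR slices."""
    f.seek(0)
    mbr = f.read(SECTOR)
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    slices = []
    for i in range(4):
        entry = mbr[446 + 16 * i:446 + 16 * (i + 1)]
        slice_type = entry[4]
        first_lba, count = struct.unpack("<II", entry[8:16])
        if slice_type != 0 and count > 0:
            slices.append((i + 1, slice_type, first_lba, count))
    return slices


def copy_partition(f, number, out, chunk=1 << 20):
    """Copy MBR slice `number` (1-4) from image file f to the file object out."""
    for num, _type, first_lba, count in mbr_partitions(f):
        if num == number:
            f.seek(first_lba * SECTOR)
            remaining = count * SECTOR
            while remaining:
                data = f.read(min(chunk, remaining))
                if not data:
                    raise ValueError("image ends before the slice does")
                out.write(data)
                remaining -= len(data)
            return
    raise KeyError(f"slice {number} not found")


def demo_image():
    """Tiny in-memory image: one FreeBSD (0xA5) slice at LBA 1, 2 sectors long."""
    img = bytearray(4 * SECTOR)
    img[446:462] = bytes([0x80, 0, 0, 0, 0xA5, 0, 0, 0]) + struct.pack("<II", 1, 2)
    img[510:512] = b"\x55\xaa"
    img[SECTOR:3 * SECTOR] = b"A" * SECTOR + b"B" * SECTOR
    img[3 * SECTOR:] = b"C" * SECTOR
    return io.BytesIO(bytes(img))
```

With a real image this would be used as `with open("qemu-disk.img", "rb") as src, open("slice1.img", "wb") as dst: copy_partition(src, 1, dst)` (both file names hypothetical). The result could then be written to the target partition with dd.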
Re: 16M RAM enough for FreeBSD 6.1?
Torfinn Ingolfsen [EMAIL PROTECTED] wrote: I have an old laptop, a Compaq Armada 1580DMT, with 16M RAM, 2GB hd, floppy and CD-ROM. It doesn't have built-in networking, neither wired nor wireless. It does have PC Card slots. It has had FreeBSD 4.9-RELEASE installed for a long time, and was recently upgraded to 4.11-RELEASE from CD, successfully. However, when I try the 6.1-RELEASE CD (CD1), it gets as far as loading the kernel, booting the kernel, and then reboots again?? Is 16 MB of RAM too little to install FreeBSD 6.0 or newer? With the default configuration, yes. I recently tried to install FreeBSD 6.1-PRERELEASE on a Pentium 90 with 16 MB RAM, and hit the rebooting problem as well. I moved the hard disk into a more powerful machine, installed FreeBSD there, built a lighter kernel and put the disk back. NFS mounting needed a workaround: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/94830 but the rest worked out of the box. In your case it's probably easier to create a disk image in Qemu, copy it to a CD and then use something that boots from a floppy, supports the CD-ROM drive and brings dd with it, to install the image. Depending on your partition layout you may even be able to use your old FreeBSD installation to do that. (I'm not sure if it's possible to use FreeBSD to overwrite the partition it's running from.) Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Fabian Keil [EMAIL PROTECTED] wrote: Fabian Keil [EMAIL PROTECTED] wrote: Peter Thoenen [EMAIL PROTECTED] wrote: Do you have pf running? If so, can you turn it off for a bit and see if you still crash? On my box I was getting all sorts of witness kbd backtraces on pf and since turning pf off (maybe a week ago), haven't crashed yet. Going to let it keep running unmetered for another 2 weeks and see if I crash or not. So far I didn't see a single PF-related complaint from witness, but I'll try disabling PF in a few days anyway. It took a little longer than I thought, but I finally disabled PF today and switched to natd. Uptime was slightly above 25 hours. Compiling HEAD right now. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Fabian Keil [EMAIL PROTECTED] wrote: Peter Thoenen [EMAIL PROTECTED] wrote: Do you have pf running? If so, can you turn it off for a bit and see if you still crash? On my box I was getting all sorts of witness kbd backtraces on pf and since turning pf off (maybe a week ago), haven't crashed yet. Going to let it keep running unmetered for another 2 weeks and see if I crash or not. How is it going, Peter, still running? I'm running Tor jailed and use PF for NAT, port forwarding and filtering: http://tor.fabiankeil.de/pf-stats/ So far I didn't see a single PF-related complaint from witness, but I'll try disabling PF in a few days anyway. It took a little longer than I thought, but I finally disabled PF today and switched to natd. At the moment I'm still testing if enabling polling really increases the uptime. I'm still not sure; however, polling made it possible to use fxp0 without ACPI. The hangs still occur and the serial console still becomes unresponsive, though. On another wild guess I switched Tor's threading library from libpthread to libthr. While it doesn't seem to affect the uptime, it makes Tor's CPU usage visible in top, so maybe it would be a good default for tor-devel? Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Wed, 28 Jun 2006, Fabian Keil wrote: I just got: Jun 28 23:01:19 tor kernel: lock order reversal: Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1053 Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ /usr/src/sys/vm/ Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185. Could you run vmstat -z, netstat -m, and vmstat -m please? I enabled polling three days ago and have seen this LOR twice since then. It may or may not be a coincidence. Every five minutes I log the output of top -S -d 2, pfctl -si, netstat -ss, sysctl -a, vmstat -z, netstat -m and vmstat -m; the output before and after the LOR can be found at: http://www.fabiankeil.de/tmp/lor-185.txt The system is still up at the moment, so the LOR might have nothing to do with the crashes/hangs/whatever. I have the feeling that polling does increase the uptime, but I'm not sure yet. Fabian -- http://www.fabiankeil.de/
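Periodic logging like this can be done with a small script called from cron every five minutes. A minimal sketch, assuming Python 3.7 or later is installed on the box; snapshot and its output layout are made up for illustration. A command that is missing or hangs is noted in the output instead of aborting the whole snapshot:

```python
import subprocess
import time

# The commands listed in the message; each is run once per snapshot.
COMMANDS = [
    ["top", "-S", "-d", "2"],
    ["pfctl", "-si"],
    ["netstat", "-ss"],
    ["sysctl", "-a"],
    ["vmstat", "-z"],
    ["netstat", "-m"],
    ["vmstat", "-m"],
]


def snapshot(commands=COMMANDS, timeout=60):
    """Return one timestamped text block with the output of every command."""
    parts = [time.strftime("=== %Y-%m-%d %H:%M:%S ===")]
    for cmd in commands:
        parts.append("--- " + " ".join(cmd))
        try:
            result = subprocess.run(cmd, capture_output=True, text=True,
                                    timeout=timeout)
            parts.append(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing binary or hung command: record it and keep going.
            parts.append(f"[failed: {exc}]")
    return "\n".join(parts)
```

Appending each snapshot to a log file keeps the output from both before and after an event like the LOR above, which is what makes the comparison possible.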
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Fabian Keil [EMAIL PROTECTED] wrote: Robert Watson [EMAIL PROTECTED] wrote: On Wed, 28 Jun 2006, Fabian Keil wrote: I just got: Jun 28 23:01:19 tor kernel: lock order reversal: Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1053 Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ /usr/src/sys/vm/ Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185. Could you run vmstat -z, netstat -m, and vmstat -m please? I enabled polling three days ago and have seen this LOR twice since then. It may or may not be a coincidence. The system is still up at the moment, so the LOR might have nothing to do with the crashes/hangs/whatever. Actually, I had to reset the box about two hours ago; I just forgot about it and overlooked the few minutes of downtime in the logs. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Peter Thoenen [EMAIL PROTECTED] wrote: Do you have pf running? If so, can you turn it off for a bit and see if you still crash? On my box I was getting all sorts of witness kbd backtraces on pf and since turning pf off (maybe a week ago), haven't crashed yet. Going to let it keep running unmetered for another 2 weeks and see if I crash or not. I'm running Tor jailed and use PF for NAT, port forwarding and filtering: http://tor.fabiankeil.de/pf-stats/ So far I didn't see a single PF-related complaint from witness, but I'll try disabling PF in a few days anyway. At the moment I'm still testing if enabling polling really increases the uptime. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Fabian Keil [EMAIL PROTECTED] wrote: Fabian Keil [EMAIL PROTECTED] wrote: Robert Watson [EMAIL PROTECTED] wrote: It sounds like your serial console server may not know how to map SSH break signals into remote serial break signals. Try ALT_BREAK_TO_DEBUGGER. Here's the description from NOTES: # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. options ALT_BREAK_TO_DEBUGGER It took me several attempts to get the character sequence right, but yes, this one works. Thanks. Unfortunately it didn't work while the system was hanging this morning. Since then I've had one or two hangs a day, and entering the debugger never worked, even when my console connection was opened a few minutes before the hang. I no longer think it has anything to do with the terminal server, but assume the hang takes the console with it. sio0 is attached to acpi0, so I tried disabling ACPI to see if it changes anything, but the only change I got was that fxp0 stopped working (it is up but only produces timeout warnings). I tried to partly disable ACPI subsystems as described in acpi(4), but either I got the syntax wrong, or it just isn't working. Can someone on this list confirm or deny whether something like debug.acpi.disabled=isa in /boot/loader.conf makes sense? That's how I understand the man page, but I don't see any reaction. I also tried /etc/sysctl.conf (which is probably parsed too late anyway), but I just got a message that the sysctl does not exist. sysctl debug.acpi indeed only shows: debug.acpi.do_powerstate: 1 debug.acpi.acpi_ca_version: 0x20041119 debug.acpi.semaphore_debug: 0 so maybe I need some special ACPI options, or it just doesn't work if ACPI is loaded as a module, but the man page gives no such hints. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Dan Nelson [EMAIL PROTECTED] wrote: In the last episode (Jul 02), Robert Watson said: On Sun, 2 Jul 2006, Fabian Keil wrote: The ssh man page offers: |~B Send a BREAK to the remote system (only useful for SSH |protocol version 2 and if the peer supports it). I am using ssh 2, but the only reaction I get is a new line. |FreeBSD/i386 (tor.fabiankeil.de) (ttyd0) | |login: ~B If you enter ~B and actually see a ~B printed to the screen, then ssh didn't process it because you didn't hit cr first. So cr~B will tell ssh to send a break. I am actually using cr~B and I don't see just ~B, but ~B . The tilde is printed after I release B, so I guess it is working. It sounds like your serial console server may not know how to map SSH break signals into remote serial break signals. Try ALT_BREAK_TO_DEBUGGER. Here's the description from NOTES: # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. options ALT_BREAK_TO_DEBUGGER ... and if you're sshing to your terminal server, remember that ssh will eat that tilde (because you sent cr~ ), so you need to send cr~~^B to pass the right characters to FreeBSD. Or change ssh's escape character with the -e flag. cr~^b works for me, without touching any ssh settings. As cr~. still causes a disconnect, it doesn't look like the escape character was changed either. Fabian -- http://www.fabiankeil.de/
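To pin down the character sequences discussed in this subthread: NOTES describes the alternate break as CR ~ ^B, and Dan suggests doubling the tilde when the sequence passes through ssh, because ssh treats a tilde after a newline as its escape character. The helper below is hypothetical and only builds the bytes. As far as I know, OpenSSH forwards a tilde followed by a character that is not one of its escapes, which would be consistent with cr~^b working without the extra tilde, but treat that as an assumption:

```python
CR = b"\r"
TILDE = b"~"
CTRL_B = b"\x02"  # ^B


def alt_break_sequence(through_ssh_escape=False):
    """Bytes for the ALT_BREAK_TO_DEBUGGER sequence CR ~ ^B.

    With through_ssh_escape=True the tilde is doubled (CR ~ ~ ^B), so an
    ssh client that interprets '~' after a newline forwards a single '~'.
    """
    tilde = TILDE * 2 if through_ssh_escape else TILDE
    return CR + tilde + CTRL_B
```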
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Fabian Keil [EMAIL PROTECTED] wrote: Robert Watson [EMAIL PROTECTED] wrote: It sounds like your serial console server may not know how to map SSH break signals into remote serial break signals. Try ALT_BREAK_TO_DEBUGGER. Here's the description from NOTES: # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. options ALT_BREAK_TO_DEBUGGER It took me several attempts to get the character sequence right, but yes, this one works. Thanks. Unfortunately it didn't work while the system was hanging this morning. I wasn't logged in at the console before the hang occurred, so it may be that the terminal server checked the console for signs of life, found none and neither connected nor printed a warning (a wild guess; I have no idea if it does that). It could also mean that I'm seeing the mysterious power-off part described in: http://www.freebsd.org/cgi/query-pr.cgi?pr=95180 but I have no way to tell the difference. I will stay connected to the console until the system hangs again to see if it changes anything. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Tue, 27 Jun 2006, Fabian Keil wrote: There was a request for Tor-related problem reports a while ago; I couldn't find the message again, but I believe it was posted here. I'm very interested in tracking down this problem, but have had a lot of trouble getting reliable reports of problems -- i.e., ones where I could get any debugging information. I had a similar conversation along these lines yesterday with Roger (Tor author) here at the WEIS conference. If this is easily reproducible, I would like you to do the following: - Does the hang occur? If so, use a serial break to get into DDB, see the above. I previously had the serial console misconfigured and I'm still not sure if the settings are correct now. So far I have put BOOT_COMCONSOLE_SPEED=57600 in /etc/make.conf, options CONSPEED=57600 in the kernel and console=comconsole in /boot/loader.conf. Kernel and boot blocks were recompiled and reinstalled. /boot.config contains the line: -D -h -S57600 (the speed setting through make.conf didn't work). The boot process now starts with: PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin Booting from local disk... 1 Linux 2 FreeBSD 3 FreeBSD Default: 2 /boot.config: -DConsoles: internal video/keyboard serial port BIOS drive C: is disk0 BIOS 639kB/523200kB available memory FreeBSD/i386 bootstrap loader, Revision 1.1 [...] After manually triggering a test panic through debug.kdb.enter I could enter DDB and everything seemed to be working. However, today I got another hang and couldn't enter the debugger by sending BREAK. It is the same BREAK ssh sends with ~B, right? Even after rebooting, sending a break didn't trigger a panic, so either I'm sending the wrong BREAK, or my console settings are still messed up. Any ideas? Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Sun, 2 Jul 2006, Fabian Keil wrote: After manually triggering a test panic through debug.kdb.enter I could enter DDB and everything seemed to be working. However, today I got another hang and couldn't enter the debugger by sending BREAK. It is the same BREAK ssh sends with ~B, right? Even after rebooting, sending a break didn't trigger a panic, so either I'm sending the wrong BREAK, or my console settings are still messed up. Any ideas? What serial software are you using to reach the console? I use ssh to log in to a console server, hit enter and am connected to the console. I have no idea what kind of software is used between console server and console. Do you have options BREAK_TO_DEBUGGER compiled into your kernel? Yes, together with the other options you suggested:
makeoptions DEBUG=-g
options DDB
#options KDB_UNATTENDED
options KDB
options BREAK_TO_DEBUGGER
options WITNESS
options WITNESS_SKIPSPIN
options INVARIANTS
options INVARIANT_SUPPORT
The delivery mechanism for the break will depend on the software you're using... The ssh man page offers: |~B Send a BREAK to the remote system (only useful for SSH protocol |version 2 and if the peer supports it). I am using ssh 2, but the only reaction I get is a new line. |FreeBSD/i386 (tor.fabiankeil.de) (ttyd0) | |login: ~B | Maybe machdep.enable_panic_key would be another solution? The description says "Enable panic via keypress specified in kbdmap(5)"; I'm just not sure if console input qualifies as a keypress. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Sun, 2 Jul 2006, Fabian Keil wrote: I am using ssh 2, but the only reaction I get is a new line. |FreeBSD/i386 (tor.fabiankeil.de) (ttyd0) | |login: ~B | It sounds like your serial console server may not know how to map SSH break signals into remote serial break signals. Try ALT_BREAK_TO_DEBUGGER. Here's the description from NOTES: # Solaris implements a new BREAK which is initiated by a character # sequence CR ~ ^b which is similar to a familiar pattern used on # Sun servers by the Remote Console. options ALT_BREAK_TO_DEBUGGER It took me several attempts to get the character sequence right, but yes, this one works. Thanks. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Thu, 29 Jun 2006, Fabian Keil wrote: I wish I could. The machine died before I read your message. I was logged in on the serial console running tail -f /var/log/messages. Last messages were: Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4275000(2048) val=a020c0de @ 0xc4275000 Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4055800(2048) val=a020c0de @ 0xc432a000 Jun 29 00:42:24 tor kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=34263674 Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3dff800(2048) val=a020c0d Ctrl+Alt+ESC didn't trigger any reaction, so I caused a reset through the ISP's web interface. Now the system appears to be hosed; at least FreeBSD never reaches the login: PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin Booting from local disk... 1 Linux 2 FreeBSD 3 FreeBSD Default: 2 [nothing] The ATA error above is a bit distressing, as is the fact that it won't boot. Is [nothing] normally the FreeBSD boot loader rather than nothing? The "1 Linux ..." part already is the FreeBSD boot loader. Normally it goes: PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin Booting from local disk... 1 Linux 2 FreeBSD 3 FreeBSD Default: 2 FreeBSD/i386 (tor.fabiankeil.de) (ttyd0) login: I would suggest running some hardware diagnostics to make sure we're dealing with reliable hardware before continuing so that we're not chasing both hardware and software problems, since you can't reliably debug software problems in the presence of hardware failures. I'll see what the ports collection has to offer (running smartmontools right now), but so far it's the only ATA message I got. Probably something which would be easy to resolve with keyboard access and a screen, but I think I'm forced to use the RecoveryManager. Unfortunately recovery means reinstalling the preconfigured GNU/Linux, which I can then replace with FreeBSD again.
If there ever was a core dump it will be gone, and so will kernel.debug. Lucky me. The RecoveryManager turned out to be a full-featured PXE-booted GNU/Linux system. It allowed me to fetch and replace /dev/ad0s2a (/) through ssh. The system is online again. After fsck -y /dev/ad0s3d (/usr) the whole tor jail is gone, but the rest of this slice seems to be OK, including kernel.debug. I can't fsck /var: [EMAIL PROTECTED] ~]$ sudo fsck /dev/ad0s3d ** /dev/ad0s3d ** Last Mounted on /var ** Phase 1 - Check Blocks and Sizes fsck_4.2bsd: cannot alloc 1082190976 bytes for inoinfo but it can still be mounted. No core dump, though. Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Tue, 27 Jun 2006, Fabian Keil wrote: There was a request for Tor-related problem reports a while ago; I couldn't find the message again, but I believe it was posted here. I'm very interested in tracking down this problem, but have had a lot of trouble getting reliable reports of problems -- i.e., ones where I could get any debugging information. I had a similar conversation along these lines yesterday with Roger (Tor author) here at the WEIS conference. If this is easily reproducible, I would like you to do the following: - Compile in options DDB, options KDB, options BREAK_TO_DEBUGGER, options WITNESS, options WITNESS_SKIPSPIN, options INVARIANTS, options INVARIANT_SUPPORT. - Make sure to have debugging symbols for the kernel. - Turn on core dumps. Done. I expect to get a chance to test the settings in the next 24 hours. The above debugging options will have a significant performance impact, and may or may not affect the probability of the race or deadlock being exercised. The first question is: - Are there any warnings on the console from WITNESS or other debugging options? If so, please copy/paste them into an e-mail for me. So far the logs show nothing unusual, but I noticed that the ssh connection becomes unresponsive from time to time.
I did a few pings with interesting results:
[EMAIL PROTECTED] ~]$ ping 10.0.0.1 | grep 'time=[^0]'
64 bytes from 10.0.0.1: icmp_seq=25 ttl=64 time=1.104 ms
64 bytes from 10.0.0.1: icmp_seq=61 ttl=64 time=2.983 ms
64 bytes from 10.0.0.1: icmp_seq=167 ttl=64 time=1.112 ms
64 bytes from 10.0.0.1: icmp_seq=189 ttl=64 time=1.653 ms
64 bytes from 10.0.0.1: icmp_seq=222 ttl=64 time=1.748 ms
64 bytes from 10.0.0.1: icmp_seq=291 ttl=64 time=1.058 ms
64 bytes from 10.0.0.1: icmp_seq=334 ttl=64 time=1.020 ms
64 bytes from 10.0.0.1: icmp_seq=337 ttl=64 time=1.967 ms
64 bytes from 10.0.0.1: icmp_seq=562 ttl=64 time=1.027 ms
64 bytes from 10.0.0.1: icmp_seq=586 ttl=64 time=1.230 ms
[EMAIL PROTECTED] ~]$ ping tor.fabiankeil.de | grep 'time=[^0]'
64 bytes from 81.169.155.246: icmp_seq=70 ttl=64 time=1.920 ms
64 bytes from 81.169.155.246: icmp_seq=79 ttl=64 time=1.587 ms
64 bytes from 81.169.155.246: icmp_seq=402 ttl=64 time=1.062 ms
[EMAIL PROTECTED] ~]$ ping localhost | grep 'time=[^0]'
64 bytes from 127.0.0.1: icmp_seq=142 ttl=64 time=1.142 ms
64 bytes from 127.0.0.1: icmp_seq=497 ttl=64 time=1.227 ms
64 bytes from 127.0.0.1: icmp_seq=627 ttl=64 time=1.181 ms
10.0.0.1 is on lo1 and 81.169.155.246 is on fxp0; both are filtered with pf. lo0 is skipped. The pings were run locally while Tor was running; the usual ping response times are below 0.2 ms. I get even more obscene ping times if I ping from home, but my net connection isn't the best. I'd appreciate it if someone with a reliable net connection could confirm the weirdness. Thanks for your time, Robert; I hope to have real information by tomorrow. Fabian -- http://www.fabiankeil.de/
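The grep 'time=[^0]' filter keeps replies whose time does not start with 0, i.e. anything of 1 ms or more for well-formed output. A Python sketch of the same filter with an explicit threshold; slow_replies and threshold_ms are made up for illustration, and the sample uses two lines from the output above plus one hypothetical fast reply (icmp_seq=24):

```python
import re

RTT_RE = re.compile(r"icmp_seq=(\d+) .*time=(\d+(?:\.\d+)?) ms")


def slow_replies(lines, threshold_ms=1.0):
    """Return (icmp_seq, rtt_ms) for ping replies at or above threshold_ms."""
    slow = []
    for line in lines:
        match = RTT_RE.search(line)
        if match is None:
            continue  # header, summary or timeout lines
        seq, rtt = int(match.group(1)), float(match.group(2))
        if rtt >= threshold_ms:
            slow.append((seq, rtt))
    return slow


sample = [
    "64 bytes from 10.0.0.1: icmp_seq=24 ttl=64 time=0.061 ms",
    "64 bytes from 10.0.0.1: icmp_seq=25 ttl=64 time=1.104 ms",
    "64 bytes from 10.0.0.1: icmp_seq=61 ttl=64 time=2.983 ms",
]
print(slow_replies(sample))  # [(25, 1.104), (61, 2.983)]
```

Unlike the grep pattern, the threshold can be lowered (for example to 0.2 ms, the usual response time mentioned above) without rewriting the expression.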
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: - Are there any warnings on the console from WITNESS or other debugging options? I just got:
Jun 28 23:01:19 tor kernel: lock order reversal:
Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ /usr/src/sys/kern/kern_event.c:1053
Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ /usr/src/sys/vm/vm_map.c:2390
Jun 28 23:01:20 tor kernel: KDB: stack backtrace:
Jun 28 23:01:20 tor kernel: kdb_backtrace(0,,c0711af0,c0713440,c06db624) at kdb_backtrace+0x29
Jun 28 23:01:20 tor kernel: witness_checkorder(c1043144,9,c06b90a8,956) at witness_checkorder+0x578
Jun 28 23:01:20 tor kernel: _mtx_lock_flags(c1043144,0,c06b90a8,956) at _mtx_lock_flags+0x5b
Jun 28 23:01:20 tor kernel: _vm_map_lock(c10430c0,c06b90a8,956) at _vm_map_lock+0x26
Jun 28 23:01:20 tor kernel: vm_map_remove(c10430c0,c3bc6000,c3bc8000,d6f55b30,c0623361) at vm_map_remove+0x1f
Jun 28 23:01:20 tor kernel: kmem_free(c10430c0,c3bc6000,2000,d6f55b48,c062524f) at kmem_free+0x25
Jun 28 23:01:20 tor kernel: page_free(c3bc6000,2000,22,2000,d6f55b60) at page_free+0x29
Jun 28 23:01:20 tor kernel: uma_large_free(c3ba5140) at uma_large_free+0x7b
Jun 28 23:01:20 tor kernel: free(c3bc6000,c06d8980,c3bc6000,c483,1400) at free+0xc5
Jun 28 23:01:20 tor kernel: kqueue_expand(c3795000,c06d8a40,500,0) at kqueue_expand+0xd7
Jun 28 23:01:20 tor kernel: kqueue_register(c3795000,d6f55bf4,c3a8f480,1,0) at kqueue_register+0x1b8
Jun 28 23:01:20 tor kernel: kern_kevent(c3a8f480,3,19,200,d6f55cc8) at kern_kevent+0xc9
Jun 28 23:01:20 tor kernel: kevent(c3a8f480,d6f55d04,6,2,212) at kevent+0x55
Jun 28 23:01:20 tor kernel: syscall(2824003b,80e003b,bfbf003b,cb87000,80d5020) at syscall+0x22f
Jun 28 23:01:20 tor kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f
Jun 28 23:01:20 tor kernel: --- syscall (363, FreeBSD ELF32, kevent), eip = 0x282cc4af, esp = 0xbfbfe9fc, ebp = 0xbfbfea48 ---
Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185.
Fabian -- http://www.fabiankeil.de/
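A lock order reversal warning means two locks have been seen being acquired in both orders, which can deadlock if two threads do it at the same time. Below is a toy Python model of that bookkeeping, loosely inspired by what WITNESS reports. The real WITNESS also checks orders transitively and knows about lock classes, which this sketch does not. In the usage example, the earlier "system map, then kqueue" acquisition is an assumption for illustration, since the report above only shows the reversed order:

```python
from collections import defaultdict


class LockOrderChecker:
    """Record 'held -> acquired' orderings and flag direct reversals."""

    def __init__(self):
        self.after = defaultdict(set)  # lock -> locks acquired while it was held
        self.held = []
        self.reversals = []

    def acquire(self, lock):
        for held_lock in self.held:
            if held_lock in self.after[lock]:
                # Earlier, held_lock was taken while lock was held;
                # now the order is the other way round.
                self.reversals.append((held_lock, lock))
            self.after[held_lock].add(lock)
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)


checker = LockOrderChecker()
# Hypothetical earlier path: system map held, then kqueue taken.
checker.acquire("system map")
checker.acquire("kqueue")
checker.release("kqueue")
checker.release("system map")
# Path from the backtrace: kqueue held, then system map taken.
checker.acquire("kqueue")
checker.acquire("system map")
print(checker.reversals)  # [('kqueue', 'system map')]
```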
Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)
Robert Watson [EMAIL PROTECTED] wrote: On Wed, 28 Jun 2006, Fabian Keil wrote: Robert Watson [EMAIL PROTECTED] wrote: - Are there any warnings on the console from WITNESS or other debugging options? I just got: Jun 28 23:01:19 tor kernel: lock order reversal: Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185. Could you run vmstat -z, netstat -m, and vmstat -m please? I wish I could. The machine died before I read your message. I was logged in on the serial console running tail -f /var/log/messages. Last messages were:
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4275000(2048) val=a020c0de @ 0xc4275000
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4055800(2048) val=a020c0de @ 0xc4055800
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4ca(2048) val=a020c0de @ 0xc4ca
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc39ef000(2048) val=a020c0de @ 0xc39ef000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc4bd7000(2048) val=a020c0de @ 0xc4bd7000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3c8a000(2048) val=a020c0de @ 0xc3c8a000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc33bd000(2048) val=a020c0de @ 0xc33bd000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3f1d000(2048) val=a020c0de @ 0xc3f1d000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc45dc800(2048) val=a020c0de @ 0xc45dc800
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc429e000(2048) val=a020c0de @ 0xc429e000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3aef800(2048) val=a020c0de @ 0xc3aef800
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc432a000(2048) val=a020c0de @ 0xc432a000
Jun 29 00:42:24 tor kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=34263674
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3dff800(2048) val=a020c0d
Ctrl+Alt+ESC didn't trigger any reaction, so I caused a
reset through the ISP's web interface. Now the system appears to be hosed; at least FreeBSD never reaches the login: PXELINUX 3.11 2005-09-02 Copyright (C) 1994-2005 H. Peter Anvin Booting from local disk... 1 Linux 2 FreeBSD 3 FreeBSD Default: 2 [nothing] Probably something which would be easy to resolve with keyboard access and a screen, but I think I'm forced to use the RecoveryManager. Unfortunately recovery means reinstalling the preconfigured GNU/Linux, which I can then replace with FreeBSD again. If there ever was a core dump it will be gone, and so will kernel.debug. On the bright side, you can choose the OS to go with. Should I use -CURRENT to see if the problem still exists? Fabian -- http://www.fabiankeil.de/
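The "Memory modified after free" lines come from the INVARIANTS checks. A freed item is filled with a known pattern, and the pattern is verified when the item is handed out again. A toy Python model of that idea follows. It assumes the 32-bit pattern is 0xdeadc0de (my understanding of FreeBSD's uma_dbg code, not something stated in this thread) and an i386 little-endian word layout. In most of the log lines above, the reported address equals the item address, i.e. the first word of each 2048-byte item no longer held the pattern but a020c0de:

```python
import struct

TRASH = 0xDEADC0DE  # assumption: the 32-bit fill pattern used for freed items
WORD = struct.Struct("<I")  # 32-bit little-endian word, as on i386


def trash(item):
    """Fill a freed item (a bytearray) with the pattern."""
    for offset in range(0, len(item) - WORD.size + 1, WORD.size):
        WORD.pack_into(item, offset, TRASH)


def first_modification(item):
    """Return (offset, value) of the first word that lost the pattern, else None."""
    for offset in range(0, len(item) - WORD.size + 1, WORD.size):
        (value,) = WORD.unpack_from(item, offset)
        if value != TRASH:
            return offset, value
    return None


item = bytearray(2048)
trash(item)
assert first_modification(item) is None
WORD.pack_into(item, 0, 0xA020C0DE)  # the value the log reports at offset 0
offset, value = first_modification(item)
print(f"modified after free: val={value:x} @ +{offset}")  # val=a020c0de @ +0
```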
FreeBSD 6.1 Tor issues (Once More, with Feeling)
There was a request for Tor-related problem reports a while ago; I couldn't find the message again, but I believe it was posted here. Last week I installed: FreeBSD tor.fabiankeil.de 6.1-RELEASE-p2 FreeBSD 6.1-RELEASE-p2 #0: Fri Jun 23 20:06:57 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/BIGSLEEP i386. At the moment it is only acting as a Tor node (http://serifos.eecs.harvard.edu/cgi-bin/desc.pl?q=zwiebelsuppe). tor-devel (maintainer CC'd) is running jailed in a GELI image; ntpd, named, cron and sshd are running in the host system, and that's about it. No mail or web server, and nearly no traffic besides that caused by Tor. I started Tor Friday night and have had to reset the box three times since then. The server just suddenly stops responding and the logs stop as well, so I assume it either panics or hangs. I only have remote access; a serial console is available, but it becomes unresponsive as well. I didn't configure DDB yet, so maybe that is to be expected? cron creates some stats every five minutes; a few minutes before a hang this morning the load was:

last pid:  7996;  load averages:  0.40,  0.37,  0.36   up 0+18:38:25  05:55:02
83 processes:  2 running, 66 sleeping, 15 waiting
CPU states: 21.3% user,  0.0% nice, 17.8% system, 20.2% interrupt, 40.7% idle
Mem: 100M Active, 157M Inact, 102M Wired, 12K Cache, 60M Buf, 134M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   11 root        1 171   52     0K     8K RUN    857:30 53.61% idle
   12 root        1 -44 -163     0K     8K WAIT    45:22  6.54% swi1: net
   23 root        1 -68 -187     0K     8K WAIT    14:48  2.83% irq12: fxp0 fxp1
 7973 root        1  96    0  2264K  1544K RUN      0:00  0.51% top
   13 root        1 -32 -151     0K     8K WAIT     5:49  0.10% swi4: clock sio
   33 root        1 171   52     0K     8K pgzero   0:02  0.10% pagezero
    3 root        1  -8    0     0K     8K -        0:16  0.05% g_up
 1586 _tor       14  20    0    99M 97912K kserel 188:36  0.00% tor
   15 root        1 -16    0     0K     8K -        1:01  0.00% yarrow
 1443 root        1  -8    0     0K     8K geli:w   0:49  0.00% g_eli[0] md0
    4 root        1  -8    0     0K     8K -        0:21  0.00% g_down
   35 root        1  20    0     0K     8K syncer   0:17  0.00% syncer
 1439 root        1  -8    0     0K     8K mdwait   0:13  0.00% md0
   24 root        1 -64 -183     0K     8K WAIT     0:08  0.00% irq14: ata0
    2 root        1  -8    0     0K     8K -        0:07  0.00% g_event
   42 root        1 -16    0     0K     8K -        0:06  0.00% schedcpu
  453 root        1  96    0  2920K  1752K select   0:05  0.00% ntpd
  256 _pflogd     1 -58    0  1548K  1216K bpf      0:05  0.00% pflog

pfctl -si:

Status: Enabled for 0 days 18:37:52           Debug: Urgent

Hostid: 0x1ec3da6b

Interface Stats for fxp0              IPv4             IPv6
  Bytes In                     25077859159                0
  Bytes Out                    27498863362                0
  Packets In
    Passed                        36192760                0
    Blocked                          32213                0
  Packets Out
    Passed                        36871432                0
    Blocked                            265                0

State Table                          Total             Rate
  current entries                     5290
  searches                        73567507         1096.8/s
  inserts                           600068            8.9/s
  removals                          594778            8.9/s
Counters
  match                             752600           11.2/s
  bad-offset                             0            0.0/s
  fragment                             102            0.0/s
  short                                  0            0.0/s
  normalize                              2            0.0/s
  memory                                68            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     12655            0.2/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              2            0.0/s
  synproxy

Today's traffic graph: http://www.fabiankeil.de/blog-surrogat/2006/06/27/tor.fabiankeil.de-dritter-ausfall-24-stunden-durchsatz-statistik-595x337.png (The hang around 14:00 happened while I was logged in doing a buildworld.) At the moment I'm building RELENG_6 with DDB to see if it changes anything and if I can get a core dump, but so far the problem seems to be similar to: http://www.freebsd.org/cgi/query-pr.cgi?pr=95180 (closed) and http://freebsd.rambler.ru/bsdmail/freebsd-questions_2006/msg08692.html. Is anyone on this
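The Rate column of the pfctl -si output appears to be each counter divided by the time PF has been enabled (18:37:52, i.e. 67072 seconds). The figures check out under that reading, and inserts minus removals gives the current state entries. A small Python check, assuming that interpretation of the Rate column; hms_to_seconds is a hypothetical helper:

```python
def hms_to_seconds(hms, days=0):
    """Convert 'HH:MM:SS' (plus optional days) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


enabled_for = hms_to_seconds("18:37:52")
counters = {
    "searches": 73567507,
    "inserts": 600068,
    "removals": 594778,
    "match": 752600,
    "state-mismatch": 12655,
}
for name, total in counters.items():
    # Prints 1096.8, 8.9, 8.9, 11.2 and 0.2 per second respectively.
    print(f"{name:15s} {total / enabled_for:8.1f}/s")
print("current entries", counters["inserts"] - counters["removals"])  # 5290
```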
Re: GELI issues ? (Re: Increase in panics under 6.1)
Stanislaw Halik [EMAIL PROTECTED] wrote: On Thu, May 25, 2006, Fabian Keil wrote: Interestingly enough, I had some nasty issues today on the same laptop. I had 2 x 6 GB GELI vnodes, running mtree -K md5digest to compare contents. Disk IO was high as expected... but then it just died down (but the mtree hadn't finished). (swap is also GELI) Any subsequent process trying to access the encrypted mount points simply stalled for as long as I cared to wait (10 minutes). The processes even stalled a shutdown -r. I'm not sure if it's related, but lately I see this behaviour on NFS mounts if the server is not responding. Doing cd /mnt/mydeadnfsmount/[tab for autocompletion] is enough to render the current console unresponsive. Isn't that normal and desired for `hard' mounts? Now that you mention it, I guess you're right. I totally forgot that hard mounts are the default. Fabian -- http://www.fabiankeil.de/
Re: kmem leak in tmpmfs?
Iasen Kostov [EMAIL PROTECTED] wrote: On Thu, 2006-05-25 at 16:54 -0400, Kris Kennaway wrote: On Thu, May 25, 2006 at 06:01:30PM +0200, Arno J. Klaassen wrote: I get a very easy to reproduce panic on 6.1-STABLE: /etc/periodic/weekly/310.locate panics with panic: kmem_malloc(4096): kmem_map too small: 335544320 total allocated It looks like you are using a malloc-backed md and you do not have enough RAM to handle the size. Perhaps tmpmfs does not use swap backing, as it is supposed to? First of all, if there is not enough kmem (not just plain RAM, I think), the kernel should not allow disk creation in the first place. Second, I think (although there could be some ... reason for that) it's a stupid way to say "I don't have more kmem" by panicking :). A better way would be to just fail disk operations on that FS with "Disk is full" or something like that. At home I tried to raise kmem like this: vm.kmem_size_max=1073741824 (I have 2G of RAM) (setting vm.kmem_size directly panics the kernel at boot, if I remember correctly), but to my surprise the kernel panics at exactly the same amount of allocated md disk space, with the same panic as the original poster's. Is it possible that I should raise KVA_PAGES too? And I don't think it's documented anywhere (of course I've tried googling and it's always possible that I've missed something :). All this was on FreeBSD 6.0. man mdconfig mentions the problem: malloc Storage for this type of memory disk is allocated with malloc(9). This limits the size to the malloc bucket limit in the kernel. If the -o reserve option is not set, creating and filling a large malloc-backed memory disk is a very easy way to panic a system. Use a swap-backed disk and the problem will disappear. Fabian -- http://www.fabiankeil.de/
Re: GELI issues ? (Re: Increase in panics under 6.1)
Norberto Meijome [EMAIL PROTECTED] wrote:

> On Tue, 23 May 2006 22:01:16 -0400
> Kris Kennaway [EMAIL PROTECTED] wrote:
> > So what is the traceback? See the developers handbook for more information.
>
> doh! yes, I'll get onto this as soon as I can.
>
> Interestingly enough, I had some nasty issues today on the same laptop. I had 2 x 6 GB GELI vnodes, running mtree -K md5digest to compare contents. Disk IO was high as expected... but then it just died down (but the mtree hadn't finished). (Swap is also GELI.)
>
> Any subsequent process trying to access the encrypted mount points simply stalled for as long as I cared to wait (10 minutes). The processes even stalled a shutdown -r.

I'm not sure if it's related, but lately I see this behaviour on NFS mounts if the server is not responding. Doing
cd /mnt/mydeadnfsmount/[tab for autocompletion]
is enough to render the current console unresponsive.

Fabian
-- 
http://www.fabiankeil.de/
Re: kmem leak in tmpmfs?
Arno J. Klaassen [EMAIL PROTECTED] wrote:

> Hello,
>
> I get a very easy to reproduce panic on 6.1-STABLE:
> /etc/periodic/weekly/310.locate panics with
> panic: kmem_malloc(4096): kmem_map too small: 335544320 total allocated
>
> This box has nothing particular, apart from maybe a large number of stamp-file based test-databases (with a lot of zero-sized files named .key=value).
>
> Producing this bug is easy:
> - set tmpmfs=YES and set tmpsize greater than around 220m
> - start /etc/periodic/weekly/310.locate (and nothing else!)
> - wait two-three hours and bang
>
> Last test is with tmpfs=1024m and I monitored df -h /tmp and vmstat -zm every minute; when the system panics, last output is:
>
> Filesystem  Size  Used  Avail  Capacity  Mounted on
> /dev/md0    989M  219M   691M       24%  /var/tmp
>
> vmstat -zm | fgrep md0
> md0:  512, 0, 453257, 15, 453437
>
> I'm quite not an expert, but looks to me as if md0 use stays almost 100% in kmem and is never swapped (as it is supposed to do by default according to the man-page).

The rc script has different defaults than mdmfs:

[EMAIL PROTECTED] ~ $grep tmpmfs_flags /etc/defaults/rc.conf
tmpmfs_flags="-S -M"            # Extra mdmfs options for the mfs /tmp

You probably want to ditch the -M.

Fabian
-- 
http://www.fabiankeil.de/
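The numbers in the report support that reading. Assuming the vmstat -zm columns after the zone name are SIZE, LIMIT, USED, FREE and REQUESTS, md0 holds 453257 items of 512 bytes each. A quick sh calculation (integer MiB, rounded down) compares that with the 335544320 bytes named in the panic:

```sh
# Figures copied from the report above.
item_size=512          # md0 zone item size in bytes (vmstat -zm)
items_used=453257      # md0 items in use (vmstat -zm, assumed USED column)
kmem_total=335544320   # "kmem_map too small: ... total allocated"

mib=1048576
md0_mib=$(( item_size * items_used / mib ))
kmem_mib=$(( kmem_total / mib ))

echo "md0 malloc zone: ${md0_mib} MiB of ${kmem_mib} MiB kmem_map"
# prints: md0 malloc zone: 221 MiB of 320 MiB kmem_map
```

That is roughly 221 MiB of a 320 MiB kmem_map sitting in md0's malloc zone, in line with the 219M that df reports as used on /dev/md0.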
Loading geom_eli in loader.conf disables psm0
To encrypt my home slice with geli I followed 17.16.2, Disk Encryption with geli:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-encrypting.html#AEN26326

As I prefer to have my home directory available after boot, I additionally added

geli_devices="ad0s1"
geli_ad0s1_flags="-k /root/ad0s1.key"

to rc.conf and rebooted. geli worked, my mouse no longer did. psm0 got lost:

--- dmesg-geli-enabled-in-loader.conf.txt       Mon May 22 18:17:23 2006
+++ dmesg-without-geli-enabled-in-loader.conf.txt       Mon May 22 18:21:33 2006
[...]
@@ -76,7 +76,9 @@
 atkbd0: AT Keyboard irq 1 on atkbdc0
 kbd0 at atkbd0
 atkbd0: [GIANT-LOCKED]
-acpi_ibm0: IBM ThinkPad ACPI Extras irq 12 on acpi0
+psm0: PS/2 Mouse irq 12 on atkbdc0
+psm0: [GIANT-LOCKED]
+psm0: model Generic PS/2 mouse, device ID 0
 sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 sio0: type 16550A
 ppc0: Standard parallel printer port port 0x3bc-0x3be irq 7 on acpi0
@@ -88,12 +90,13 @@
 sio1: type 16550A
 battery0: ACPI Control Method Battery on acpi0
 acpi_acad0: AC Adapter on acpi0
+acpi_ibm0: IBM ThinkPad ACPI Extras on acpi0
 pmtimer0 on isa0
[...]

After I removed geom_eli_load="YES" from loader.conf and rebooted, psm0 was back and my mouse started to work again. I saw no geli regression either; I assume geom_eli.ko is loaded on demand by geli's rc script.
[EMAIL PROTECTED] ~ $kldstat
Id Refs Address    Size     Name
 1   25 0xc040     41309c   kernel
 2    1 0xc0814000 b880     unionfs.ko
 3    1 0xc082     5760     if_tap.ko
 4    1 0xc0826000 565c     snd_ich.ko
 5    2 0xc082c000 258d4    sound.ko
 6    1 0xc0852000 43f4     acpi_video.ko
 7    3 0xc0857000 62fdc    acpi.ko
 8    1 0xc08ba000 21dac    radeon.ko
 9    2 0xc08dc000 10d80    drm.ko
10    1 0xc08ed000 4c88     acpi_ibm.ko
11    3 0xc08f2000 215cc    wlan.ko
12    1 0xc0914000 2ea0     wlan_wep.ko
13    1 0xc0917000 eec8     if_iwiNG.ko
14    3 0xc0926000 2e60     firmware.ko
15    1 0xc0929000 300fc    iwi_bss.ko
16    1 0xc095a000 9500     cpufreq.ko
17    1 0xc35a2000 b000     geom_eli.ko
18    1 0xc35c1000 19000    crypto.ko
19    1 0xc35ad000 a000     zlib.ko

My /boot/loader.conf:

loader_logo="beastie"
loader_color="YES"
autoboot_delay="1"
hw.ata.atapi_dma="1"
radeon_load="YES"
acpi_video_load="YES"
acpi_ibm_load="YES"
wlan_load="YES"
wlan_wep_load="YES"
if_iwiNG_load="YES"
iwi_bss_load="YES"
cpufreq_load="YES"
snd_ich_load="YES"
if_tap_load="YES"
unionfs_load="YES"
#geom_eli_load="YES"
hw.psm.synaptics_support="1"

[EMAIL PROTECTED] ~ $uname -a
FreeBSD TP51.local 6.1-STABLE FreeBSD 6.1-STABLE #30: Mon May 22 15:52:13 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THINKPAD i386

Fabian
-- 
http://www.fabiankeil.de/
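Summarised as configuration, the combination that works here is to keep geom_eli out of /boot/loader.conf and let the rc.conf geli settings attach the slice at boot. A sketch of both fragments, using the device name and key path from this message; both files use sh-style name="value" lines, so the block is also valid shell. That geli's rc script loads geom_eli.ko on demand is an assumption based on the observed behaviour, not something verified here:

```sh
# /boot/loader.conf: do not preload geom_eli.ko (preloading it made psm0
# disappear on this ThinkPad), so the line stays commented out.
#geom_eli_load="YES"

# /etc/rc.conf: attach the encrypted slice at boot using a key file.
# geli's rc script is assumed to load geom_eli.ko when it is needed.
geli_devices="ad0s1"
geli_ad0s1_flags="-k /root/ad0s1.key"
```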
Configuring FreeBSD 4.9 on a new system (was: FreeBSD Newbie...)
John Dworske [EMAIL PROTECTED] wrote:

> Help me...yeah...OK...so here it goes...I am brand new to FreeBSD...installed OS onto a box from a set of floppies I got off the net...
>
> Last login: Wed May 3 14:37:21 2006 from 10.10.20.20
> Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
>         The Regents of the University of California. All rights reserved.
>
> FreeBSD 4.9-RELEASE (GENERIC) #0: Mon Oct 27 17:51:09 GMT 2003
>
> Wondering what I need to update my system to make sure it has everything I need to do work...like want to setup a slave DNS server and apache webserver for starters...

How about replacing it with FreeBSD 6.1 first?

BIND is part of the base system, and Apache is part of the ports collection. While this is true for FreeBSD 4.9 as well, you shouldn't use such an old release unless you have a reason.

Fabian
-- 
http://www.fabiankeil.de/
Re: devfs.conf and pass0
JoaoBR [EMAIL PROTECTED] wrote:

> seems on recent releng_6 (RC1) the permissions set to pass0 within /etc/devfs.conf are not applied anymore and need to be set manual in order getting acd0 available to users

Works for me on FreeBSD 6.1-RC #1: Sun Apr 9 20:07:42 CEST 2006.

Did you by any chance just forget to add a newline after your pass0 line?

Fabian
-- 
http://www.fabiankeil.de/
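Whether a missing final newline really makes devfs.conf skip the last rule is only the hypothesis raised here, but it is quick to rule in or out. A minimal POSIX sh sketch; ends_with_newline is a hypothetical helper, and the "perm pass0 0666" rule is merely an example of devfs.conf syntax, not the poster's actual line:

```sh
# ends_with_newline FILE: succeed if FILE is empty or its last byte is a
# newline. $(...) strips trailing newlines, so a newline-terminated file
# yields an empty string here.
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Example: the same devfs.conf-style rule without and with a final newline.
tmp=$(mktemp /tmp/devfs.XXXXXX)
printf 'perm\tpass0\t0666' > "$tmp"
ends_with_newline "$tmp" && echo "ok" || echo "missing final newline"
printf 'perm\tpass0\t0666\n' > "$tmp"
ends_with_newline "$tmp" && echo "ok" || echo "missing final newline"
rm -f "$tmp"
```

On the affected machine the check would simply be ends_with_newline /etc/devfs.conf.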
Re: devfs.conf and pass0
JoaoBR [EMAIL PROTECTED] wrote:

> On Thursday 13 April 2006 09:28, Fabian Keil wrote:
> > JoaoBR [EMAIL PROTECTED] wrote:
> > > seems on recent releng_6 (RC1) the permissions set to pass0 within /etc/devfs.conf are not applied anymore and need to be set manual in order getting acd0 available to users
> >
> > Works for me on FreeBSD 6.1-RC #1: Sun Apr 9 20:07:42 CEST 2006.
> >
> > Did you by any chance just forget to add a newline after your pass0 line?
>
> nooo there are others below and the last is an empty line
>
> the permissions are set as before to acd0 and cd0 but not to pass0
>
> I cvsuped yesterday 6.1-RC
> FreeBSD 6.1-RC #3: Wed Apr 12 18:15:55 BRT 2006
>
> seems there was a change in devfs.h yesterday or any other idea?

I cvsuped a few minutes ago and didn't see any devfs changes. I'm now running FreeBSD 6.1-RC #0: Thu Apr 13 17:01:11 CEST 2006 and all rules in /etc/devfs.conf still apply.

Fabian
-- 
http://www.fabiankeil.de/
Re: truss problems
Jonas Wolz [EMAIL PROTECTED] wrote:

> while trying to get the gnash CVS version to work I noticed that on my system (FreeBSD 6.0-RELEASE) truss obviously has problems tracing firefox: truss prints somewhat random error messages and traces only some of the system calls firefox makes (opening a local file doesn't show up, for example). The output looks like that (I can provide the truss log if somebody is interested):
>
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: PIOCWAIT top of loop: Input/output error
> truss: get_struct 0x0: Bad address
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: Cannot malloc 1081891232 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: cannot open /proc/0/mem: No such file or directory
> truss: cannot open /proc/0/mem: No such file or directory
> truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: PIOCWAIT top of loop: Input/output error
> truss: PIOCCONT: Input/output error
> truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: PIOCWAIT top of loop: Input/output error
> truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: cannot open /proc/0/mem: No such file or directory
> truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: PIOCWAIT top of loop: Input/output error
> truss: PIOCCONT: Input/output error
> truss: cannot open /proc/0/mem: No such file or directory
> truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot allocate memory
> [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
> truss: PIOCWAIT top of loop: Input/output error
> truss: get_struct 0x0: Bad address
> [EMAIL PROTECTED]:/tmp$
>
> Can someone else also reproduce this problem/is this a known bug or is just something broken on my system? If you need more details please let me know.

I can't reproduce exactly the same problem on

FreeBSD TP51.local 6.1-RC FreeBSD 6.1-RC #1: Sun Apr 9 20:07:42 CEST 2006

but I get a different problem with truss and Firefox.

If I run truss -f firefox it seems to get stuck after a while:

 1274: mmap(0x0,36864,(0x3)PROT_READ|PROT_WRITE,(0x1002)MAP_ANON|MAP_PRIVATE,-1,0x0) = 689876992 (0
 1274: kse_release(0x8064fa0)                    = 0 (0x0)
 1274: kse_release(0x8064fa0)                    = 0 (0x0)
 1274: kse_release(0x8064fac)                    = 0 (0x0)
 1274: kse_release(0x8064fa0)                    = 383 (0x17f)
 1274: kse_release(0x8064fa0)                    = 383 (0x17f)
 1274: kse_release(0x8064fa0)                    = 0 (0x0)
 1274: kse_release(0x8064fa0)                    = 383 (0x17f)
 1274: kse_release(0x8064fa0)                    = 383 (0x17f)
 1274: kse_release(0x8064fa0)                    = 0 (0x0)
^C
 1259: wait4(0x,0xbfbfe9d8,0x2,0x0)              ERR#4 'Interrupted system call'
 1266: wait4(0x,0xbfbfe728,0x2,0x0)              ERR#4 'Interrupted system call'
 1274: kse_release(0x8064fa0)                    = 383 (0x17f)

Plain truss firefox (without -f) seems to work.

If I attach truss to a running Firefox I get:

[EMAIL PROTECTED] ~ $truss -f -p 1440
 1440: (null)()                                  = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fac)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
 1440: kse_release(0x8064fa0)                    = 0 (0x0)
truss: Cannot malloc -67210816 bytes for pollfd array: Cannot allocate memory

Fabian
-- 
http://www.fabiankeil.de/
Re: How can I install a driver?
Yousef Raffah [EMAIL PROTECTED] wrote:

> I'm having an issue as I'm a newbie in installing/configuring the marvell driver for FreeBSD. A quick search in the mailing lists shows:
> http://www.freebsd.org/cgi/getmsg.cgi?fetch=2601224+2604070+/usr/local/www/db/text/2006/freebsd-questions/20060402.freebsd-questions
> but I have no clue how I can bypass the second step, which is installing the if_myk.ko to /boot/kernel
>
> I have tried to cp if_yk.ko /boot/kernel/ but that didn't bring anything new in ifconfig!

Try:

kldxref /boot/kernel
kldload if_yk

Fabian
-- 
http://www.fabiankeil.de/
Re: Prism wi support in 6.x - or alternative card
Brian Candler [EMAIL PROTECTED] wrote:

> I have an IBM Thinkpad X30 with a miniPCI wireless card:
>
> wi0: Intersil Prism2.5 mem 0xf800-0xf8000fff irq 11 at device 2.0 on pci1
> wi0: using RF:PRISM2.5 MAC:ISL3874A(Mini-PCI)
> wi0: Intersil Firmware: Primary (1.1.0), Station (1.4.9)
> wi0: Ethernet address: 00:05:3c:09:7e:9d
> wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
>
> I have found it to be flaky under FreeBSD 5.4. It's OK for occasional use but when under heavy load, e.g. 'unison' syncing to another machine, it locks up:
>
> Mar 27 21:10:00 thinkdog kernel: wi0: timeout in wi_cmd 0x010b; event status 0xa000
> Mar 27 21:10:00 thinkdog kernel: wi0: xmit failed
> Mar 27 21:10:04 thinkdog kernel: wi0: timeout in wi_cmd 0x0021; event status 0xa000
> Mar 27 21:10:09 thinkdog kernel: wi0: wi_cmd: busy bit won't clear.
>
> At this point the only solution is to unload and reload the if_wi module.
>
> So my questions are:
>
> 1. Is support for this hardware significantly improved in 6.X?

I don't think so. I have a wi card which worked fine in 5.4, but shows the symptoms above on 6.x.

> 2. If I were to buy another miniPCI card to replace it, what's the current recommendation?

Something which works with ath.

Fabian
-- 
http://www.fabiankeil.de/
Re: 6.0-REL problems with ISA ed0, FFS corruption and ancient hardware
Matt Emmerton [EMAIL PROTECTED] wrote:

> I recently upgraded a 4.11-REL machine to 6.0-REL and have run into some snags. While the installation from CD went fine, after configuring and enabling my ed0 NIC, bad things start to happen.
>
> FWIW, this machine is an ancient (hardware circa 1991, BIOS circa 1994) dual-Pentium 133 MHz machine, with EISA/PCI and onboard SCSI.

At least it has lots of memory; last week I installed FreeBSD 6.1-PRERELEASE on a P90 with 16MB RAM.

> So far I can reliably reproduce two panics, one appears to be an ed driver bug (based on reports of similar panics with different NICs, notably nge) and one is a filesystem corruption problem.
>
> Here's the process that I go through to reliably reproduce both problems.
>
> 1) Boot machine in multi-user mode
> 2) After ifconfig ed0, machine panics with a trap 12 in ithread_loop.
> 3) In debugger, reset (or panic to get vmcore)
> 4) Reboot in multi-user mode, but set hint.ed.0.disabled=1 in the boot loader (to avoid ifconfig panic)
> 5) Root filesystem is fsckd; all other filesystems are scheduled for background fsck
> 6) Encounter panic "ffs_valloc: dup alloc"
> 7) In debugger, reset (or panic to get vmcore)

Did you try to do a foreground fsck in single-user mode?

Fabian
-- 
http://www.fabiankeil.de/
Re: wpa_supplicant with NDIS-wrapped wireless card and WPA-PSK reboots 6.1-pre
Carlos Amengual [EMAIL PROTECTED] wrote:

> My system is a 6.1-PRERELEASE as of yesterday afternoon, but the same happened with a RELENG_6 as of a month ago.
>
> I set up a D-Link AirPlus DWL-520+ wireless PCI adapter in an old server, and NDISwrapped it (got an AIRPLUS_SYS.ko).
>
> When running wpa_supplicant -dd -indis0 -Dndis -c/etc/wpa_supplicant.conf, the system reboots after printing:
>
> Initializing interface 'ndis0' conf '/etc/wpa_supplicant.conf' driver 'ndis'
> Configuration file '/etc/wpa_supplicant.conf' - '/etc/wpa_supplicant.conf'
> Reading configuration file '/etc/wpa_supplicant.conf'
> ctrl_interface='/var/run/wpa_supplicant'
> ctrl_interface_group=0 (from group name 'wheel')
> Line: 6 - start of a new network block
> ssid - hexdump_ascii(len=11):
>      47 4e 43 57 49 52 45 4c 45 53 53                  GNCWIRELESS
> scan_ssid=1 (0x1)
> key_mgmt: 0x2
> PSK (ASCII passphrase) - hexdump_ascii(len=26): [REMOVED]
> PSK (from passphrase) - hexdump(len=32): [REMOVED]
> Priority group 0
>    id=0 ssid='GNCWIRELESS'
> Initializing interface (2) 'ndis0'
> EAPOL: SUPP_PAE entering state DISCONNECTED
> EAPOL: KEY_RX entering state NO_KEY_RECEIVE
> EAPOL: SUPP_BE entering state INITIALIZE
> EAP: EAP entering state DISABLED
> EAPOL: External notification - portEnabled=0
> EAPOL: External notification - portValid=0
> NDIS: 1 adapter names found
> NDIS: 1 adapter descriptions found
> NDIS: 0 - ndis0 - ndis0
> NDIS: Adapter description prefix 'ndis0'
> ndis_get_oid: oid=0xd010122 len (512) failed
> NDIS: verifying driver WPA capability
> NDIS: WPA key management supported
> NDIS: WPA-PSK key management supported
> ndis_set_oid: oid=0xd01011b len (4) failed
> NDIS: Failed to set OID_802_11_ENCRYPTION_STATUS (6)
> NDIS: TKIP encryption supported
> NDIS: driver supports WPA
> NDIS: driver capabilities: key_mgmt 0x5 enc 0x4 auth 0x3
> Own MAC address: **:**:**:**:**:**
> wpa_driver_ndis_set_wpa: enabled=1
> ndis_get_oid: oid=0xd010101 len (6) failed
>
> My /etc/wpa_supplicant.conf:
>
> ctrl_interface=/var/run/wpa_supplicant
> ctrl_interface_group=wheel
> #
> # home network; allow all valid ciphers
> network={
>         ssid="GNCWIRELESS"
>         scan_ssid=1
>         key_mgmt=WPA-PSK
>         psk="**"
> }

Does it make a difference if you additionally put the bssid in /etc/wpa_supplicant.conf? Since I upgraded from RELENG_5 to RELENG_6 I have to use both the ssid and the bssid to get ndis0 to associate.

I only use WEP encryption and don't know if a failed attempt to associate with wpa_supplicant can cause a reboot, but it's worth a try. You should also check if you can associate to the (unencrypted) network with ifconfig by hand.

Fabian
-- 
http://www.fabiankeil.de/
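A sketch of what that would look like: the network block from the configuration above with a bssid line added. It is generated by a small sh function so it can be checked; print_network_block is a hypothetical helper, 00:11:22:33:44:55 is a made-up BSSID (take the real one from ifconfig ndis0 list scan), and the psk is a placeholder:

```sh
# print_network_block SSID BSSID: emit a wpa_supplicant.conf network block
# that pins the access point by BSSID in addition to the SSID.
print_network_block() {
    cat <<EOF
network={
    ssid="$1"
    bssid=$2
    scan_ssid=1
    key_mgmt=WPA-PSK
    psk="your passphrase here"
}
EOF
}

# Made-up BSSID for illustration only.
print_network_block GNCWIRELESS 00:11:22:33:44:55
```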
Re: device atapicam - causes huge slowdown
Adam Retter [EMAIL PROTECTED] wrote:

> FreeBSD funkalicious.home.dom 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #8: Thu Feb 23 23:24:57 GMT 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/funkalicious i386
>
> I have a fairly straight-forward kernel config (see below) I think, yet if I enable device atapicam, and buildkernel and installkernel and reboot, the system starts up fine until it gets to finding disks and then it goes incredibly slowly, takes about 5 minutes to get to harvesting interrupts and so on and so on, I think it will eventually get to the login prompt, but I haven't been tolerant enough to wait that long (15 minutes).
>
> If I don't use device atapicam the system is perfect, but I could really do with enabling it, for CD/DVD writing purposes...

If you don't compile in device atapicam, you can kldload atapicam.ko later. You could try it to see if it makes a difference.

Fabian
-- 
http://www.fabiankeil.de/
Re: top doesn't show any Process in idle-Mode
Michael Schuh [EMAIL PROTECTED] wrote:

> i use top mostly in idle-mode.
> # top
> return i
> or
> # top -I
> Under releng_6 (stable p4) and the older versions, i think down to releng_5, doesn't show a running process.

By default top doesn't show system processes. If you run top -I and no process is shown, it means one of the system processes is running, probably idle. Try top -I -S.

Fabian
-- 
http://www.fabiankeil.de/
Re: Best release for IBM laptop R51
Graham North [EMAIL PROTECTED] wrote:

> I am planning to load FreeBSD as a dual boot on new IBM laptop. The model is an R51 which comes with:
>
> Radeon 7500 - video
> Intel Pro/1000 NT Mobile
> Intel Pro/Wireless 2200BG
> Integrated Audio
> Intel 82802 UltraATA
>
> Can anyone tell me whether the above hardware is all supported and stable in FreeBSD.

I had RELENG_5 installed on my ThinkPad R51 UN0K6GE until two weeks ago when I switched to RELENG_6. It was stable with 5.4 and is stable now. I just updated to get some of the new features.

> Should I therefore download 6.0-Release and then just cvsup and rebuild?? Is this a better option than using 5.4 at this point?

I'd skip 5.4. It was good, but 6.0 is even better.

Fabian
-- 
http://www.fabiankeil.de/
[Fixed] Re: ndis0 does not associate since update to RELENG_6
[EMAIL PROTECTED] (Bill Paul) wrote:

> > Is there a way I can provide more information?
>
> You haven't said yet what manufacturer/model your access point is.

It's a Netgear WGT624 (hardware version V3H1/firmware version V1.1.125_1.1.1GR). I tried to associate ndis0 with wi0 in hostap mode and got the same results.

> You also haven't said what Windows driver version you're using, but you need to cheat a bit to figure that out. I usually do:
>
> % strings -e l foo.sys
>
> Near the end of the output, there should be a bunch of version information, including the vendor name of whoever built the driver (in this case Intel). You might try downloading the latest driver from Intel. (They have a generic one for their Centrino wireless devices.)

The old driver which was shipped with the laptop:

StringFileInfo
040904B0
Comments
NDIS 5 Miniport Driver for Win2000
CompanyName
Intel Corporation
FileDescription
Intel Wireless LAN Driver
FileVersion
8010-28 Driver
InternalName
w22n50.SYS
LegalCopyright
Copyright Intel Corporation 2004
OriginalFilename
w22n50.SYS
ProductName
Intel Wireless LAN Adapter
VarFileInfo
Translation

The new one I downloaded from Intel today:

StringFileInfo
040904B0
Comments
NDIS 5.1 Miniport Driver
CompanyName
Intel Corporation
FileDescription
Intel Wireless LAN Driver
FileVersion
9003-9 Driver
InternalName
w29n51.SYS
LegalCopyright
Copyright Intel Corporation 2004
OriginalFilename
w29n51.SYS
ProductName
Intel Wireless LAN Adapter
VarFileInfo
Translation

> You also haven't said what sort of laptop this is. Wouldn't hurt to know that either.

IBM ThinkPad R51 UN0K6GE.

> Unfortunately, this is the sort of thing that can only be debugged with the system sitting in front of me. I can't do it by remote control, and I can't know exactly what information to ask you. I have to experiment, and I can't do that from here.
>
> You should turn WEP off completely, make sure the AP is set for open authentication mode, and try getting it to authenticate without WEP first. It's one less variable to worry about. Try using the following:
>
> # ifconfig ndis0 ssid up
> # ifconfig ndis0 ssid yourssid bssid BSSID of your AP up

Specifying the bssid is the solution.

ifconfig ndis0 ssid ec60bfg3b4 bssid BSSID wepkey 1:0xWEPKEY \
    deftxkey 1 wepmode on up

works with the new and the old driver and with both APs.

[EMAIL PROTECTED] ~ $ifconfig ndis0
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe ... 7500%ndis0 prefixlen 64 scopeid 0x3
        inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
        ether 00 ... 00
        media: IEEE 802.11 Wireless Ethernet autoselect (OFDM/54Mbps)
        status: associated
        ssid ec60bfg3b4 channel 11 bssid 00:...
        authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax 100
        protmode CTS

Thanks for your time, Bill.

Fabian
-- 
http://www.fabiankeil.de/
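Since the BSSID is what made the difference, it can be read off the scan listing instead of being typed by hand. A minimal sh sketch, assuming the column layout of the ifconfig ndis0 list scan output shown in this thread (SSID first, BSSID second) and SSIDs without spaces; bssid_for_ssid is a hypothetical helper and the sample BSSID is made up:

```sh
# bssid_for_ssid SSID: read "ifconfig <if> list scan" output on stdin and
# print the BSSID (field 2) of the first row whose SSID (field 1) matches.
# SSIDs containing spaces are not handled by this sketch.
bssid_for_ssid() {
    awk -v want="$1" 'NR > 1 && $1 == want { print $2; exit }'
}

# Sample scan output in the format shown above; the BSSID is made up.
scan='SSID            BSSID              CHAN RATE  S:N   INT CAPS
ec60bfg3b4      00:11:22:33:44:55    11  54M 149:0   100 EP'

printf '%s\n' "$scan" | bssid_for_ssid ec60bfg3b4
# prints: 00:11:22:33:44:55
```

On the laptop, the result could then be passed on as ifconfig ndis0 ssid ec60bfg3b4 bssid "$(ifconfig ndis0 list scan | bssid_for_ssid ec60bfg3b4)" plus the WEP options used above.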
ndis0 does not associate since update to RELENG_6
I fail to get the following device working since my update from RELENG_5 to RELENG_6 a few days ago:

[EMAIL PROTECTED]:2:0:  class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05 hdr=0x00
    vendor   = 'Intel Corporation'
    device   = 'PRO/Wireless 2200BG Network Connection'
    class    = network

It worked fine with 5.4 and was recognised as

ndis0: Intel(R) PRO/Wireless 2200BG Network Connection mem 0xc0214000-0xc0214fff irq 11 at device 2.0 on pci
ndis0: NDIS API version: 5.0
ndis0: Ethernet address: 00: ... :00
ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 5.5Mbps 11Mbps
ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps

On 6.0 it is still recognised, but the last two lines are missing.

I'm using GENERIC; the driver module was generated with ndisgen out of w22n51.inf and w22n50.sys. This is the combination I always used.

ndis0 can scan for access points, but can't associate with or without WEP encryption.

[EMAIL PROTECTED] ~ $ifconfig ndis0 list scan
SSID            BSSID              CHAN RATE  S:N   INT CAPS
ec60bfg3b4      00: ... :a8          11  54M 149:0   100 EP   ??? ??? ??? ??? ??? ??? WME
[EMAIL PROTECTED] ~ $ifconfig ndis0
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
        inet6 fe80: ... :7500%ndis0 prefixlen 64 scopeid 0x5
        ether 00: ... :00
        media: IEEE 802.11 Wireless Ethernet autoselect
        status: no carrier
        ssid channel 1
        authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax 100
        protmode CTS

I found a similar problem which should be fixed in -CURRENT, but I don't know if the changes have already hit -STABLE:
http://freebsd.rambler.ru/bsdmail/freebsd-current_2005/msg11802.html

My problem is not exactly the same though; I have no trouble setting the bssid.

Additionally, I can't set the mode to 11g:

[EMAIL PROTECTED] ~ #ifconfig ndis0 mode 11g
ifconfig: SIOCSIFMEDIA (media): Invalid argument

mode 11b is accepted but only leads to (DS/1Mbps).

I can associate to the access point with ath0 and wi0 (at least for a short time).

Is anybody else using this device with FreeBSD 6.0?

Fabian
-- 
http://www.fabiankeil.de/
Re: ndis0 does not associate since update to RELENG_6
Fabian Keil [EMAIL PROTECTED] wrote:

> I fail to get the following device working since my update from RELENG_5 to RELENG_6 a few days ago:
>
> [EMAIL PROTECTED]:2:0:  class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05 hdr=0x00
>     vendor   = 'Intel Corporation'
>     device   = 'PRO/Wireless 2200BG Network Connection'
>     class    = network

I found a workaround. It still works with /usr/ports/net/iwi-firmware/:

[EMAIL PROTECTED] ~ $ifconfig iwi0
iwi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
        inet6 fe ... :7500%iwi0 prefixlen 64 scopeid 0x5
        ether 00: ... :00
        media: IEEE 802.11 Wireless Ethernet autoselect mode 11g (OFDM/48Mbps)
        status: associated
        ssid ec60bfg3b4 channel 11 bssid 00:...:a8
        authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax 100
        protmode CTS bintval 100

Fabian
-- 
http://www.fabiankeil.de/
Re: ndis0 does not associate since update to RELENG_6
Parv [EMAIL PROTECTED] wrote:

> in message [EMAIL PROTECTED], wrote Fabian Keil thusly...
> > Fabian Keil [EMAIL PROTECTED] wrote:
> > > I fail to get the following device working since my update from RELENG_5 to RELENG_6 a few days ago:
> > >
> > > [EMAIL PROTECTED]:2:0:  class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05 hdr=0x00
> > >     vendor   = 'Intel Corporation'
> > >     device   = 'PRO/Wireless 2200BG Network Connection'
> > >     class    = network
> >
> > I found a workaround. It still works with /usr/ports/net/iwi-firmware/:
>
> I also found the same about ndis driver. I was not even able to assign a ssid, mode, or a channel to a ndis0 interface.
>
> At least net/iwi-firmware works w/ WPA (even if the interface causes freeze after waking up from long sleep on IBM Thinkpad T42;

Did you try to load and unload if_iwi.ko in /etc/rc.resume and /etc/rc.suspend?

> BTW, when i read that you found a work around, i was expecting a work around to make ndis work.

I'm sorry for my misleading wording then. Of course it's just a workaround to get the PRO/Wireless 2200BG working at all.

It's just that I had forgotten about the existence of iwi. For the last few days I had been using em0 to connect my laptop to the network. Getting if_iwi to work after my initial posting was a relief.

Fabian
-- 
http://www.fabiankeil.de/
Re: ndis0 does not associate since update to RELENG_6
Parv [EMAIL PROTECTED] wrote:

> in message [EMAIL PROTECTED], wrote Fabian Keil thusly...
> > Fabian Keil [EMAIL PROTECTED] wrote:
> > > I fail to get the following device working since my update from RELENG_5 to RELENG_6 a few days ago:
> > >
> > > [EMAIL PROTECTED]:2:0:  class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05 hdr=0x00
> > >     vendor   = 'Intel Corporation'
> > >     device   = 'PRO/Wireless 2200BG Network Connection'
> > >     class    = network
>
> I also found the same about ndis driver. I was not even able to assign a ssid, mode, or a channel to a ndis0 interface.

I forgot to confirm that I can't assign ssid and channel either.

Fabian
-- 
http://www.fabiankeil.de/
Re: ndis0 does not associate since update to RELENG_6
[EMAIL PROTECTED] (Bill Paul) wrote:

> > I fail to get the following device working since my update from RELENG_5 to RELENG_6 a few days ago:
> >
> > [EMAIL PROTECTED]:2:0:  class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05 hdr=0x00
> >     vendor   = 'Intel Corporation'
> >     device   = 'PRO/Wireless 2200BG Network Connection'
> >     class    = network
> >
> > It worked fine with 5.4 and was recognised as
> >
> > ndis0: Intel(R) PRO/Wireless 2200BG Network Connection mem 0xc0214000-0xc0214fff irq 11 at device 2.0 on pci
> > ndis0: NDIS API version: 5.0
> > ndis0: Ethernet address: 00: ... :00
> > ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 5.5Mbps 11Mbps
> > ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
> >
> > On 6.0 it is still recognised, but the last two lines are missing.
>
> That's normal.
>
> > I'm using GENERIC; the driver module was generated with ndisgen out of w22n51.inf and w22n50.sys. This is the combination I always used.
> >
> > ndis0 can scan for access points, but can't associate with or without WEP encryption.
>
> What command do you type to try to get it to associate?

kldload wlan_wep.ko
kldload w22n50_sys.ko
ifconfig ndis0 ssid ec60bfg3b4 wepkey 1:0xhexkey \
    deftxkey 1 wepmode on
ifconfig ndis0 inet 192.168.0.32 up

> > [EMAIL PROTECTED] ~ $ifconfig ndis0 list scan
> > SSID            BSSID              CHAN RATE  S:N   INT CAPS
> > ec60bfg3b4      00: ... :a8          11  54M 149:0   100 EP   ??? ??? ??? ??? ??? ??? WME
> > [EMAIL PROTECTED] ~ $ifconfig ndis0
> > ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> >         inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
> >         inet6 fe80: ... :7500%ndis0 prefixlen 64 scopeid 0x5
> >         ether 00: ... :00
> >         media: IEEE 802.11 Wireless Ethernet autoselect
> >         status: no carrier
> >         ssid channel 1
> >         authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax 100
> >         protmode CTS
> >
> > I found a similar problem which should be fixed in -CURRENT, but I don't know if the changes have already hit -STABLE:
> > http://freebsd.rambler.ru/bsdmail/freebsd-current_2005/msg11802.html
> >
> > My problem is not exactly the same though; I have no trouble setting the bssid.
> You should be able to do:
>
> # ifconfig ndis0 ssid ec60bfg3b4 wepmode on wepkey 0123456789123 up

In my experience ifconfig on 6.0 will not choose the first key by default; I always have to add deftxkey 1.

I can't use your exact command because I know my wepkey only in hexadecimal. But if I disable WEP in the access point and use

ifconfig ndis0 ssid ec60bfg3b4 up

it fails to associate (or even to set the ssid) as well:

[EMAIL PROTECTED] ~ #ifconfig ndis0
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe ... 7500%ndis0 prefixlen 64 scopeid 0x3
        ether 00: ... :00
        media: IEEE 802.11 Wireless Ethernet autoselect
        status: no carrier
        ssid channel 1
        authmode OPEN privacy OFF txpowmax 100 protmode CTS

> You don't state what command you actually use. You should have specified it in your e-mail. Note that usually the WEP key has to be either 5 or 13 characters.

You're right, sorry. I use the hexadecimal notation and my key is correctly recognised as 104-bit.

ifconfig ndis0 ssid ec60bfg3b4 wepkey 1:0xhexkey \
    deftxkey 1 wepmode on
ifconfig ndis0 inet 192.168.0.32 up

The two commands above work for iwi0, wi0 and ath0. I use the same shell script I used on 5.4; the only change I made was adding deftxkey 1, which wasn't needed before.

> > Is anybody else using this device with FreeBSD 6.0?
>
> I've tested the 2200BG myself with the NDISulator 6.0 and I've been able to get it to associate with 11g networks. I don't know what's wrong in your case.

Is there a way I can provide more information?

Fabian
-- 
http://www.fabiankeil.de/
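The arithmetic behind "either 5 or 13 characters" and "104-bit": an ASCII character carries 8 bits and a hex digit 4, so 5 × 8 = 10 × 4 = 40 bits and 13 × 8 = 26 × 4 = 104 bits. A small sh sketch that classifies a key before it is handed to ifconfig; wep_key_bits is a hypothetical helper, and treating a 0x prefix as hex mirrors the 1:0x... keys used in this thread:

```sh
# wep_key_bits KEY: print 40 or 104 for a WEP key given either as ASCII
# text (5 or 13 characters) or as hex with a 0x prefix (10 or 26 digits);
# return 1 for anything else.
wep_key_bits() {
    key=$1
    case $key in
        0x* | 0X*)
            hex=${key#0[xX]}
            case $hex in
                '' | *[!0-9a-fA-F]*) return 1 ;;
            esac
            case ${#hex} in
                10) echo 40 ;;    # 10 hex digits * 4 bits
                26) echo 104 ;;   # 26 hex digits * 4 bits
                *) return 1 ;;
            esac
            ;;
        *)
            case ${#key} in
                5) echo 40 ;;     # 5 ASCII characters * 8 bits
                13) echo 104 ;;   # 13 ASCII characters * 8 bits
                *) return 1 ;;
            esac
            ;;
    esac
}

wep_key_bits 0123456789123   # the 13-character example key above
# prints: 104
```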
Re: wi0 unreliable on FreeBSD 6.0
Kevin Oberman [EMAIL PROTECTED] wrote:

> > Date: Sun, 15 Jan 2006 22:32:08 +0100
> > From: Fabian Keil [EMAIL PROTECTED]
> >
> > Since the update from RELENG_5 to RELENG_6 a few days ago I have trouble with the wireless network.
> >
> > This card worked fine with FreeBSD 5.4:
> >
> > wi0: T-Sinus 130card at port 0x4000-0x403f irq 11 function 0 config 1 on pccard0
> > wi0: using RF:PRISM2.5 MAC:ISL3873
> > wi0: Intersil Firmware: Primary (1.0.4), Station (1.2.0)
>
> I notice that your firmware is pretty old. I am running Primary (1.1.1), Station (1.7.4) and don't seem to be having any serious problems. I'd suggest updating and see if that fixes things.

Thanks for the tip. Today I failed to get the right firmware files, but I'll try again tomorrow.

Fabian
-- 
http://www.fabiankeil.de/
wi0 unreliable on FreeBSD 6.0
Since the update from RELENG_5 to RELENG_6 a few days ago I have trouble with the wireless network. This card worked fine with FreeBSD 5.4: wi0: T-Sinus 130card at port 0x4000-0x403f irq 11 function 0 config 1 on pccard0 wi0: using RF:PRISM2.5 MAC:ISL3873 wi0: Intersil Firmware: Primary (1.0.4), Station (1.2.0) But only works with very low traffic on FreeBSD 6.0. I can use it to check my emails and to flood ping for a while: --- 192.168.0.1 ping statistics --- 58577 packets transmitted, 57031 packets received, 2% packet loss round-trip min/avg/max/stddev = 2.934/700.012/1169.922/327.406 ms But as soon as I open firefox, which then tries to get some RSS feeds, I loose the connection. If I have firefox already open I can sometimes get the first half of a small web page, but only sometimes. The ifconfig wi0 output is then shortened to: wi0: flags=8807UP,BROADCAST,DEBUG,SIMPLEX,MULTICAST mtu 1500 inet 192.168.0.51 netmask 0xff00 broadcast 192.168.0.255 inet6 fe80::230:f1ff:fe66:d97e%wi0 prefixlen 64 scopeid 0x3 ether 00:30:f1:66:d9:7e instead of wi0: flags=8843UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST mtu 1500 inet 192.168.0.51 netmask 0xff00 broadcast 192.168.0.255 inet6 fe80::230:f1ff:fe66:d97e%wi0 prefixlen 64 scopeid 0x3 ether 00:30:f1:66:d9:7e media: IEEE 802.11 Wireless Ethernet autoselect (DS/2Mbps) status: associated ssid ec60bfg3b4 channel 11 bssid 00:14:6c:1b:62:a8 stationname FreeBSD WaveLAN/IEEE node authmode OPEN privacy MIXED deftxkey 1 wepkey 1:104-bit txpowmax 100 After ifconfig wi0 debug dmesg says: wi0: timeout in wi_cmd 0x0002; event status 0x8008 wi0: timeout in wi_cmd 0x; event status 0x8008 wi0: wi_cmd: busy bit won't clear. wi0: init failed wi0: failed to allocate 2372 bytes on NIC wi0: tx buffer allocation failed (error 12) wi0: interface not running wi0: link state changed to DOWN If I unload if_wi and wlan_wep, remove the card, put it in again and reload if_wi and wlan_wep, I can reconfigure the card and ping some more. 
I load wlan and wlan_wep as modules; the same setup works fine with an Atheros-based card. I noticed the mails saying that wi0 is regarded as old technology and therefore will not be enhanced to support WPA any time soon, but it should still work as reliably as on 5.4, right?

Fabian -- http://www.fabiankeil.de/
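The two ifconfig outputs in this report differ only in the hexadecimal flags word (flags=8807 while broken, flags=8843 while working). As a minimal sketch, the Python snippet below decodes both values. It assumes the bit values from FreeBSD's <net/if.h> and lists only the bits that actually appear in those outputs (0x40 is the bit ifconfig prints as RUNNING). It shows that the broken state has lost RUNNING and has DEBUG set, and it also maps the "error 12" from the tx buffer allocation message to its errno name.

```python
import errno

# Flag bits that occur in the two ifconfig outputs, with the names ifconfig
# prints for them (assumed values from FreeBSD's <net/if.h>).
IF_FLAG_NAMES = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0004: "DEBUG",
    0x0040: "RUNNING",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_if_flags(flags: int) -> list[str]:
    """Return the names of the set bits, lowest bit first; unknown bits as hex."""
    names = []
    bit = 1
    while bit <= flags:
        if flags & bit:
            names.append(IF_FLAG_NAMES.get(bit, hex(bit)))
        bit <<= 1
    return names


broken = decode_if_flags(0x8807)   # "flags=8807" after the connection drops
working = decode_if_flags(0x8843)  # "flags=8843" while the card works

print("broken: ", ",".join(broken))
print("working:", ",".join(working))
print("only in working:", sorted(set(working) - set(broken)))
print("only in broken: ", sorted(set(broken) - set(working)))

# Kernel error numbers are errno values; 12 is ENOMEM on FreeBSD (and Linux).
# errno.errorcode reflects the platform the script runs on.
print("error 12 =", errno.errorcode[12])
```

Run as-is, it reports RUNNING as the only flag missing from the broken state, which matches the "interface not running" line in dmesg.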
Re: NFS UDP mounts on RELENG_6?
Oliver Brandmueller [EMAIL PROTECTED] wrote: On Fri, Dec 16, 2005 at 04:30:31PM +0100, Fabian Keil wrote: Oliver Brandmueller [EMAIL PROTECTED] wrote:

I'm experiencing problems when trying to mount NFS filesystems from a RELENG_6 server (FreeBSD hudson 6.0-STABLE FreeBSD 6.0-STABLE #0: Wed Dec 14 16:59:55 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6 i386) to either 5.4-STABLE or 6-STABLE clients. Mounting works fine, but afterwards the access to the filesystem on the client stalls. As soon as I mount the FS with a TCP mount everything works as expected. The mounts worked fine on UDP when the server was 5.4-STABLE. There is just a plain GigE switch involved, no firewalls or routing. Anyone else experiencing those problems or having an idea?

I just copied some files (200 MB) from an NFS server running FreeBSD africanqueen.local 6.0-STABLE FreeBSD 6.0-STABLE #5: Thu Dec 15 19:31:12 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/AFRICANQUEEN i386 without problems. My client runs FreeBSD 5.4. I use GigE as well, but no switch.

Which kind of GigE interface do you use?

Client:

[EMAIL PROTECTED] ~ $pciconf -lv| grep em0 -A 2
[EMAIL PROTECTED]:1:0: class=0x02 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82540EP Gigabit Ethernet Controller (Mobile)'

Server:

[EMAIL PROTECTED] ~ $pciconf -lv| grep re[01] -A 2
[EMAIL PROTECTED]:9:0: class=0x02 card=0x816910ec chip=0x816910ec rev=0x10 hdr=0x00
    vendor = 'Realtek Semiconductor'
    device = 'RTL8169 Gigabit Ethernet Adapter'
--
[EMAIL PROTECTED]:10:0: class=0x02 card=0x601b182d chip=0x816910ec rev=0x10 hdr=0x00
    vendor = 'Realtek Semiconductor'
    device = 'RTL8169 Gigabit Ethernet Adapter'

re0 is made by Vivanco, re1 is a Sitecom card.

Fabian -- http://www.fabiankeil.de/
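In pciconf -l output, the chip= and card= fields each pack two 16-bit PCI IDs into one 32-bit value: the device (or subdevice) ID in the upper half and the vendor (or subvendor) ID in the lower half. A small Python sketch that splits the values quoted above; the vendor names come from the vendor = lines pciconf printed, so the lookup table is deliberately limited to those two entries.

```python
# 32-bit values copied from the pciconf -lv output above.
CLIENT_CHIP = 0x101E8086   # em0
SERVER_CHIP = 0x816910EC   # re0 and re1

# Vendor names as printed by pciconf in the same output.
VENDORS = {0x8086: "Intel Corporation", 0x10EC: "Realtek Semiconductor"}


def split_pci_id(value: int) -> tuple[int, int]:
    """Split a pciconf chip=/card= value into (vendor_id, device_id)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


for label, chip in (("client em0", CLIENT_CHIP), ("server re0/re1", SERVER_CHIP)):
    vendor, device = split_pci_id(chip)
    print(f"{label}: vendor {vendor:#06x} ({VENDORS[vendor]}), device {device:#06x}")

# re0 reports card=0x816910ec, i.e. the same IDs as the chip itself,
# while re1's card=0x601b182d carries a different subvendor ID.
print([hex(x) for x in split_pci_id(0x601B182D)])
```

Both server cards share the same chip ID; only the card= (subsystem) value tells them apart.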
Re: NFS UDP mounts on RELENG_6?
Oliver Brandmueller [EMAIL PROTECTED] wrote:

I'm experiencing problems when trying to mount NFS filesystems from a RELENG_6 server (FreeBSD hudson 6.0-STABLE FreeBSD 6.0-STABLE #0: Wed Dec 14 16:59:55 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6 i386) to either 5.4-STABLE or 6-STABLE clients. Mounting works fine, but afterwards the access to the filesystem on the client stalls. As soon as I mount the FS with a TCP mount everything works as expected. The mounts worked fine on UDP when the server was 5.4-STABLE. There is just a plain GigE switch involved, no firewalls or routing. Anyone else experiencing those problems or having an idea?

I just copied some files (200 MB) from an NFS server running FreeBSD africanqueen.local 6.0-STABLE FreeBSD 6.0-STABLE #5: Thu Dec 15 19:31:12 CET 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/AFRICANQUEEN i386 without problems. My client runs FreeBSD 5.4. I use GigE as well, but no switch.

Fabian -- http://www.fabiankeil.de/
Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated [solved]
Kris Kennaway [EMAIL PROTECTED] wrote: On Wed, Dec 14, 2005 at 05:32:34PM +0100, Fabian Keil wrote:

I guess you're right. I can fill a 256MB swap-backed disk without panic and without swapping.

FYI, this is documented in the manpage.

I think the panic potential should be mentioned in md(4) as well. I used a script written by someone else; its commands worked, and after the panic I only read man md. Of course md(4) mentions mdconfig(8) twice, but I didn't think I needed more information about it.

Fabian -- http://www.fabiankeil.de/
Re: Slightly OT, getting errors from members on this list
Morten A. Middelthon [EMAIL PROTECTED] wrote: I just got this message after posting to freebsd-stable@freebsd.org: Subject: Blogger post failed From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Date: Thu, 15 Dec 2005 05:32:36 -0800 (PST) Blogger does not accept multipart/signed files. Error code: 7.774C07 You are not alone. http://freebsd.rambler.ru/bsdmail/freebsd-stable_2005/msg08530.html Quite annoying. And to be allowed to complain, you need a blogger account: http://www.blogger.com/problem.g Do no evil my ass. Fabian -- http://www.fabiankeil.de/
FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated
I triggered a few reproducible panics on FreeBSD 6.0-STABLE. I created a ramdisk with:

/sbin/mdconfig -a -t malloc -s 256M -u 10
/sbin/newfs -U /dev/md10
/sbin/mount /dev/md10 /mnt/ramdisk

The system has avail memory = 515932160 (492 MB) and 1GB swap space. While copying to /mnt/ramdisk through ftp localhost, I got:

[EMAIL PROTECTED] ~/crashdump #kgdb kernel-GENERIC.debug vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
[...]
Unread portion of the kernel message buffer:
panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated
Uptime: 2m57s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc063a4ee in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063a784 in panic (fmt=0xc0880846 "kmem_malloc(%ld): kmem_map too small: %ld total allocated") at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc07a44bd in kmem_malloc (map=0xc10430c0, size=16384, flags=1026) at /usr/src/sys/vm/vm_kern.c:299
#4  0xc079c0c6 in page_alloc (zone=0x0, bytes=16384, pflag=0x0, wait=1026) at /usr/src/sys/vm/uma_core.c:958
#5  0xc079e41f in uma_large_malloc (size=16384, wait=1026) at /usr/src/sys/vm/uma_core.c:2702
#6  0xc0630085 in malloc (size=16384, mtp=0xc08ffe40, flags=1026) at /usr/src/sys/kern/kern_malloc.c:329
#7  0xc078365e in softdep_disk_io_initiation (bp=0xcd899658) at /usr/src/sys/ufs/ffs/ffs_softdep.c:3630
#8  0xc078b1fe in ffs_geom_strategy (bo=0xc3593e90, bp=0xcd899658) at buf.h:422
#9  0xc0796869 in ufs_strategy (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1926
#10 0xc081c645 in VOP_STRATEGY_APV (vop=0xc09012a0, a=0xdd93ec0c) at vnode_if.c:1796
#11 0xc06841d0 in bufstrategy (bo=0xc35f7720, bp=0x0) at vnode_if.h:928
#12 0xc067eda8 in bufwrite (bp=0xcd899658) at buf.h:415
#13 0xc067f397 in bawrite (bp=0x0) at buf.h:399
#14 0xc078b53d in ffs_syncvnode (vp=0xc35f7660, waitfor=1) at /usr/src/sys/ufs/ffs/ffs_vnops.c:256
#15 0xc078b28e in ffs_fsync (ap=0xdd93ecc0) at /usr/src/sys/ufs/ffs/ffs_vnops.c:179
#16 0xc081c05c in VOP_FSYNC_APV (vop=0x0, a=0x0) at vnode_if.c:1020
#17 0xc0698278 in fsync (td=0xc3460d80, uap=0x0) at vnode_if.h:537
#18 0xc080b6eb in syscall (frame=
      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 64, tf_esi = 134572032, tf_ebp = -1077940680, tf_isp = -577507996, tf_ebx = 134561920, tf_edx = 1, tf_ecx = 6, tf_eax = 95, tf_trapno = 0, tf_err = 2, tf_eip = 672366947, tf_cs = 51, tf_eflags = 662, tf_esp = -1077945572, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
#19 0xc07fa57f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

By simply copying to /mnt/ramdisk with cp, I got:

[EMAIL PROTECTED] ~/crashdump #kgdb kernel-GENERIC.debug vmcore.4
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
[...]
Unread portion of the kernel message buffer:
g_vfs_done():md10[WRITE(offset=206372864, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206503936, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206635008, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206766080, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206897152, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207028224, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207159296, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207290368, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207421440, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207552512, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207683584, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207814656, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207945728, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208076800, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208207872, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208338944, length=131072)]error = 28
panic: kmem_malloc(4096): kmem_map too small: 172728320 total allocated
Uptime: 11m23s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165	pcpu.h: No such file or directory.
	in pcpu.h
#0  doadump () at pcpu.h:165
#1  0xc063a4ee in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063a784 in panic (fmt=0xc0880846 kmem_malloc(%ld): kmem_map too
Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated
Gleb Smirnoff [EMAIL PROTECTED] wrote: On Wed, Dec 14, 2005 at 01:25:30PM +0100, Fabian Keil wrote:

F I triggered a few reproducible panics on FreeBSD 6.0-STABLE.
F
F I created a ramdisk with:
F
F /sbin/mdconfig -a -t malloc -s 256M -u 10
F /sbin/newfs -U /dev/md10
F /sbin/mount /dev/md10 /mnt/ramdisk
F
F The system has avail memory = 515932160 (492 MB)
F and 1GB swap space.
F
F While copying to /mnt/ramdisk through ftp localhost
F I got:

This usually exposes some memory leak in the kernel. Can you please do the following: copy some amount of data to /mnt/ramdisk through ftp localhost, and cancel the operation before it panics. Then run vmstat -m and vmstat -z, to determine what kind of memory allocation is leaking.

I had loops with vmstat -m and vmstat -z running in the background while copying to /mnt/ramdisk. The last output before the panic was:

Type InUse MemUse HighUse Requests Size(s) DEVFS22 1K - 23 16,128 pfs_nodes20 3K - 20 128 GEOM 18926K - 858 16,32,64,128,256,512,1024,2048,4096 isadev17 2K - 17 64 ATA DMA 4 1K -4 128 cdev27 4K - 27 128 AR driver 0 0K - 11 512,2048 ACD driver 3 6K -3 2048 file desc 12046K - 1611 16,32,256,512,2048 sigio 2 1K -3 32 kenv96 7K - 97 16,32,64,4096 kqueue 0 0K - 62 256,1024 proc-args43 2K - 797 16,32,64,128 zombie 0 0K - 907 128 ithread48 5K - 49 64,128 KTRACE 10013K - 100 128 CAM SIM 1 1K -1 64 linker68 3K - 99 16,32,256 CAM XPT10 1K - 17 16,64,512 lockf 3 1K -3 64 devbuf 1346 3177K - 1816 16,32,64,128,256,512,1024,2048,4096 temp16 171K - 6266 16,32,64,128,256,512,1024,2048,4096 ip6opt 1 1K - 1 128 ip6ndp 6 1K -7 64,128 module 37124K - 371 64,128 mtx_pool 1 8K -1 pgrp36 3K - 623 64 session29 4K - 47 128 proc 2 4K -2 2048 subproc 209 413K - 1116 256,4096 cred35 5K - 4132 128 plimit18 5K - 400 256 uidinfo 4 1K - 20 32,512 sysctl 0 0K - 619 16,32,64 sysctloid 256777K - 2567 16,32,64 sysctltmp 0 0K - 280 16,32,128 umtx 120 8K - 120 64 SWAP 2 141K -2 64 bus 95938K - 3599 16,32,64,128,1024 bus-sc5727K - 1537
16,32,64,128,256,512,1024,2048,4096 devstat1837K - 18 16,4096 eventhandler37 3K - 37 32,128 kobj 248 496K - 299 2048 MD disk 294 7K - 294 16,2048 MD sectors 293 1172K - 293 4096 rman 14910K - 570 16,64 sbuf 0 0K - 440 16,32,64,128,256,512,1024,2048,4096 sleep queues 121 4K - 121 32 taskqueue 6 1K -6 128 turnstiles 121 8K - 121 64 Unitno 7 1K -9 16,64 ioctlops 0 0K - 2757 16,32,64,256,512,1024,4096 iov 0 0K - 487 16,64,128 msg 425K -4 1024,4096 sem 4 7K -4 512,1024,4096 shm 112K -1 ttys 1228 174K - 3223 128,1024 ptys 3 1K -3 128 mbuf_tag 0 0K -6 32,64 soname 6 1K - 735 16,32,128 pcb29 5K - 81 16,32,64,2048 BIO buffer 0 0K - 99 2048 vfscache 1 256K -1 cluster_save buffer 0 0K - 19 32,64 Export Host 1 1K -2 256 VFS hash 1 128K -1 vnodes 1 1K -1 128 mount 13012K - 641 16,32,64,128,512,1024,2048 CAM periph 1 1K -1 128 BPF 4 1K -4 64 ifnet 5 5K -5 256,1024 ifaddr4010K - 40 16,32,64,256,512,2048 ether_multi40 2K - 46 16,32,64 clone 416K -4 4096 arpcom 2 1K
Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated [solved]
Scott Long [EMAIL PROTECTED] wrote: Gleb Smirnoff wrote: On Wed, Dec 14, 2005 at 01:25:30PM +0100, Fabian Keil wrote:

F I triggered a few reproducible panics on FreeBSD 6.0-STABLE.
F
F I created a ramdisk with:
F
F /sbin/mdconfig -a -t malloc -s 256M -u 10
F /sbin/newfs -U /dev/md10
F /sbin/mount /dev/md10 /mnt/ramdisk
F
F The system has avail memory = 515932160 (492 MB)
F and 1GB swap space.
F
F While copying to /mnt/ramdisk through ftp localhost
F I got:

This usually exposes some memory leak in the kernel. Can you please do the following: copy some amount of data to /mnt/ramdisk through ftp localhost, and cancel the operation before it panics. Then run vmstat -m and vmstat -z, to determine what kind of memory allocation is leaking.

While it can mean a memory leak in the kernel, I don't think that's the case here. On i386, only 320MB can be allocated to kernel malloc memory. Much of this space can get consumed with vnodes and other filesystem structures, so trying to allocate 256MB to a ramdisk is likely putting you over the max. I'd suggest using a swap-backed disk instead. It doesn't necessarily mean that the ramdisk pages will live in swap; it just means that they will get managed directly in the bufcache, eliminating the 320MB restriction.

I guess you're right. I can fill a 256MB swap-backed disk without panic and without swapping.
Before ftp localhost:

last pid: 652; load averages: 0.02, 0.09, 0.07  up 0+00:07:16  17:12:05
37 processes: 1 running, 36 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.4% interrupt, 99.6% idle
Mem: 11M Active, 12M Inact, 18M Wired, 11M Buf, 453M Free
Swap: 999M Total, 999M Free

After ftp localhost:

last pid: 666; load averages: 0.20, 0.12, 0.08  up 0+00:09:05  17:13:54
36 processes: 1 running, 35 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.4% interrupt, 99.6% idle
Mem: 244M Active, 150M Inact, 73M Wired, 27M Cache, 60M Buf, 984K Free
Swap: 999M Total, 999M Free

After removal of the swap-backed disk:

last pid: 690; load averages: 0.00, 0.01, 0.03  up 0+00:17:53  17:22:42
34 processes: 1 running, 33 sleeping
CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 15M Active, 76M Inact, 43M Wired, 13M Cache, 60M Buf, 347M Free
Swap: 999M Total, 999M Free

Thanks for your time, Gleb and Scott.

Fabian -- http://www.fabiankeil.de/
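The figures in this thread can be put side by side with a few lines of arithmetic. The Python sketch below uses only numbers quoted in the panic messages, the g_vfs_done() messages and the mdconfig command (-s 256M); it does not model how the kernel sizes kmem_map, so it is a plausibility check under that assumption, not an explanation of the allocator. It shows that the malloc-backed md10 was larger than the 172728320 bytes (about 164.7 MiB) the panic reports as allocated from kmem_map, and that the failed writes in the cp crash were sixteen consecutive 128 KiB writes, each failing with error 28 (ENOSPC).

```python
import errno

MIB = 1024 * 1024

# From "/sbin/mdconfig -a -t malloc -s 256M -u 10"
md_size = 256 * MIB
# From "panic: kmem_malloc(...): kmem_map too small: 172728320 total allocated"
kmem_allocated = 172728320

print(f"md10 size:         {md_size / MIB:.1f} MiB")
print(f"kmem_map at panic: {kmem_allocated / MIB:.1f} MiB")
print("ramdisk larger than kmem_map allocation:", md_size > kmem_allocated)

# First and last g_vfs_done() failures from the cp crash (vmcore.4).
first_off, last_off, length = 206372864, 208338944, 131072
failed_writes = (last_off - first_off) // length + 1
print(f"{failed_writes} failed writes of {length // 1024} KiB, "
      f"{first_off / MIB:.4f} to {(last_off + length) / MIB:.4f} MiB")

# "error = 28" is an errno value; 28 is ENOSPC on FreeBSD (and Linux).
print("error 28 =", errno.errorcode[28])
```

The write failures cover roughly 196.8 to 198.8 MiB of the 256 MiB device, i.e. the malloc-backed disk stopped accepting data well before it was full, which fits Scott's explanation that the kernel malloc space, not the disk size, was the limit.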
Missing wep_wlan at 5.4 (was: Atheros (ath0) no RX traffic)
Sam Leffler [EMAIL PROTECTED] wrote: Stephen Montgomery-Smith wrote: Richard Arends wrote:

Today I upgraded my laptop from 5-STABLE to 6-STABLE. After the upgrade, my wireless is not working anymore.

You are doing better than me. I try this:

ifconfig ath0 wepkey 12345

and get

ifconfig: SIOCS80211: Invalid argument

(Actually maybe that is happening to you as well, but since you are setting ifconfig_ath0 from within rc.conf, you might be missing this error message as it flies by in your start up.) I get this error on other wireless cards as well.

kldload wlan_wep

For a few days I have been getting "ifconfig: SIOCS80211: Invalid argument" while trying to set up WEP with up-to-date ndis stuff on 5.4. ATM I use an older ndis build, which still works. wlan_wep seems to exist in 6.0 only:

http://fxr.watson.org/fxr/source/modules/wlan_wep/?v=RELENG54
http://fxr.watson.org/fxr/source/modules/wlan_wep/?v=RELENG6

Is there some secret I don't know about?

Fabian -- http://www.fabiankeil.de/
Re: ndisgen intended to be the only way to generate ndis?
Daniel O'Connor [EMAIL PROTECTED] wrote: On Tue, 5 Jul 2005 22:49, Fabian Keil wrote:

AFAIK, nobody has announced that the old way is dead, therefore I would like to know if the breakage is intentional and, if it is, whether there's a technical reason why these methods can no longer coexist.

The old way built the .sys and .inf files into a .ko along with if_ndis code. In the new way you build the .sys and .inf files into a .ko without any other code. When you load it, it pulls in if_ndis which then reads the wrapped .sys and .inf file you loaded. You can't build things the old way any more because the if_ndis code no longer expects to be linked to a .sys file.

Thanks for pointing this out. I had completely missed this design change.

I suggest the best approach would be to submit improved documentation for the ndiscvt man page (and a new ndisgen page) along with some handbook changes. It would also be fairly trivial to modify ndisgen to take some arguments.

Agreed.

Fabian -- http://www.fabiankeil.de/
ndisgen intended to be the only way to generate ndis?
Hi all,

as you have probably noticed, the number of mails about problems with compiling ndis has increased in the last four weeks.

The old way to compile ndis was to go to /usr/src/sys/modules/if_ndis/, use ndiscvt to create a header file containing the Windows driver, and run make; make install. It was fast and well documented in the handbook and on the web in general.

Later Bill Paul wrote /usr/sbin/ndisgen to automate these steps. ndisgen is an interactive shell script; it is user-friendly and describes what it's doing. However, using it is slower than the old way was. You can't use shell autocompletion to specify the location of the driver's .sys and .inf files, and some steps are repeated every time you recompile ndis, even if they aren't needed.

ATM the existence of ndisgen is poorly documented. It's not mentioned in the handbook or in the man pages and seldom appears on other websites. If you don't read the mailing lists or the cvs logs, you probably won't know about it.

For a while the new and the old way coexisted, and everybody was happy. About four weeks ago, the old way stopped working for many (all?) people. You can still build and kldload the needed modules without error, but they will not work. Most of the time (every time?) ndisgen still does.

AFAIK, nobody has announced that the old way is dead, therefore I would like to know if the breakage is intentional and, if it is, whether there's a technical reason why these methods can no longer coexist.

Mark A-J. Raught wrote on freebsd-mobile yesterday:

I prefer the old way, but as long as it works I'll suffer through the wizard feel.

So do I, and I guess we're not alone.

Fabian -- http://www.fabiankeil.de/
Fw: 5.4-STABLE panic: kernel trap 12 with interrupts disabled
Hi list,

I'm forwarding this to freebsd-stable (probably the right place anyway), since I got no further responses on freebsd-questions.

Subhro [EMAIL PROTECTED] wrote: On 5/5/2005 19:43, Fabian Keil wrote:

The day before yesterday I experienced my first panic on 5.4-STABLE, built and cvsup'ed last Friday. My system is a ThinkPad R51. I did nothing spectacular; after boot I:

logged in as user
cdrecord -scanbus (which didn't work as I hadn't yet set it suid)
su
chmod +x for cdrecord and readcd (what I meant was +g ;-)
exit
cdrecord -scanbus (didn't work yet ;-)
su
cdrecord -scanbus (did work)
readcd dev=2,0,0 -factor meshpoints=100 f=./file
exit

Then I moved the laptop and plugged in the AC/DC adapter. Running whoami gave me:

Kernel trap 12 with interrupts disabled

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xa94d06c
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc053cbe5
stack pointer           = 0x10:0xe669f98c
frame pointer           = 0x10:0xe669f990
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 601 (whoami)
trap number             = 12
panic: page fault

I saved the dump manually with savecore and then tried to follow:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/advanced.html#KERNEL-PANIC-TROUBLESHOOTING

[EMAIL PROTECTED] ~ $nm -n /boot/kernel/kernel | grep c053cb
c053cb4c T init_turnstiles
c053cbc9 t init_turnstile0
c053cbd8 t turnstile_setowner

My kernel contains makeoptions DEBUG=-g; however, I don't have the file /sys/compile/KERNELCONFIG/kernel.debug and thus wasn't able to do:

% gdb -k /sys/compile/KERNELCONFIG/kernel.debug /var/crash/vmcore.0

It turned out that I was just looking in the wrong place: kernel.debug was at /usr/obj/usr/src/sys/THINKPAD/kernel.debug. http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html fits better and contains a pointer to kgdb.
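The nm step above is a lookup for the last symbol that starts at or below the faulting instruction pointer (0xc053cbe5). A minimal Python sketch of that lookup, using only the three symbols the grep returned; since nm -n without sizes does not say where a function ends, the result is only trustworthy if the next symbol in the full, unfiltered list starts above the address. The kgdb backtrace later in this mail agrees, placing frame #19 at 0xc053cbe5 in turnstile_setowner.

```python
import bisect

# Symbols from `nm -n /boot/kernel/kernel | grep c053cb` in the mail above:
# (address, type, name), already sorted by address because of `nm -n`.
SYMBOLS = [
    (0xC053CB4C, "T", "init_turnstiles"),
    (0xC053CBC9, "t", "init_turnstile0"),
    (0xC053CBD8, "t", "turnstile_setowner"),
]


def resolve(addr, symbols):
    """Return (name, offset) of the symbol with the greatest start address <= addr."""
    starts = [start for start, _, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError(f"{addr:#x} lies below the first symbol")
    start, _, name = symbols[i]
    return name, addr - start


# Instruction pointer from "instruction pointer = 0x8:0xc053cbe5".
name, off = resolve(0xC053CBE5, SYMBOLS)
print(f"{name}+{off:#x}")
```

It prints turnstile_setowner+0xd, i.e. the fault happened 13 bytes into turnstile_setowner.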
[EMAIL PROTECTED] ~ $cat info.0
Dump header from device /dev/ad0s3b
  Architecture: i386
  Architecture Version: 16777216
  Dump Length: 536215552B (511 MB)
  Blocksize: 512
  Dumptime: Tue May 3 20:18:11 2005
  Hostname: r51.local
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.4-STABLE #6: Sat Apr 30 14:57:04 CEST 2005
    [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THINKPAD
  Panic String: page fault
  Dump Parity: 1084811848
  Bounds: 0
  Dump Status: good

The kernel was built the new way. I was not able to reproduce the panic. Is there anything else I can do?

It would be great to have a look at the core. Can you put it up somewhere on the web? Also, if you are not running a GENERIC kernel, then let us have a look at the config file.

[EMAIL PROTECTED] ~ $ls -lh|grep core
-rw--- 1 fk wheel 511M May 3 20:38 vmcore.0
-rw--- 1 fk wheel 354M May 5 19:11 vmcore.0.gz

I don't have that much web space available. However, the following seems to be interesting:

[EMAIL PROTECTED] ~ $kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:160
160	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:160
#1  0xc0519e76 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc051a1a7 in panic (fmt=0xc06bafe5 "%s") at /usr/src/sys/kern/kern_shutdown.c:566
#3  0xc0693758 in trap_fatal (frame=0xe669f94c, eva=0) at /usr/src/sys/i386/i386/trap.c:809
#4  0xc0692dca in trap (frame=
      {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -429261692, tf_esi = -1043159552, tf_ebp = -429262448, tf_isp = -429262472, tf_ebx = -1043640192, tf_edx = -1043640192, tf_ecx = 177524736, tf_eax = 177524736, tf_trapno = 12, tf_err = 0, tf_eip = -1068250139, tf_cs = 8, tf_eflags = 65539, tf_esp = -1043159552, tf_ss = -429262416}) at /usr/src/sys/i386/i386/trap.c:247
#5  0xc0681baa in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#6  0x0018 in ?? ()
#7  0x0010 in ?? ()
#8  0x0010 in ?? ()
#9  0xe669fc84 in ?? ()
#10 0xc1d2a600 in ?? ()
#11 0xe669f990 in ?? ()
#12 0xe669f978 in ?? ()
#13 0xc1cb5080 in ?? ()
#14 0xc1cb5080 in ?? ()
#15 0x0a94d000 in ?? ()
#16 0x0a94d000 in ?? ()
#17 0x000c in ?? ()
#18 0x in ?? ()
#19 0xc053cbe5 in turnstile_setowner (ts=0xc1cb5080, owner=0x0) at /usr/src/sys/kern/subr_turnstile.c:367
#20