from:"Fabian Keil"

Re: a strange and terrible saga of the cursed iSCSI ZFS SAN

2017-08-05 Thread Fabian Keil

"Eugene M. Zheganin"  wrote:

> On 05.08.2017 22:08, Eugene M. Zheganin wrote:
> >
> >   pool: userdata
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: none requested
> > config:
> >
> >         NAME               STATE     READ WRITE CKSUM
> >         userdata           ONLINE       0     0  216K
> >           mirror-0         ONLINE       0     0  432K
> >             gpt/userdata0  ONLINE       0     0  432K
> >             gpt/userdata1  ONLINE       0     0  432K
> That would be funny, if not that sad, but while writing this message, 
> the pool started to look like below (I just asked zpool status twice in 
> a row, comparing to what it was):
> 
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  728K
>           mirror-0         ONLINE       0     0 1,42M
>             gpt/userdata0  ONLINE       0     0 1,42M
>             gpt/userdata1  ONLINE       0     0 1,42M
>
> errors: 4 data errors, use '-v' for a list
> [root@san1:~]# zpool status userdata
>   pool: userdata
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: none requested
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         userdata           ONLINE       0     0  730K
>           mirror-0         ONLINE       0     0 1,43M
>             gpt/userdata0  ONLINE       0     0 1,43M
>             gpt/userdata1  ONLINE       0     0 1,43M
>
> errors: 4 data errors, use '-v' for a list
> 
> So, you see, the error rate is like speed of light. And I'm not sure if 
> the data access rate is that enormous, looks like they are increasing on 
> their own.
> So may be someone have an idea on what this really means.

Quoting a comment from 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c:
/*
 * If destroy encounters an EIO while reading metadata (e.g. indirect
 * blocks), space referenced by the missing metadata can not be freed.
 * Normally this causes the background destroy to become "stalled", as
 * it is unable to make forward progress.  While in this stalled state,
 * all remaining space to free from the error-encountering filesystem is
 * "temporarily leaked".  Set this flag to cause it to ignore the EIO,
 * permanently leak the space from indirect blocks that can not be read,
 * and continue to free everything else that it can.
 *
 * The default, "stalling" behavior is useful if the storage partially
 * fails (i.e. some but not all i/os fail), and then later recovers.  In
 * this case, we will be able to continue pool operations while it is
 * partially failed, and when it recovers, we can continue to free the
 * space, with no leaks.  However, note that this case is actually
 * fairly rare.
 *
 * Typically pools either (a) fail completely (but perhaps temporarily,
 * e.g. a top-level vdev going offline), or (b) have localized,
 * permanent errors (e.g. disk returns the wrong data due to bit flip or
 * firmware bug).  In case (a), this setting does not matter because the
 * pool will be suspended and the sync thread will not be able to make
 * forward progress regardless.  In case (b), because the error is
 * permanent, the best we can do is leak the minimum amount of space,
 * which is what setting this flag will do.  Therefore, it is reasonable
 * for this flag to normally be set, but we chose the more conservative
 * approach of not setting it, so that there is no possibility of
 * leaking space in the "partial temporary" failure case.
 */

In FreeBSD the flag (zfs_free_leak_on_eio) currently isn't easily
reachable due to the lack of a powerful kernel debugger (like mdb in
Solaris derivatives), but it can be exposed as a sysctl using the
patch from:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218954
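
Independent of the flag, it may help to confirm whether the CKSUM
counters really keep climbing while the pool is idle. The following is
only a sketch: the function name cksum_counts is made up for
illustration, and it assumes the column layout of the zpool status
output quoted above.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print
# "<vdev name> <CKSUM counter>" for every row of the config table.
cksum_counts() {
    awk 'NF == 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
        print $1, $5
    }'
}
```

Saving the result of "zpool status userdata | cksum_counts" to a file,
waiting a while without touching the pool, and diffing against a second
run shows which counters moved in between.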

Fabian


Re: zpool imported twice with different names (was Re: Fwd: ZFS)

2017-05-16 Thread Fabian Keil

Nikos Vassiliadis  wrote:

> On 05/15/2017 08:09 PM, Nikos Vassiliadis wrote:
> > Hi everybody,
> > 
> > While trying to rename a zpool from zroot to vega,
> > I ended up in this strange situation:
> > nik@vega:~ % zfs list -t all
> > NAME                 USED  AVAIL  REFER  MOUNTPOINT
> > vega                1.83G  34.7G    96K  /zroot
> > vega/ROOT           1.24G  34.7G    96K  none
> > vega/ROOT/default   1.24G  34.7G  1.24G  /
> > vega/tmp             120K  34.7G   120K  /tmp
> > vega/usr             608M  34.7G    96K  /usr
> > vega/usr/home        136K  34.7G   136K  /usr/home
> > vega/usr/ports        96K  34.7G    96K  /usr/ports
> > vega/usr/src         607M  34.7G   607M  /usr/src
> > vega/var             720K  34.7G    96K  /var
> > vega/var/audit        96K  34.7G    96K  /var/audit
> > vega/var/crash        96K  34.7G    96K  /var/crash
> > vega/var/log         236K  34.7G   236K  /var/log
> > vega/var/mail        100K  34.7G   100K  /var/mail
> > vega/var/tmp          96K  34.7G    96K  /var/tmp
> > zroot               1.83G  34.7G    96K  /zroot
> > zroot/ROOT          1.24G  34.7G    96K  none
> > zroot/ROOT/default  1.24G  34.7G  1.24G  /
> > zroot/tmp            120K  34.7G   120K  /tmp
> > zroot/usr            608M  34.7G    96K  /usr
> > zroot/usr/home       136K  34.7G   136K  /usr/home
> > zroot/usr/ports       96K  34.7G    96K  /usr/ports
> > zroot/usr/src        607M  34.7G   607M  /usr/src
> > zroot/var            724K  34.7G    96K  /var
> > zroot/var/audit       96K  34.7G    96K  /var/audit
> > zroot/var/crash       96K  34.7G    96K  /var/crash
> > zroot/var/log        240K  34.7G   240K  /var/log
> > zroot/var/mail       100K  34.7G   100K  /var/mail
> > zroot/var/tmp         96K  34.7G    96K  /var/tmp
> > nik@vega:~ % zpool status
> >   pool: vega
> >  state: ONLINE
> >   scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 15 01:28:48 2017
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         vega        ONLINE       0     0     0
> >           vtbd0p3   ONLINE       0     0     0
> >
> > errors: No known data errors
> >
> >   pool: zroot
> >  state: ONLINE
> >   scan: scrub repaired 0 in 0h0m with 0 errors on Mon May 15 01:28:48 2017
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         zroot       ONLINE       0     0     0
> >           vtbd0p3   ONLINE       0     0     0
> >
> > errors: No known data errors
> > nik@vega:~ %
> > ---
> > 
> > It seems like there are two pools, sharing the same vdev...
> > 
> > After running a few commands in this state, like doing a scrub,
> > the pool was (most probably) destroyed. It couldn't boot anymore
> > and I didn't research further. Is this a known bug?
> > 
> > Steps to reproduce:
> >install FreeBSD-11.0 in a pool named zroot
> >reboot into a live-CD
> >zpool import -f zroot vega

Why did you use the -f flag? Unless you can reproduce the
problem without it, it's not obvious to me that this is a
bug.

Fabian



Re: moutnroot failing on zpools in Azure after upgrade from 10 to 11 due to lack of waiting for da0

2017-03-13 Thread Fabian Keil

Pete French  wrote:

> I have a number of machines in Azure, all booting from ZFS and, until
> the weekend, running 10.3 perfectly happily.
> 
> I started upgrading these to 11. The first went fine, the second would
> not boot. Looking at the boot diagnistics it is having problems finding
> the root pool to mount. I see this is the diagnostic output:
> 
>   storvsc0:  on vmbus0
>   Solaris: NOTICE: Cannot find the pool label for 'rpool'
>   Mounting from zfs:rpool/ROOT/default failed with error 5.
>   Root mount waiting for: storvsc
>   (probe0:blkvsc0:0:storvsc1: 0: Interface>0):  on vmbus0 storvsc scsi_status = 2
>   (da0:blkvsc0:0:0:0): UNMAPPED
>   (probe1:blkvsc1:0:1:0): storvsc scsi_status = 2
>   hvheartbeat0:  on vmbus0
>   da0 at blkvsc0 bus 0 scbus2 target 0 lun 0
> 
> As you can see, the drive da0 only appears after it has tried, and
> failed, to mount the root pool.
> 
> Normally I would just stick in a big 'vfs.mountroot.timeout' but that
> variable doesnt not appear to exist under 11 - or at least it doesnt
> show up in sysctl.

The variable still exists but is ignored when using ZFS.

It's a known issue. You could try this patch:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208882#c3

Manually specifying the root pool should work around the issue.

sysctl(8) does not show the variable as it's only a tunable.
This is unrelated to the update.
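
Since sysctl(8) only lists sysctls, a value that exists only as a
loader tunable has to be looked up elsewhere, for example with kenv(1)
on the running system or in the loader configuration itself. A minimal
sketch of the latter follows; the function name loader_tunable is
hypothetical, and the parsing only handles plain name="value" lines.

```sh
#!/bin/sh
# Hypothetical helper: print the value assigned to a tunable in a
# loader.conf-style file. Usage: loader_tunable FILE NAME
# Returns non-zero if the tunable is not set in the file.
loader_tunable() {
    awk -F= -v name="$2" '
        $1 == name {
            # Everything after "name=" is the value; strip the quotes.
            value = substr($0, length(name) + 2)
            gsub(/^"|"$/, "", value)
            found = value
            seen = 1
        }
        END {
            if (!seen) exit 1
            print found    # the last assignment in the file wins
        }
    ' "$1"
}
```

For example, "loader_tunable /boot/loader.conf vfs.mountroot.timeout"
prints the configured timeout, if there is one.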

Fabian



Re: Swapping from a zvol results in a deadman panic

2017-02-05 Thread Fabian Keil

"Matthew X. Economou"  wrote:

> My FreeBSD 10.3-RELEASE-p16 server crashes in the middle of a Poudriere
> bulk run (see below).  This crash happens even if I lower
> vfs.zfs.arc_max or tweak vm.v_free_min/target/reserved/severe.  I'm
> looking for configuration advice in case I missed something obvious,
> since this seems to work on Illumos- and Linux-derived O/Ses, but
> failing that, I'd like to get some advice as to how to go about
> debugging this.  I doubt the deadman timer causes the system to stop
> responding.  It's more likely a race condition elsewhere.
> 
> The pool itself uses 4k sectors and is geli-encrypted.  I configured the
> swap zvol based on root-on-ZFS install instructions found in the FreeBSD
> wiki:

Paging on geli-encrypted devices is known to cause deadlocks
on FreeBSD, even if ZFS isn't involved directly:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209759

Adding ZFS to the mix is unlikely to help ...

> zfs create -V 6G -o org.freebsd:swap=on -o checksum=off -o
> compression=off -o dedup=off -o sync=disabled -o primarycache=none
> zroot/swap
> 
> The ZoL wiki recommends a slightly different zvol configuration:
> 
> zfs create -V 4G -b $(getconf PAGESIZE) -o logbias=throughput -o
> sync=always -o primarycache=metadata -o com.sun:auto-snapshot=false
> rpool/swap
> 
> I'm not sure how much of this applies to FreeBSD due to differences in
> kernel design/implementation.  Does anyone have an idea of what might be
> going on and how I might get this working?

You could try the patch from the PR and enable the
kern.geom.eli.use_uma_for_all_writes sysctl.

If you have a core dump, you may want to confirm that the
g_eli_worker is waiting for memory first.

Fabian



Poor ZFS ARC metadata hit/miss stats after recent ZFS updates

2016-10-17 Thread Fabian Keil

After rebasing some of my systems from r305866 to r307312
(plus local patches) I noticed that most of the ARC accesses
are counted as misses now.

Example:

[fk@elektrobier2 ~]$ uptime
 2:03PM  up 1 day, 18:36, 7 users, load averages: 0.29, 0.36, 0.30
[fk@elektrobier2 ~]$ zfs-stats -E


ZFS Subsystem Report                            Mon Oct 17 14:03:58 2016


ARC Efficiency:                                 3.38m
Cache Hit Ratio:                        12.87%  435.23k
Cache Miss Ratio:                       87.13%  2.95m
Actual Hit Ratio:                       9.55%   323.15k

Data Demand Efficiency:                 6.61%   863.01k

CACHE HITS BY CACHE LIST:
  Most Recently Used:                   18.97%  82.54k
  Most Frequently Used:                 55.28%  240.60k
  Most Recently Used Ghost:             8.88%   38.63k
  Most Frequently Used Ghost:           24.84%  108.12k

CACHE HITS BY DATA TYPE:
  Demand Data:                          13.10%  57.03k
  Prefetch Data:                        0.00%   0
  Demand Metadata:                      32.94%  143.36k
  Prefetch Metadata:                    53.96%  234.85k

CACHE MISSES BY DATA TYPE:
  Demand Data:                          27.35%  805.98k
  Prefetch Data:                        0.00%   0
  Demand Metadata:                      71.21%  2.10m
  Prefetch Metadata:                    1.44%   42.48k



I suspect that this is caused by r307265 ("MFC r305323: MFV r302991:
6950 ARC should cache compressed data") which removed an
ARCSTAT_CONDSTAT() call but I haven't confirmed this yet.

The system performance doesn't actually seem to be negatively affected
and repeated metadata accesses that are counted as misses are still served
from memory. On my freshly booted laptop I get:

fk@t520 /usr/ports $ for i in 1 2 3; do \
 /usr/local/etc/munin/plugins/zfs-absolute-arc-hits-and-misses; \
 time git status > /dev/null; \
 done; \
 /usr/local/etc/munin/plugins/zfs-absolute-arc-hits-and-misses;
zfs_arc_hits.value 5758
zfs_arc_misses.value 275416
zfs_arc_demand_metadata_hits.value 4331
zfs_arc_demand_metadata_misses.value 270252
zfs_arc_demand_data_hits.value 304
zfs_arc_demand_data_misses.value 3345
zfs_arc_prefetch_metadata_hits.value 1103
zfs_arc_prefetch_metadata_misses.value 1489
zfs_arc_prefetch_data_hits.value 20
zfs_arc_prefetch_data_misses.value 334

real    1m23.398s
user    0m0.974s
sys     0m12.273s
zfs_arc_hits.value 11346
zfs_arc_misses.value 389748
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 381018
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3412
zfs_arc_prefetch_metadata_hits.value 3202
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

real    0m1.472s
user    0m0.452s
sys     0m1.820s
zfs_arc_hits.value 11348
zfs_arc_misses.value 428536
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 419782
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3436
zfs_arc_prefetch_metadata_hits.value 3204
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

real    0m1.537s
user    0m0.461s
sys     0m1.860s
zfs_arc_hits.value 11352
zfs_arc_misses.value 467334
zfs_arc_demand_metadata_hits.value 7723
zfs_arc_demand_metadata_misses.value 458556
zfs_arc_demand_data_hits.value 400
zfs_arc_demand_data_misses.value 3460
zfs_arc_prefetch_metadata_hits.value 3208
zfs_arc_prefetch_metadata_misses.value 4885
zfs_arc_prefetch_data_hits.value 21
zfs_arc_prefetch_data_misses.value 437

Disabling ARC compression through vfs.zfs.compressed_arc_enabled
does not affect the accounting issue.
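
To put a number on the accounting, the hit ratio can be derived from
the raw counters as hits / (hits + misses). The first sample above
(5758 hits, 275416 misses) works out to roughly 2%. A small sketch; the
function name arc_hit_ratio is hypothetical, and it assumes input in
the "name.value N" form printed by the munin plugin.

```sh
#!/bin/sh
# Hypothetical helper: read "name.value N" lines on stdin and print
# hits / (hits + misses) as a percentage with two decimals.
# If several samples are piped in, the last one is used.
arc_hit_ratio() {
    awk '
        $1 == "zfs_arc_hits.value"   { hits = $2 }
        $1 == "zfs_arc_misses.value" { misses = $2 }
        END {
            total = hits + misses
            if (total == 0) exit 1    # no counters seen
            printf "%.2f\n", 100 * hits / total
        }
    '
}
```

Piping the plugin output through arc_hit_ratio before and after a
workload makes it easy to see that the ratio stays low even when the
second run is clearly served from memory.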

Can anybody reproduce this?

Fabian



Re: WLANDEV of vaps

2016-04-04 Thread Fabian Keil

Matthias Meyser  wrote:

> ist there a way to get the correspondig wlandev of an existing wlan?
> 
> e.g.
> 
> I have one urtwn0 an one run0 an one configured wlan0.
> 
> How do i know where wlan0 belongs to?

Try: sysctl net.wlan.0.%parent
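
The same lookup can be wrapped so that it takes the interface name
instead of the unit number. This is just a sketch; the function name
wlan_parent and the SYSCTL override (useful for a dry run) are
assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the parent device of a wlan interface.
# SYSCTL can be overridden for a dry run; it defaults to sysctl(8).
wlan_parent() {
    unit=${1#wlan}    # "wlan0" -> "0"
    ${SYSCTL:-sysctl} -n "net.wlan.${unit}.%parent"
}
```

With the setup described above, "wlan_parent wlan0" should name either
the urtwn or the run device.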

Fabian



Re: Periodic jobs triggering panics in 10.1 and 10.2

2015-12-10 Thread Fabian Keil

Michelle Sullivan  wrote:

> ZFS has it's place, it is very good at some things, it brings features
> that people need.
> ZFS does not work (is not stable) on i386 without recompiling the
> kernel, but it is presented as an installation option.
> ZFS is compiled in by default in i386 kernels without the necessary
> option change to make it "stable".
> We have been told the kernel option change will never be put there by
> default.

FYI, the stack overflows should be addressed by:
https://svnweb.freebsd.org/base?view=revision&revision=r286288

Fabian



Re: 10.2-RELEASE-p2 lost ability to bootstrap pkg with signature_type="pubkey"

2015-09-08 Thread Fabian Keil

Marko Cupać  wrote:

> I just found out that 10.2-RELEASE-p2 lost ability to bootstrap pkg
> with signature_type="pubkey".
> 
> Quick search returns:
> https://github.com/freebsd/pkg/issues/1309
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=202622
> 
> I guess it is not hard to switch repo to fingerprints, however I would
> not expect to lose this functionality by updating to patchlevel.

The "functionality" pkg(7) "lost" is silently ignoring unsupported
signature types which is dangerous if the network can't be trusted:
https://www.freebsd.org/security/advisories/FreeBSD-EN-15:15.pkg.asc
https://www.fabiankeil.de/gehacktes/hardenedbsd/

If you absolutely want to, you can still bootstrap insecurely by
temporarily setting the signature type to none.

Fabian



Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)

2015-07-01 Thread Fabian Keil

Chris Ross cross+free...@distal.com wrote:

   Yeah, this is the same panic you, I, and others have been seeing on
 sparc64's with bge's, or at least v240's (and one other IIRC) for many
 many months.  Thanks for grabbing a core!

Does it make a difference if you boot with hw.bge.allow_asf=0?

According to the man page it is known to cause system lockup problems
on a small number of systems. It's not obvious to me why it's enabled
by default on FreeBSD and I disable it on all my systems.
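
For a single test boot the tunable can be set at the loader prompt
("set hw.bge.allow_asf=0" followed by "boot"). To keep it across
reboots, a line like the following in /boot/loader.conf should do,
assuming the tunable is read at boot time as described in bge(4):

```
# /boot/loader.conf
hw.bge.allow_asf="0"
```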

Fabian



Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)

2015-07-01 Thread Fabian Keil

Kurt Lidl l...@pix.net wrote:

  [-stable@ in CC since these are the first 10.2-PRERELEASE builds
  available since the code slush went into effect, which marks the start
  of the release cycle.]
 
  New FreeBSD development branch installation ISOs and virtual machine
  disk images have been uploaded to the FTP mirrors.
 
  As with any development branch, the installation snapshots are not
  intended for use on production systems.  We do, however, encourage
  testing on non-production systems as much as possible.
 
 I was able to download the sparc64 iso image, burn the iso to a
 cd-rom, and boot a sparc64 V120 from that image.
 
 I was also able to perform an install onto a ZFS only setup,
 and have it work properly.

On i386, the ZFS-only installation reproducibly completes, but the
system panics after the first reboot while importing the root pool.

The problem seems to be that the GENERIC kernel is built with
clang but KSTACK_PAGES has not been adjusted according to UPDATING:

| 20121223:
|After switching to Clang as the default compiler some users of ZFS
|on i386 systems started to experience stack overflow kernel panics.
|Please consider using 'options KSTACK_PAGES=4' in such configurations.

If the issue can't be addressed before the release it may be
worth mentioning it in the release notes.
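
Until then, the UPDATING advice can be followed with a small custom
kernel configuration. This is only a sketch: the name ZFSI386 and the
file location are made up for illustration.

```
# Hypothetical custom kernel configuration, e.g. sys/i386/conf/ZFSI386
include GENERIC
ident   ZFSI386
options KSTACK_PAGES=4
```

The kernel would then be built and installed from /usr/src with
"make buildkernel KERNCONF=ZFSI386" and
"make installkernel KERNCONF=ZFSI386".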

Fabian



Re: New FreeBSD snapshots available: stable/10 (20150625 r284813)

2015-06-27 Thread Fabian Keil

Glen Barber g...@freebsd.org wrote:

 [-stable@ in CC since these are the first 10.2-PRERELEASE builds
 available since the code slush went into effect, which marks the start
 of the release cycle.]
 
 New FreeBSD development branch installation ISOs and virtual machine
 disk images have been uploaded to the FTP mirrors.
 
 As with any development branch, the installation snapshots are not
 intended for use on production systems.  We do, however, encourage
 testing on non-production systems as much as possible.

ggatec and ggatel are still broken on i386:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197309
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199559

If the ZFS root pool isn't found right away, the system deadlocks:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198563

Patches are available so it would be great if these issues
could be fixed before the release.

Fabian



Re: patch which implements ZFS LZ4 compression

2013-02-10 Thread Fabian Keil

Jeremy Chadwick j...@koitsu.org wrote:

 On Sat, Feb 09, 2013 at 03:19:18PM +0100, Fabian Keil wrote:
  Jeremy Chadwick j...@koitsu.org wrote:
 
   If you want a PR for it, I'll file one, but all it's going to contain is
   the contents of this Email.
  
  My impression is that your emails describe symptoms and contain
  some speculation about what the cause might be. I didn't see any
  sched traces, so it's unclear (to me) that priorities are actual
  the problem.
 
 They contain no speculation.
 
 Bob Friesenhahn, who has a lot of experience and familiarity with ZFS on
 Solaris, seemed to know exactly the behaviour I described.  Others on
 FreeBSD have reported the same behaviour as well, just not in that
 thread circa 2011.

Similar symptoms can have different causes.
 
 Regarding sched traces, please expand and include instructions.

I'm referring to the stuff that is fed into:
/usr/src/tools/sched/schedgraph.py

It can be created with ktrace and dtrace and I believe the
documentation is buried in the various "the scheduler sucks"
threads.
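
For reference, the usage notes at the top of schedgraph.py describe a
KTR-based way of collecting such a trace. The sketch below follows
those notes as far as they can be reproduced from memory, so treat the
option values as assumptions and check the script on your own system.
The kernel needs scheduler tracing compiled in:

```
options 	KTR
options 	KTR_ENTRIES=262144
options 	KTR_COMPILE=(KTR_SCHED)
options 	KTR_MASK=(KTR_SCHED)
```

With that kernel booted and the workload still running, tracing is
stopped with "sysctl debug.ktr.mask=0", the buffer is dumped with
"ktrdump -ct > ktr.out", and ktr.out is passed to schedgraph.py.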

  It's also unclear to me why the dedup and compression issues should
  be related. There are lots of dedup performance issues reported for
  Solaris as well and I doubt that they can be fixed for FreeBSD without
  significantly deviating from the ZFS upstream.
 
 What part of Bob's statement did you not understand?  Here, let me
 repeat it verbatim:
 
 Solaris solved the problem by putting the zfs writer threads into a 
 special scheduling class so that they are usually lower priority than 
 normal processing.  Before this change, a desktop system would become 
 almost unusable (intermittent loss of keyboard/mouse) while writing 
 lots of data with compression enabled.  Some NFS servers encountered 
 severe enough issues that NFS clients reported NFS timeouts.

My impression from reading zfs-discuss@ is that dedup performance
and some interactivity issues actually still exist in Illumos and
that they are completely unrelated to zfs writer threads.

As I can't use dedup on my systems I don't really pay attention to
them, though.

  I'm not saying a PR would be useless, but in my experience PRs
  with insufficient information just stay open and if the problem
  isn't important enough for you to provide additional information
  filing a PR is unlikely to have a great impact:
  http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&text=zfs
 
 Then someone in the know needs to explain exactly *what* data would help
 and (more importantly) *how* to go about providing it (i.e. what to
 enable in the kernel, what commands to issue, etc.).  Eidan has
 repeatedly insisted that PRs are a Good Thing(tm) because they allow for
 an official way to track issues vs. mailing list threads that start and
 turn into tumbleweeds (just like the one I've referenced).

And how many of those PRs are actually solved?

This is a rhetorical question and I don't expect you to look it up.

I'm not saying that PRs are a bad thing, but filing PRs is the easy part
and in my experience issues that don't spark developer interest on the
mailing lists are usually also ignored when filed as PR, especially when
they don't contain 100% of the information that may be relevant.

Even if you provide proof that the priorities are indeed the cause
of the problem there's a fair chance that the PR gets ignored anyway.

I currently have four somewhat ZFS-related PRs open, the first was filed
in 2007.

I still don't think that the solution is that nobody works on ZFS
improvements until my PRs are solved.

I'm looking forward to using LZ4 which promises better compression than
lzjb with less interactivity impact than gzip. It might even work for
your /dev/random test as it's supposed to better deal with poorly
compressible data.

 Without those necessary instructions, in effect what you're asking me to
 do is prove that the problem exists, which I have already done so.
 You just don't like the data I've provided.

I don't expect you to prove that the problem exists.

My impression is that the interactivity issues with gzip have
been well known for years and exist since the ZFS import.

I also don't dislike your data, all I'm saying is that
there could be other explanations.

 Bottom line: people enable compression on an fs, issue large amounts of
 write I/O to that fs (say hundreds of megabytes, or gigabytes), and
 start to see the entire system intermittently stalling hard (for
 multiple seconds at a time).  This affects everything from switching VTs
 on physical console to packets going across SSH.  The stalls vary in
 duration depending on what compression type is used (lzjb vs. gzip-1 --
 I cannot even imagine what gzip-9 would be like).  I described it as
 verbosely as I could, including going back and re-testing because
 people felt the ZFSv28 import might have addressed it (it did not):
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html

I'm aware that the interactivity issues

Re: patch which implements ZFS LZ4 compression

2013-02-09 Thread Fabian Keil

Jeremy Chadwick j...@koitsu.org wrote:

 On Fri, Feb 08, 2013 at 02:52:57PM -0800, Xin Li wrote:
  On 02/08/13 14:29, Dan Langille wrote:
   Here is a patch against FreeBSD 9.1 STABLE which implements ZFS LZ4
   compression.
   
   https://plus.google.com/106386350930626759085/posts/PLbkNfndPiM
   
   short link: http://bpaste.net/show/76095
  
  Please DO NOT use this patch!  It will ruin your data silently.
  
  As I already posted on Ivan's Google+ post, I'm doing final universe
  builds to make sure that there is no regression and will merge my
  changes to -HEAD later today.
 
 Another compression algorithm, this time 50%+ faster than lzjb.  Great,
 fine, wonderful, awesome, kudos, huzzah, blah blah blah.
 
 So when is someone going to step up to the plate and fix how compression
 (as well as dedup) destroys interactivity on FreeBSD?  Do I need to
 remind folks of this issue once again?  Here you have it, dated October
 2011, including the root cause and how it was fixed in Solaris et al:
 
 Description:
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012718.html
 
 Explanation and how Solaris et al fixed it, and how on Solaris the
 problem was major enough that it even caused NFS timeouts (sound
 familiar to anyone?):
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012726.html
 
 Further testing showing gzip-1 vs. lzjb and interactivity stalls:
 
 http://lists.freebsd.org/pipermail/freebsd-fs/2011-October/012752.html
 
 This is still a problem with base/stable/9.  And as I have said
 elsewhere on lists, do not ask me to run CURRENT -- it will be a cold
 day in hell before I ever do that.  I assume this same problem exists in
 CURRENT unless I have some key developer/committer say I backported
 this fix in CURRENT, absolutely 100% sure.
 
 I'm also wondering why iXSystems hasn't stepped up to the plate to
 contribute to making this happen, given their business focus.  I do not
 have the knowledge of the kernel (or of threading) to fix this myself,
 and for that I do apologise.
 
 But every time I see compression or dedup mentioned, I use the
 opportunity to bring up this subject.  STOP ADDING FEATURES AND FIX
 STUFF LIKE THIS INSTEAD -- while new algorithms are neat/fun toys, they
 do not truly fix issues like this.  How this problem has continually
 gotten overlooked is beyond me.

Did you consider that other people may have different priorities
than you do?

 If you want a PR for it, I'll file one, but all it's going to contain is
 the contents of this Email.

My impression is that your emails describe symptoms and contain
some speculation about what the cause might be. I didn't see any
sched traces, so it's unclear (to me) that priorities are actually
the problem.

It's also unclear to me why the dedup and compression issues should
be related. There are lots of dedup performance issues reported for
Solaris as well and I doubt that they can be fixed for FreeBSD without
significantly deviating from the ZFS upstream.

I'm not saying a PR would be useless, but in my experience PRs
with insufficient information just stay open and if the problem
isn't important enough for you to provide additional information
filing a PR is unlikely to have a great impact:
http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&text=zfs

Fabian



Re: how to destroy zfs parent filesystem without destroying children - corrupted file causing kernel panick

2012-12-31 Thread Fabian Keil

Greg Bonett greg.bon...@gmail.com wrote:

  My next plan would be reporting the problem with sufficient
  information so the bug can be fixed.
 
  Destroying the dataset or the whole pool seems like papering over the
  real issue to me and you could still do it if the PR gets ignored for
  too long or a developer agrees that this is the only option.
 
 
 ok, that's a good idea - do you know where I should report this problem?

I'd start with freebsd-fs@ and file a proper PR if there's
still no response after a few weeks or so.

If you haven't already, you might want to skim through:
http://www.freebsd.org/cgi/query-pr-summary.cgi?text=zfs
first, to see if your problem is already known.

 unfortunately, I don't know how I can provide the problematic file because
 any read, cp, or mv causes kernel panic.

Additional information about the panic itself will probably do at
the beginning, you can always provide more later if someone asks for
it.

For details see:
http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

Fabian



Re: how to destroy zfs parent filesystem without destroying children - corrupted file causing kernel panick

2012-12-30 Thread Fabian Keil

Greg Bonett greg.bon...@gmail.com wrote:

 Many months ago, I believe some *very bad hardware* caused corruption of a
 file on one of my zfs file systems.  I've isolated the corrupted file and
 can reliably induce a kernel panic with touch bad.file, rm bad.file, or
 ls -l in the bad.file's directory (ls in bad.file's dir doesn't cause
 panic, but ls bad.file does).
 
 This is a raidz zpool, but zpool scrub doesn't fix it - it eventually
 creates a kernel panic.
 
 My next plan is to attempt to get rid of this file by zfs destroy(ing) the
 entire filesystem. The corrupted file is on /tank, and I've copied all of
 the good data onto a new zfs file system, /tank/tempfs/.

My next plan would be reporting the problem with sufficient
information so the bug can be fixed.

Destroying the dataset or the whole pool seems like papering over the
real issue to me and you could still do it if the PR gets ignored for
too long or a developer agrees that this is the only option.

Fabian



Re: geom using 100% cpu with failed da5. How to calm it down without cam passdev?

2012-12-04 Thread Fabian Keil

Harald Schmalzbauer h.schmalzba...@omnilan.de wrote:

 I've a failed disk at a remote server, which shouldn't be a problem
 actually.

Welcome to geom ...

 Just for info, here's the last shout:
 kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
 0 0 0 length 0 SMID 256 command timeout cm 0xff8001c64800 ccb
 0xfe0007329000
 kernel: mps0: mpssas_alloc_tm freezing simq
 kernel: mps0: timedout cm 0xff8001c64800 allocated tm
 0xff8001c50148
 kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
 0 0 0 length 0 SMID 256 completed timedout cm 0xff8001c64800 ccb
 0xfe0007329000 during recovery ioc 8048 scsi 0 state c
 xf(noperiph:mps0:0:5:0): SMID 1 abort TaskMID 256 status 0x4a code 0x0
 count 1
 kernel: (noperiph:mps0:0:5:0): SMID 1 finished recovery after
 aborting TaskMID 256
 kernel: mps0: mpssas_free_tm releasing simq
 kernel: (da5:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0
 0 0 0
 kernel: (da5:mps0:0:5:0): CAM status: Command timeout
 kernel: (da5:mps0:0:5:0): Retrying command
 kernel: (da5:mps0:0:5:0): TEST UNIT READY. CDB: 0 0 0 0 0 0 length 0
 SMID 981 terminated ioc 804b scsi 0 state 0 xfer 0
 kernel: mps0: mpssas_alloc_tm freezing simq
 kernel: mps0: mpssas_remove_complete on handle 0x000e, IOCStatus= 0x0
 kernel: mps0: mpssas_free_tm releasing simq
 kernel: (da5:mps0:0:(pass7:5:mps0:0:0): lost device - 4 outstanding,
 2 refs
 kernel: 5:0): passdevgonecb: devfs entry is gone
 kernel: (da5:mps0:0:5:0): oustanding 3
 kernel: (da5:mps0:0:5:0): oustanding 2
 kernel: (da5:mps0:0:5:0): oustanding 1
 kernel: (da5:mps0:0:5:0): oustanding 0
 
 After reboot, 'camcontrol devlist' doesn't show any da5,
 but 'geom disk list' _does_ show da5!!!
 
 My problem is that geom is now consuming 100% of one core!
 top -S:
 13 root        3  -8    -     0K    48K -       1 480:19 100.00% geom
 
 Since there's no /dev/da5 I can't use camcontrol to stop anything, and
 at the moment nobody can physically remove the failed drive.
 How can I calm geom down?

I reported a similar problem in:
http://www.freebsd.org/cgi/query-pr.cgi?pr=171865

The PR contains a patch that I'm using as a workaround.

 How can I find out what geom is doing/trying to do?
 I guess it's related to the failed da5, but how can I know?

DTrace might help.

Fabian



Re: geli decrypt only one partition

2012-07-01 Thread Fabian Keil

joerg_surmann joerg_surm...@snafu.de wrote:

 Sorry, i no had enough time for this geli problem.
 I work with a testsystem.
 When start booting in verbose mode the system found the keypaths.
 
 Preloaded ada0p4:geli_keyfile0 /root/keys/ada0p4.key at 0xc14bf540.
 Preloaded ada1p4:geli_keyfile1 /root/keys/ada1p4.key at 0xc14bf598.
 
 loader.conf
 geom_eli_load=YES
 
 geli_ada0p4_keyfile0_load=YES
 geli_ada0p4_keyfile0_type=ada0p4:geli_keyfile0
 geli_ada0p4_keyfile0_name=/root/keys/ada0p4.key
 
 geli_ada1p4_keyfile1_load=YES
 geli_ada1p4_keyfile1_type=ada1p4:geli_keyfile1
 geli_ada1p4_keyfile1_name=/root/keys/ada1p4.key
 
 zfs_load=YES
 vfs.root.mountfrom=zfs:zroot
 
 on boottime i can decrypt ada0p4.
 for ada1p4 ... wrong key.
 
 i can decrypt ada1p4 later by hand with the keyfile like loader.conf.
 same situation.
 ada0p4 and ada1p4 are a zfs mirror.

As I wrote before, the problem is most likely that you named
the first keyfile for the second provider keyfile1 instead of keyfile0.

The keyfile numbering restarts for each provider, and geli
will not use keyfile1 if keyfile0 doesn't exist.

I missed that the Preloaded ... messages are a bit misleading
here as they only show that the loader lines are recognized and
that the kernel read the files, not that geli does anything useful
with them.

If you increase kern.geom.eli.debug you'll probably see that
/root/keys/ada0p4.key is used by geli while /root/keys/ada1p4.key
isn't.
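
The numbering rule can be checked mechanically. The following is only a
minimal sketch and not part of geli: check_geli_keyfiles is a hypothetical
helper that reads loader.conf-style lines on standard input and reports
every provider that has keyfile entries but none numbered 0. It inspects
only the *_type lines and assumes the provider:geli_keyfileN format used
in the quoted configuration.

```sh
#!/bin/sh
# Hypothetical helper (not part of geli): list providers whose
# loader.conf keyfile entries do not include a geli_keyfile0.
check_geli_keyfiles() {
    awk -F'[=:"]+' '
        # Example line: geli_ada1p4_keyfile1_type=ada1p4:geli_keyfile1
        /^geli_.*_type=/ {
            provider = $2
            idx = $3
            sub(/^geli_keyfile/, "", idx)
            seen[provider] = 1
            if (idx == "0") has_zero[provider] = 1
        }
        END {
            for (p in seen)
                if (!(p in has_zero))
                    print p ": no geli_keyfile0 entry"
        }
    ' | sort
}

# Usage with the two type entries from the quoted loader.conf:
check_geli_keyfiles <<'EOF'
geli_ada0p4_keyfile0_type=ada0p4:geli_keyfile0
geli_ada1p4_keyfile1_type=ada1p4:geli_keyfile1
EOF
```

With the quoted configuration this should flag ada1p4. Renaming that
provider's entries to geli_ada1p4_keyfile0_* (with the type
ada1p4:geli_keyfile0) should make the report empty.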

Fabian



Re: geli decrypt only one partition

2012-06-20 Thread Fabian Keil

joerg_surmann joerg_surm...@snafu.de wrote:

 i have two partitions:   ada0p3.eli and ada1p3.eli
 on bootprocess i must type a passphrase for ada0p3 and  have ada0p3.eli.
 next i type the passphrase for ada1p3 and i become: wrong key
 when the bootprocess is finish and i login and type geli attach -k
 /path to keyfile /dev/ada1p3 and i type the passphrase then i have
 ada1p3.eli.
 why can i decrypt only one partition on bootprocess?

This is frequently the effect of an incorrectly specified
keyfile in loader.conf. Do you get a boot message like the
following for both keyfiles when booting in verbose mode?

Jun 20 19:49:34 r500 kernel: Preloaded ada0s1d:geli_keyfile0 /boot/ad4s1d.key at 0x813951d0.

Fabian



Re: kern/157863: [geli] kbdmux prevents geli passwords from being entered properly on boot

2012-06-13 Thread Fabian Keil

Thomas Steen Rasmussen tho...@gibfest.dk wrote:

 Just to let everyone know that this is still an issue.
 
 I am trying to install FreeBSD 9.0 amd64 on a Lenovo X121e and I
 can't get it to accept the geli passphrase during boot. I've confirmed
 using kern.geom.eli.visible_passphrase=1 that the passphrase is
 correct, and the same passphrase is accepted when the system is
 booted up.
 
 I've tried disabling kbdmux in /boot/device.hints like the PR said,
 but that didn't help. I also tried disabling atkbd and atkbdc without
 any luck, infact I couldn't type anything at all when disabling those.

If disabling kbdmux doesn't help, it sounds like a different issue to me.

 Any hints or suggestions to what I might try ? I have another 9-stable
 laptop that mounts a geli volume at boot, no idea why that one works
 and this new one doesn't.

Are you using the password together with a keyfile?

I've misconfigured the keyfile in loader.conf in the past,
which results in the valid password not being accepted.

Obviously the setup then magically works later on when the
keyfile is specified correctly on the command line.

If you aren't using keyfiles, you could try setting up a USB
stick with geli to confirm that the same media works on one
laptop but doesn't on the other.

Fabian



Re: FreeBSD root on a geli-encrypted ZFS pool

2012-03-18 Thread Fabian Keil

Matthew X. Economou xenop...@irtnog.org wrote:

 Fabian Keil writes:

  Anyway, it's a test without file system so the ZFS overhead isn't
  measured. I wasn't entirely clear about it, but my assumption was
  that the ZFS overhead might be big enough to make the difference
  between HMAC/MD5 and HMAC/SHA256 a lot less significant.
 
 Got it.  That also makes sense.  I'll put this on my to-test list. 

Great.
 
  I'm currently using sector sizes between 512 and 8192 so I'm not
  actually expecting technical problems, it's just not clear to me
  how much the sector size matters and if 4096 is actually the best
  value when using ZFS.
 
 The geli(8) manual page claims that larger sector sizes lower the
 overhead of GEOM_ELI keying initialization and encryption/decryption
 steps by requiring fewer of these compute-intensive setup operations
 per block.

I think the setup operations per block should stay the same,
but the total number of setup operations decreases if (and only if)
increasing the sector size decreases the number of sectors required
to write the data.

That however should depend on the data and I don't see why
increasing the sector size should always be an improvement.

Geli can't read or write less than a sector, so if the workload
is randomly reading or writing a few hundred bytes, a sector
size of 512 bytes should be superior to a sector size of 4 kB.

Probably a sector size of 4 kB is good for some workloads,
but clearly it can't be the best for all, and it's not obvious
to me that it's the best for most.
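
The trade-off can be made concrete with a little arithmetic. This is only
a sketch under simple assumptions: sectors_touched and bytes_processed are
made-up helper names, the request sizes are invented for illustration, and
nothing here measures real geli performance. It just counts how many whole
sectors a request covers.

```sh
#!/bin/sh
# Number of sectors a request of LENGTH bytes at byte OFFSET touches
# for a given sector size. Only whole sectors can be processed.
sectors_touched() {
    offset=$1
    length=$2
    sectorsize=$3
    first=$(( offset / sectorsize ))
    last=$(( (offset + length - 1) / sectorsize ))
    echo $(( last - first + 1 ))
}

# Bytes that have to be read and decrypted to satisfy the request.
bytes_processed() {
    echo $(( $(sectors_touched "$1" "$2" "$3") * $3 ))
}

# A 300-byte read at offset 1000 (made-up numbers):
bytes_processed 1000 300 512     # 2 sectors -> 1024
bytes_processed 1000 300 4096    # 1 sector  -> 4096

# A sequential 1 MiB read starting at offset 0:
sectors_touched 0 1048576 512    # 2048
sectors_touched 0 1048576 4096   # 256
```

For the small random read, the 4 kB sector size processes four times as
many bytes; for the large sequential read, it needs an eighth of the
per-sector operations. Which effect dominates depends on the workload.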

Fabian



Re: FreeBSD root on a geli-encrypted ZFS pool

2012-03-09 Thread Fabian Keil

xenophon+freebsd xenophon+free...@irtnog.org wrote:

  -Original Message-
  From: Fabian Keil [mailto:freebsd-lis...@fabiankeil.de]
  Sent: Wednesday, March 07, 2012 11:49 AM

  It's not clear to me why you enable geli integrity verification.
  
  Given that it is single-sector-based it seems inferior to ZFS's
  integrity checks in every way and could actually prevent ZFS from
  properly detecting (and depending on the pool layout correcting)
  checksum errors itself.
 
 My goal in encrypting/authenticating the storage media is to prevent
 unauthorized external data access or tampering.  My assumption is that
 ZFS's integrity checks have more to do with maintaining metadata
 integrity in the event of certain hardware or software faults (e.g.,
 operating system crashes, power outages) - that is to say, ZFS cannot
 tell if an attacker boots from a live CD, imports the zpool, fiddles
 with something, and reboots, whereas GEOM_ELI can if integrity checking
 is enabled (even if someone tampers with the encrypted data).

If the ZFS pool is located on GEOM_ELI providers the attacker
shouldn't be able to import it unless the passphrase and/or
keyfile are already known.

If the attacker tampers with the encrypted data used by the pool,
ZFS should detect it, unless it's a replay attack in which case
enabling GEOM_ELI's integrity checking wouldn't have helped you
either.

If the attacker only replays a couple of blocks, ZFS's integrity
detection is likely to detect it for most blocks, while GEOM_ELI's
integrity checking will not detect it for any block.

In my opinion protecting ZFS's default checksums (which cover
non-metadata as well) with GEOM_ELI is sufficient. I don't see
what advantage additionally enabling GEOM_ELI's integrity
verification offers.

 This does
 raise an interesting question that merits further testing: What happens
 if a physical sector goes bad, whether that's due to a system bus or
 controller I/O error, a physical problem with the media itself, or
 someone actively tampering with the encrypted storage?  GEOM_ELI would
 probably return some error back to ZFS for that sector, which could
 cause the entire vdev to go offline but might just require scrubbing the
 zpool to fix.
 
  I'm also wondering if you actually benchmarked the difference
  between HMAC/MD5 and HMAC/SHA256. Unless the difference can
  be easily measured, I'd probably stick with the recommendation.
 
 I based my choice of HMAC algorithm on the following forum post:
 
 http://forums.freebsd.org/showthread.php?t=12955

I'm wondering if dd's block size is correct, 4096 seems rather small.

Anyway, it's a test without file system so the ZFS overhead isn't
measured. I wasn't entirely clear about it, but my assumption was
that the ZFS overhead might be big enough to make the difference
between HMAC/MD5 and HMAC/SHA256 a lot less significant.

 I wouldn't recommend anyone use MD5 in real-world applications, either,
 so I'll update my instructions to use HMAC/SHA256 as recommended by
 geli(8).

It's still not clear to me why you recommend using a HMAC for geli at all.

  I would also be interested in benchmarks that show that geli(8)'s
  recommendation to increase geli's block size to 4096 bytes makes
  sense for ZFS. Is anyone aware of any?
 
 As far as I know, ZFS on FreeBSD has no issues with 4k-sector drives,
 see Ivan Voras' comments here:
 
 http://ivoras.net/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html

 Double-checking my zpool shows the correct value for ashift:
 
   masip205bsdfile# zdb -C tank | grep ashift
   ashift: 12

I'm currently using sector sizes between 512 and 8192 so I'm not
actually expecting technical problems, it's just not clear to me
how much the sector size matters and if 4096 is actually the best
value when using ZFS.

 Benchmarking different geli sector sizes would also be interesting and
 worth incorporating into these instructions.  I'll add that to my to-do
 list as well.

Great.

Fabian



Re: FreeBSD root on a geli-encrypted ZFS pool

2012-03-07 Thread Fabian Keil

xenophon+freebsd xenophon+free...@irtnog.org wrote:

 I have posted revised instructions for installing FreeBSD to an
 encrypted ZFS pool on my blog:
 
 https://web.irtnog.org/~xenophon/blog/revised-freebsd-root-zfs-geli
 
 The entire procedure is documented in a way suitable for scripting.  I
 would be very interested in the community's feedback.

It's not clear to me why you enable geli integrity verification.

Given that it is single-sector-based it seems inferior to ZFS's
integrity checks in every way and could actually prevent ZFS from
properly detecting (and depending on the pool layout correcting)
checksum errors itself.

I'm also wondering if you actually benchmarked the difference
between HMAC/MD5 and HMAC/SHA256. Unless the difference can
be easily measured, I'd probably stick with the recommendation.

I would also be interested in benchmarks that show that geli(8)'s
recommendation to increase geli's block size to 4096 bytes makes
sense for ZFS. Is anyone aware of any?

Fabian



Re: sysutils/pftop on 9.x+

2012-02-14 Thread Fabian Keil

Greg Rivers gcr+freebsd-sta...@tharned.org wrote:

 sysutils/pftop was marked broken on 9.x and above last March[1].  Are 
 there any plans to fix it soon?  It's a really handy utility.
 
 [1] http://www.freebsd.org/cgi/cvsweb.cgi/ports/sysutils/pftop/Makefile?rev=1.17

Please have a look at:
http://www.freebsd.org/cgi/query-pr.cgi?pr=155938

Note that the currently working fix is in the audit trail;
the original fix stopped working after the PF update.

Fabian



Re: Setting coredumpsize on a running process?

2011-10-18 Thread Fabian Keil

Ivan Voras ivo...@freebsd.org wrote:

 On 18 October 2011 16:43, Jeremy Chadwick free...@jdc.parodius.com wrote:
  On Tue, Oct 18, 2011 at 04:32:11PM +0200, Ivan Voras wrote:
  I have PHP executing as fastcgi via the mod_fcgid module in Apache. I
  suspect there is a bug in PHP or one of its extensions which causes it
  to crash with sigsegv, but I cannot get any coredumps. I suspect
  something is setting coredumpsize to 0 - either Apache, mod_fcgid or PHP.
 
  So the question is: is there a way to set coredumpsize on a running
  process, with the intention of getting a core dump when it crashes? I
  already tried setting CoreDumpDirectory in Apache and also configuring
  apache22limits_args in /etc/rc.conf but without effect.
 
  I ended up solving this on a machine where coredumps with Apache + PHP
  were highly common by setting sysctl kern.corefile to
  /var/cores/%P.%N.core, then made sure the /var/cores directory was
  root:wheel, perms 1777.  Otherwise I could not get a coredump.
  apache22limits_enable did not help either, nor did CoreDumpDirectory.
 
  Having fun yet?
 
 Oh, I have years and years of fun debugging PHP, in one way or the other :)
 
 Your suggestion for setting core dump directory explicitely helped;
 now it looks like I've hit an infinite recursion / stack eating bug
 somewhere in PCRE...
 
 #1703 0x000805d5c72e in match () from /usr/local/lib/libpcre.so.0
 #1704 0x000805d5b4f0 in match () from /usr/local/lib/libpcre.so.0
 #1705 0x000805d5c72e in match () from /usr/local/lib/libpcre.so.0
 #1706 0x000805d5b4f0 in match () from /usr/local/lib/libpcre.so.0
 
 However, I'm drawing the line at debugging PCRE, this will go into the
 don't do that category.

There's a fair chance that this isn't a bug in pcre,
but the result of a poorly written expression.

You may want to have a look at pcrestack(3).

Fabian



Re: geli problems after installkernel installworld

2011-01-16 Thread Fabian Keil

Christopher J. Ruwe c...@cruwe.de wrote:

 On Sat, 15 Jan 2011 22:30:56 +0100
 Pawel Jakub Dawidek p...@freebsd.org wrote:
 
  On Thu, Jan 13, 2011 at 10:00:19PM +0100, Christopher J. Ruwe wrote:
   I use a mostly geli encrypted hd on my Thinkpad R500,
   with /compat, /usr, /tmp and /var all on the encrypted geli
   provider.
   
   After an upgrade of kernel and world (STABLE), I experience a weird
   issue: While booting, I am asked for the geli passphrase as usual.
   Completing password authentication for geli returns a success
   message,
   
   cryptosoft0: software crypto on motherboard
   GEOM_ELI: Device ada0p3.eli created.
   GEOM_ELI: Encryption: AES-CBC 256
   GEOM_ELI: Crypto: software
   
   however, the zpool on geli is unavailable.
   
   Logging in a root, I can attach the geli provider manually as geli
   itself should do from /etc/rc.conf. After a successful zfs mount
   -a, I can resume as usual after manually starting
   the /usr/local/rc.d services. 
   
   Neither have I noticed a change in the device names nor any unusual
   messages from dmesg. Currently, I am doing a new compile run on
   world and kernel to attempt anew tomorrow.
   
   Am I missing something?
  
  Can you show the output of 'geli list' from a running system?
  
 
 Sure I can ... I'll additionally  comment the output with what I do to.
 
 First I boot and my /usr/local/rc.d/ - schripts do not start. Likewise
 does zsh.
 
 From doing geli list, I get (on stdout)
 
 Geom name: ada0p3.eli
 State: ACTIVE
 EncryptionAlgorithm: AES-CBC
 KeyLength: 256
 Crypto: software
 UsedKey: 0
 Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT, RW-DETACH
 Providers:
 1. Name: ada0p3.eli
Mediasize: 249656594432 (233G)
Sectorsize: 4096
Mode: r0w0e0
 Consumers:
 1. Name: ada0p3
Mediasize: 249656596992 (233G)
Sectorsize: 512
Mode: r1w1e1
 
 Doing a zpool status -v gives on stdout
 
  pool: ntank
  state: UNAVAIL
 status: One or more devices could not be opened.  There are insufficient
 replicas for the pool to continue functioning.
 action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-3C
  scrub: none requested
 config:
 
 NAME  STATE READ WRITE CKSUM
 ntank UNAVAIL  0 0 0  insufficient replicas
   ada0p3.eli  UNAVAIL  0 0 0  cannot open
 
   pool: rpool
  state: ONLINE
 status: The pool is formatted using an older on-disk format.  The pool
   can still be used, but some features are unavailable.
 action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
 pool will no longer be accessible on older software versions.
  scrub: none requested
 config:
 
 NAME                                          STATE     READ WRITE CKSUM
 rpool                                         ONLINE       0     0     0
   gptid/3ab00705-d22f-11df-8e1b-002713b40a7b  ONLINE       0     0     0
 
 errors: No known data errors
 
 and on stderr ( I noticed the output on stderr as I ran the command, so
 I just typed that)
 
 GEOM_ELI[1]: Device ada0p3.eli is still open, so it cannot be definitely
 removed.
 GEOM_ELI[1]: Detached ada0p3.eli on last close.
 
 When doing a geli attach -k /pathtomykey/key /dev/ada0p3 directly
 followed by a zfs mount -a, I have my filesystems where I am used to
 finding them. I run my /usr/local/rc.ds from there and am functional
 again.
 
 Then (I post this anwe, I will point out why later on), I get for geli
 list
 
 Geom name: ada0p3.eli
 State: ACTIVE
 EncryptionAlgorithm: AES-CBC
 KeyLength: 256
 Crypto: software
 UsedKey: 0
 Flags: SINGLE-KEY, NATIVE-BYTE-ORDER, BOOT
 Providers:
 1. Name: ada0p3.eli
Mediasize: 249656594432 (233G)
Sectorsize: 4096
Mode: r1w1e1
 Consumers:
 1. Name: ada0p3
Mediasize: 249656596992 (233G)
Sectorsize: 512
Mode: r1w1e1
 
 I never noticed that before, but, as I did not know which geli output
 you were asking for (the one not working or the one working), I diffed
 the two files and noticed, that directly  after booting, the RW-DETACH
 flag is set. I do not know what that means nor do I know whether that
 matters, I find that curious, though.

I'm not sure if it's the cause of your problem,
but it certainly does matter:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/117158

Fabian



Re: ATA_CAM + ZFS gives short 1-2 seconds system freeze on disk load

2010-02-08 Thread Fabian Keil

Jeremy Chadwick free...@jdc.parodius.com wrote:

 On Mon, Feb 08, 2010 at 03:33:29PM +0100, Guido Falsi wrote:

  I'm seeing this problem on my machine at work. It's an HP DC 7800,
  mounts an ich9 chipset(not ahci capable). I'm attaching the dmesg.
  
  I noticed this in the past, but it got evident(and very annoying)
  while recompiling many ports today after the jpeg-8 update.
  
  It looks like it freezes the system for the second or two it takes
  to flush buffers to disk when there are big outputs. This happens
  when decompressiong big distfiles, mainly. The openoffice port
  triggers this almost continuosly every few seconds during compilation.
  I've also seen this when working with big files(for example graphic
  images in uncompressed formats).
  
  It gets very annoying and I don't remember this happening before
  activating the ATA_CAM flag. There was some slowdown with big disk
  access, but not a total freeze.
 
 This happens without ATA_CAM (e.g. using ataahci(4) or any other
 controller driver).

Indeed.

 The behaviour you're describing (bursty heavy disk I/O that stalls the
 subsystem) is pretty much the norm on all FreeBSD systems I've seen with
 ZFS.  When it starts happening, it's easy to notice/follow using zpool
 iostat 1 or gstat -I500ms.  Lots of I/O will happen (read or write)
 and the ARC is essentially being thrashed -- said utilities won't show
 any I/O counters incrementing until some threshold is reached, where
 you'll see a massive amount of I/O reported, during which time the
 system is sluggish (beyond acceptable levels, IMHO).  A few seconds
 later, the I/O counters start reporting 0 as the ARC gets used, then
 a few seconds massive I/O, rinse lather repeat.

I experienced what I think is the same problem. ZFS's bulk disk flushes
caused vlc to occasionally stutter when viewing a DVD rip from disk while
ripping a DVD at the same time.

My workaround is to put vfs.zfs.txg.timeout=3 in /boot/loader.conf.
I think I read about this on zfs-disc...@. I assume on faster systems
one can use a higher value.

I'm currently updating the jpeg dependencies, too:

f...@r500 ~ $zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         176G  52.1G     22     40  1.40M  1.85M
tank         176G  52.1G     73      0  9.24M      0
tank         176G  52.1G     73      0  9.05M      0
tank         176G  52.1G     42    176  5.12M  11.3M
tank         176G  52.1G     68      0  8.62M      0
tank         176G  52.1G     67      0  8.43M      0
tank         176G  52.1G     57    106  7.11M  9.54M
tank         176G  52.1G     75      0  9.50M      0
tank         176G  52.1G     76      0  9.62M      0
tank         176G  52.1G     46    167  5.74M  11.7M
tank         176G  52.1G     79      0  9.99M      0
tank         176G  52.1G     81      0  10.2M      0
tank         176G  52.1G     43    164  5.43M  11.7M
tank         176G  52.1G     71      0  9.00M      0
tank         176G  52.1G     61     39  7.74M  5.00M
tank         176G  52.1G     46    111  5.74M  9.17M
tank         176G  52.1G     71      0  8.99M      0
tank         176G  52.1G     80      0  10.1M      0
tank         176G  52.1G     47    113  5.87M  9.68M
tank         176G  52.1G     70      0  8.87M      0
tank         176G  52.1G     78      0  9.80M      0
tank         176G  52.1G     42    164  5.24M  11.3M
tank         176G  52.1G     76      0  9.62M      0
tank         176G  52.1G     79      0  9.99M      0
tank         176G  52.1G     49    153  6.11M  10.8M
tank         176G  52.1G     72      0  9.12M      0
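
To put a number on the write bursts in output like this, the per-second
samples can be counted. The following is a rough sketch, not an existing
tool: summarize_write_bursts is a made-up name, and it assumes the usual
seven-column zpool iostat layout with the write operations in the fifth
column, separated from the read operations by whitespace.

```sh
#!/bin/sh
# Hypothetical helper: count how many one-second samples from
# "zpool iostat 1" contain any write operations at all.
summarize_write_bursts() {
    awk '
        # Data rows have 7 fields with numeric operation counts;
        # the header and separator lines do not match.
        NF == 7 && $4 ~ /^[0-9.]+[KMG]?$/ && $5 ~ /^[0-9.]+[KMG]?$/ {
            total++
            if ($5 + 0 > 0) busy++
        }
        END { printf "%d of %d intervals had writes\n", busy, total }
    '
}

# Usage with the first four samples from my output:
summarize_write_bursts <<'EOF'
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         176G  52.1G     22     40  1.40M  1.85M
tank         176G  52.1G     73      0  9.24M      0
tank         176G  52.1G     73      0  9.05M      0
tank         176G  52.1G     42    176  5.12M  11.3M
EOF
```

On a live system the same function could be fed from a bounded run such
as "zpool iostat tank 1 30", assuming the columns are not fused in the
terminal output.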

Fabian



Re: ZFS MFC heads up

2009-05-21 Thread Fabian Keil

Pertti Kosunen pertti.kosu...@pp.nic.fi wrote:

 Kip Macy wrote:
  I will be MFC'ing the newer ZFS support some time this afternoon. Both
  world and kernel will need to be re-built. Existing pools will
  continue to work without upgrade.
 
 Mounting local file systems:.
 internal error: out of memory
 internal error: out of memory
 internal error: out of memory
 internal error: out of memory
 
 I get this in dmesg after make installkernel && shutdown -r now, zfs 
 pool is not mounted. /usr is on zfs so can't installworld.

IIRC, that's what happens if ZFS kernel and userland aren't in sync.

You'll either have to install the new kernel and userland
together (not supported, but I do it all the time), or install
the userland from a non-ZFS file system.

Fabian



Re: Panic in radeon_get_vblank_counter()

2009-03-15 Thread Fabian Keil

Robert Noland rnol...@freebsd.org wrote:

 On Fri, 2009-03-13 at 23:33 -0500, Sean C. Farley wrote: 
  On Fri, 13 Mar 2009, Robert Noland wrote:

  If I start rebooting before it is printed, the system locks up.  Of 
  course, this is only after rebooting several times.
  
  Here is a successful start and shutdown:
  http://people.freebsd.org/~scf/drm-dmesg.log
  http://people.freebsd.org/~scf/Xorg.0.log
 
 Ok, I'll spend some time staring at the current code... Thanks for the
 backtrace too, it's nice to get those...

This seems to be the same panic I mentioned in the
"Filesystems being eaten?" thread on freebsd-current.

I could reproducibly trigger this panic on:
FreeBSD 8.0-CURRENT #39: Sat Mar  7 20:37:29 CET 2009
when shutting Xorg down. I can no longer reproduce it with:
FreeBSD 8.0-CURRENT #42: Sat Mar 14 00:47:09 CET 2009
Fabian



Re: non-root user can not create zfs filesystem?

2008-10-22 Thread Fabian Keil

Pete French [EMAIL PROTECTED] wrote:

  Yes,that's is what I want to say.
  In other word is the command zfs allow and zfs unallow
  I think it is not Support chflags(2) which is described in at the
  bottom of http://wiki.freebsd.org/ZFS
 
 Sorry, my unclear use of english! I didn't mean the last item, I meant
 that it was near the bottom of the page. Look at the line above the
 'chflags' one - Delegated Administration is what you are after. Not
 here yet, but hopefully soon...

You can already test it on CURRENT if you apply the
patch Pawel posted on freebsd-fs@ and freebsd-current@
a while ago.

Fabian



Re: constant zfs data corruption

2008-10-20 Thread Fabian Keil

JoaoBR [EMAIL PROTECTED] wrote:

 On Monday 20 October 2008 11:22:08 you wrote:
  On Mon, Oct 20, 2008 at 08:37:40AM -0200, JoaoBR wrote:
   On Friday 17 October 2008 15:39:59 Chuck Swiger wrote:
On Oct 17, 2008, at 11:30 AM, JoaoBR wrote:
 constantly I find data corruption on ZFS volums, ever from
 rrdtool, this
 corrupt data happens on SATA disks, never seem on SCSI
   
Presumably your SATA drives are correctly being reported by ZFS as
corrupting data, and you should do something like replace cables,
the drives themselves, perhaps try downgrading to SATA-150 rather
than -300 if you are using the later.  Also consider running a
drive diagnostic utility from the mfgr (or smartmontools) and
doing an extended self-test or destructive write surface check.
  
   well, hardware seems to be ok and not older than 6 month, also
   happens not only on one machine ... smartctl do not report any hw
   failures on disk
  
   regarding jumpering the drives to 150 you suspect a driver problem?
 
  It's not because of a driver problem.  There are known SATA chipsets
  which do not properly work with SATA300 (particularly VIA and SiS
  chipsets); they claim to support it, but data is occasionally
  corrupted. Capping the drive to SATA150 fixes this problem.
 
  http://en.wikipedia.org/wiki/Serial_ATA#SATA_1.5_Gbit.2Fs_and_SATA_3_Gbit.2Fs
 
  There are also known problems with Silicon Image chipsets (on Linux,
  Windows, and FreeBSD).
 
  Because you didn't provide your smartctl output, I can't really tell if
  the drives are in good shape or not.  :-)
 
 
 ok then here it comes
 
 smartctl version 5.38 [amd64-portbld-freebsd7.0] Copyright (C) 2002-8

Can you reproduce the problem on an i386 system?

I have a USB HD case that works fine on an i386 system, but
writing from an amd64 system leads to ZFS checksum errors
(reading works though).

Fabian



Re: constant zfs data corruption

2008-10-20 Thread Fabian Keil

Jeremy Chadwick [EMAIL PROTECTED] wrote:

 On Mon, Oct 20, 2008 at 03:07:30PM -0200, JoaoBR wrote:
  On Monday 20 October 2008 11:22:08 you wrote:

   Also, do you not think it's a little odd that the only data
   corruption occurring for you are related to RRDtool?
  
  this yes I think is suspitious
 
 Chuck's probably spot-on with regards to explaining why this is.
 Something to keep in mind is that RRDtool has a history of bugs, so I
 wouldn't be surprised if the issue turned out to be there.  It's really
 too bad we have no decent, actively-maintained alternatives to RRDtool.

Bugs in RRDtool shouldn't cause ZFS data corruption.

Fabian



Re: GELI encrypted ZFS zpool

2008-09-20 Thread Fabian Keil

Steve Bertrand [EMAIL PROTECTED] wrote:

 I have an older storage box that I've upgraded to -stable. It currently
 uses 7 SCSI disks mashed together with gstripe.
 
 I've recently replaced this box with a new one running a ZFS setup. I'm
 now wanting to turn the old one into a storage device running ZFS, but I
 want the entire pool encrypted with GELI.
 
 I know I can do this, but my requirements are as such:
 
 - use a key on external media to access the GELI encrypted disks
 - not have to type in the passphrase for each physical disk
 
 ...is this possible?

It should be possible if you use keyfiles without password
for the vdevs and store those keyfiles on a geli encrypted
slice that uses both a keyfile and a passphrase.
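
As a rough sketch of that layout: the helper below only prints the attach
commands instead of running them. print_attach_commands, the /mnt/keys
mount point and the da0..da2 provider names are all made up for
illustration. It assumes each disk was initialized with a keyfile and no
passphrase (geli init -P -K, see geli(8)) and that the small slice holding
the keyfiles has already been attached with its own keyfile and
passphrase and mounted.

```sh
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the attach
# commands for providers that use passphrase-less keyfiles.
# KEYDIR is assumed to be the mount point of the geli-encrypted
# slice that itself requires a keyfile and a passphrase.
print_attach_commands() {
    keydir=$1
    shift
    for provider in "$@"; do
        # -p: no passphrase component, -k: keyfile (see geli(8))
        echo "geli attach -p -k $keydir/$provider.key /dev/$provider"
    done
}

print_attach_commands /mnt/keys da0 da1 da2
```

This way only one passphrase has to be typed, for the slice that stores
the keyfiles; the printed commands can be reviewed before running them
as root.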

Fabian



Re: possible zfs bug? lost all pools

2008-05-19 Thread Fabian Keil

JoaoBR [EMAIL PROTECTED] wrote:

 man page thar zfs can not be a dump device, not sure if I understand it
 as meant but I can dump to zfs very well and fast as long as
 recordsize=128

I assume you tried dump(8), while the sentence in the man
page is about using a ZFS volume as dumpon(8) target:

%sudo dumpon -v /dev/zvol/tank/swap 
dumpon: ioctl(DIOCSKERNELDUMP): Operation not supported

Fabian



Re: crash in acd_geom_detach() whilst reading vcd

2007-09-11 Thread Fabian Keil

Peter Jeremy [EMAIL PROTECTED] wrote:

 I was trying to play a VCD (using mplayer) on my 6-STABLE system and
 it runs for a while and then crashes.  This is reproducable with the
 same traceback.
 
 kgdb reports:
 acd0: FAILURE - device detached
 
 Fatal trap 12: page fault while in kernel mode
 fault virtual address   = 0x3c8
 fault code  = supervisor read data, page not present
 instruction pointer = 0x8:0x801b6489
 stack pointer   = 0x10:0xa3561ba0
 frame pointer   = 0x10:0xa3561bc0
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 2 (g_event)
 trap number = 12
 panic: page fault
 KDB: stack backtrace:
 panic() at panic+0x1c1
 trap_fatal() at trap_fatal+0x298
 trap_pfault() at trap_pfault+0x243
 trap() at trap+0x298
 calltrap() at calltrap+0x5
 --- trap 0xc, rip = 0x801b6489, rsp = 0xa3561ba0, rbp = 0xa3561bc0 ---
 acd_geom_detach() at acd_geom_detach+0x19
 g_run_events() at g_run_events+0x1b7
 g_event_procbody() at g_event_procbody+0x5a
 fork_exit() at fork_exit+0x87
 fork_trampoline() at fork_trampoline+0xe
 
 A gdb backtrace shows:
 #6  0x803787bb in calltrap () at /usr/src/sys/amd64/amd64/exception.S:168
 #7  0x801b6489 in acd_geom_detach (arg=0xff7e1100, flag=0x0) at /usr/src/sys/dev/ata/atapi-cd.c:194
 #8  0x8022f267 in g_run_events () at /usr/src/sys/geom/geom_event.c:209
 #9  0x802305ca in g_event_procbody () at /usr/src/sys/geom/geom_kern.c:141
 #10 0x80254f77 in fork_exit (callout=0x80230570 g_event_procbody, arg=0x0, frame=0xff0039dc4770) at /usr/src/sys/kern/kern_fork.c:821
 #11 0x80378b1e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:394
 
 The argument to acd_geom_detach() does include a NULL ivars:
 (kgdb) p *(device_t)0xff7e1100
 $2 = {
   ops = 0xff825000, 
   link = {
 tqe_next = 0xff7c1c00, 
 tqe_prev = 0xff8ea130
   }, 
   devlink = {
 tqe_next = 0xff7c1c00, 
 tqe_prev = 0xff9f1518
   }, 
   parent = 0xff8ea100, 
   children = {
 tqh_first = 0x0, 
 tqh_last = 0xff7e1130
   }, 
   driver = 0x80532220, 
   devclass = 0xff7ebe00, 
   unit = 0x0, 
   nameunit = 0xff9d19d0 acd0, 
   desc = 0xff0039bd72a0 TSSTcorpCD/DVDW TS-L532M/HR08, 
   busy = 0x0, 
   state = DS_ATTACHED, 
   devflags = 0x0, 
   flags = 0x5d, 
   order = 0x0, 
   pad = 0x0, 
   ivars = 0x0, 
   softc = 0xffacac00, 
   sysctl_ctx = {
 tqh_first = 0xff0039bd7120, 
 tqh_last = 0xff0039bd7228
   }, 
   sysctl_tree = 0xffb30600
 }
 (kgdb) 
 
 Is this behaviour expected?

I think you're running into the same problem I reported in
kern/99017: [ata] [patch] FreeBSD versions above 5.3
panic if atapi drives become unresponsive:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/99017

You could try the work-around, but your drive will
probably still be lost until the next reboot.

Fabian



Re: release cycle

2007-06-02 Thread Fabian Keil

Chris [EMAIL PROTECTED] wrote:

 On 29/05/07, Mark Linimon [EMAIL PROTECTED] wrote:
  On Tue, May 29, 2007 at 09:17:57PM +1000, Peter Jeremy wrote:
   Agreed.  6.3-RELEASE would nominally be due around July but the lack
   of any schedule on http://www.freebsd.org/releng/ suggests that it will
   be later than that.  The plans to start the 7.0-RELEASE cycle will also
   impact this.
 
  At BSDCan, Ken Smith mentioned that 7.0 is due to be branched in July and
  released in Aug/Sep, with 6.3 quickly following (perhaps even overlapping
  so as to reuse the same ports freeze).
 
  The ports tree is not even close to stable enough to release right now.

 Given that Kris repeatedly tells me and others that the ports system
 is only supported on the latest freebsd release (meaning one has to be
 upgrading freebsd on their servers every few months to get this
 support) if 7.0 and 6.3 are released around the same time will the
 ports tree be supported on both?

I believe you misunderstood something. Where do you think Kris said that?

Fabian



Re: 6.2-RELEASE panic when blanking CD-RW media

2007-02-22 Thread Fabian Keil

Petr Holub [EMAIL PROTECTED] wrote:

 I've encountered a deterministic kernel panic when
 blanking one specific CD-RW media using cdrecord.

 The kernel panic details follow and dmesg is at the end of this
 email. Though I understand there's something wrong with the media,
 I think it shouldn't panic the kernel either.

 # kgdb /boot/kernel/kernel vmcore.9

 (kgdb) bt
 #0  0xc067262e in doadump ()
 #1  0xc0672afe in boot ()
 #2  0xc0672d94 in panic ()
 #3  0xc0885a04 in trap_fatal ()
 #4  0xc088576b in trap_pfault ()
 #5  0xc08853a9 in trap ()
 #6  0xc0873a7a in calltrap ()
 #7  0xc04e5e2e in acd_geom_detach ()
 #8  0xc06388f9 in one_event ()
 #9  0xc06389d1 in g_run_events ()
 #10 0xc0639de5 in g_event_procbody ()
 #11 0xc065cd34 in fork_exit ()
 #12 0xc0873adc in fork_trampoline ()
 (kgdb) x 0xc067262e
 0xc04e5e2e acd_geom_detach+18:  0x03b0b0ff
 (kgdb) q

You could give the following patch a try: 
http://www.fabiankeil.de/sourcecode/freebsd/atapi-cd.c.patch

It prevents a panic if a disc drive gets lost
without FreeBSD noticing it (for example because
the firmware is buggy).

See:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/99017
for details.

Note that the patch simply prevents the panic,
your drive will probably still be lost until
you reboot.

Fabian



Re: Can't build threaded perl 5.8 on 6.2-RELEASE and 7-CURRENT

2007-02-08 Thread Fabian Keil

LI Xin [EMAIL PROTECTED] wrote:

 It seems that threaded perl is broken on 6.2-RELEASE and 7-CURRENT.  I
 have tried some option combinations with no luck, if WITH_THREADED=yes
 is specified then the build would fail with a coredump.

 Any hints?

I ran into the same miniperl core dumps a few days ago
while trying to switch back to non-threaded Perl (shortly
after updating the system to a recent RELENG_6).

The only way I found to fix it was to:

- deinstall all Perl ports,
- rebuild Perl,
- reinstall all Perl ports.

I assume miniperl somehow included incompatible local
Perl libraries, but I didn't really look into it.

Fabian



Re: Is there any good reason for getby_r()?

2006-09-26 Thread Fabian Keil

Mark Andrews [EMAIL PROTECTED] wrote:

   get*by*_r() are deprecated on most platforms and their use
   is highly non-portable, lots of different APIs.
 
   Why are we adding compatibility for deprecated functions?

I was wondering the same thing, especially because it causes a lot
of packages that were compiled on later FreeBSD 6.x versions not
to work on earlier FreeBSD 6.x versions.

Fabian
-- 
http://www.fabiankeil.de/



Re: 16M RAM enough for FreeBSD 6.1?

2006-08-28 Thread Fabian Keil

Torfinn Ingolfsen [EMAIL PROTECTED] wrote:

 On Sun, 27 Aug 2006 18:13:12 +0200
 Fabian Keil [EMAIL PROTECTED] wrote:
 
 For information: I'm still trying to find a sodimm card for this
 machine, as everything would be easier if it had more memory.
 We'll see how I manage that; here in Norway it is not so easy to find
 things like that, and transport costs from the US are prohibitive for
 a hobby budget.
 
  I moved the harddisk into a more powerful machine,
  installed FreeBSD there, built a lighter kernel and
  put the disk back.
 
 Are there any FAQs around for things I can safely remove from a 6.1
 kernel?

I don't think so, but usually the comments are enough to decide
if you need something or not. The man pages help with the rest.

  In your case it's probably easier to create a disk image
  in Qemu, copy it to a CD and then use something that
 
 Hmm, I'm not very familiar with Qemu. A quick web search didn't turn up
 any obvious pointers on how to create an ISO image from a qemu image, or
 how to make an ISO image from the (currently running) Qemu image.

You can burn the Qemu image like any other file; you can even burn it
directly without putting it into an ISO file first. You should stop Qemu
first though, otherwise you might end up with an inconsistent image.

If you only want to replace a partition, you can load the image
with mdconfig to extract the partition you need.

Fabian
-- 
http://www.fabiankeil.de/



Re: 16M RAM enough for FreeBSD 6.1?

2006-08-27 Thread Fabian Keil

Torfinn Ingolfsen [EMAIL PROTECTED] wrote:

 I have an old laptop, a Compaq Armada 1580DMT, with 16M RAM, 2GB hd,
 floppy and CD-rom. It doesn't have built in networking, neither wired
 nor wireless. It does have PC card slots. It has had FreeBSD 4.9-release
 installed a long time, and was recently upgraded to 4.11-release from
 CD, successfully.

 However, when I try the 6.1-release CD (CD1), it boots as far as
 loading the kernel, booting the kernel, and then reboots again??

 Are 16 Megs of RAM too little to install FreeBSD 6.0 or newer?

With the default configuration, yes.

I recently tried to install FreeBSD 6.1-PRERELEASE on a
Pentium 90 with 16 MB RAM, and hit the rebooting problem as well.

I moved the hard disk into a more powerful machine,
installed FreeBSD there, built a lighter kernel and
put the disk back.

NFS mounting needed a workaround:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/94830
but the rest worked out of the box.

In your case it's probably easier to create a disk image
in Qemu, copy it to a CD and then use something that
boots from a floppy, supports the CD-ROM drive and brings
dd with it, to install the image.

Depending on your partition layout you may even
be able to use your old FreeBSD installation to do that.
(I'm not sure if it's possible to use FreeBSD to overwrite
the partition it's running from).

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-27 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 Fabian Keil [EMAIL PROTECTED] wrote:
 
  Peter Thoenen [EMAIL PROTECTED] wrote:
  
   Do you have pf running? If so, can you turn it off for a bit and see
   if you still crash.  On my box I was getting all sorts of witness
   kdb backtraces on pf and since turning pf off (maybe a week ago),
   haven't crashed yet.  Going to let it keep running unmetered for
   another 2 weeks and see if I crash or not.

  So far I didn't see a single PF related complaint from witness,
  but I'll try disabling PF in a few days anyway.
 
 It took a little longer than I thought, but I finally
 disabled PF today and switched to natd.

Uptime was slightly above 25 hours. Compiling HEAD right now. 

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-26 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 Peter Thoenen [EMAIL PROTECTED] wrote:
 
  Do you have pf running? If so, can you turn it off for a bit and see if
  you still crash.  On my box I was getting all sorts of witness kdb
  backtraces on pf and since turning pf off (maybe a week ago),
  haven't crashed yet.  Going to let it keep running unmetered for
  another 2 weeks and see if I crash or not.

How is it going, Peter, still running?
 
 I'm running Tor jailed and use PF for NAT, port forwarding and
 filtering: http://tor.fabiankeil.de/pf-stats/
 
 So far I didn't see a single PF related complaint from witness,
 but I'll try disabling PF in a few days anyway.

It took a little longer than I thought, but I finally
disabled PF today and switched to natd.

 At the moment I'm still testing if enabling polling really
 increases the uptime.

I'm still not sure. Polling did make it possible to
use fxp0 without acpi, but the hangs still occur and the serial
console still becomes unresponsive.

On another wild guess I switched Tor's threading library
from libpthread to libthr. While it doesn't seem
to affect the uptime, it makes Tor's cpu usage visible
in top, so maybe it would be a good default for tor-devel?

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-15 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Wed, 28 Jun 2006, Fabian Keil wrote:

  I just got:
 
  Jun 28 23:01:19 tor kernel: lock order reversal:
  Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ 
  /usr/src/sys/kern/kern_event.c:1053
  Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ 
  /usr/src/sys/vm/

  Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185.
 
 Could you run vmstat -z, netstat -m, and vmstat -m please?

I enabled polling three days ago and saw this lor two times
since then. It may or may not be a coincidence.

I log:

top -S -d 2
pfctl -si
netstat -ss
sysctl -a
vmstat -z
netstat -m
vmstat -m 

every five minutes. The output before and after the lor
can be found at: http://www.fabiankeil.de/tmp/lor-185.txt
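
A periodic logger along these lines could be sketched as follows. This is only a sketch, not the script actually used: the log directory, the function names and the "run" argument are assumptions made for illustration; only the list of commands is taken from the message.

```sh
#!/bin/sh
# Sketch of a periodic state logger. LOGDIR, the function names and the
# "run" argument are assumptions; only the command list is from the message.

LOGDIR="${LOGDIR:-/var/log/lor-stats}"   # hypothetical location

# Print a header naming the command, then the command's output.
log_cmd() {
    printf '=== %s ===\n' "$*"
    "$@" 2>&1
    printf '\n'
}

# Write one snapshot of all commands into a timestamped file.
snapshot() {
    mkdir -p "$LOGDIR" || return 1
    out="$LOGDIR/$(date +%Y%m%d-%H%M%S).txt"
    {
        log_cmd top -S -d 2
        log_cmd pfctl -si
        log_cmd netstat -ss
        log_cmd sysctl -a
        log_cmd vmstat -z
        log_cmd netstat -m
        log_cmd vmstat -m
    } > "$out"
    printf '%s\n' "$out"
}

# Only take a snapshot when explicitly asked to, e.g. from cron.
if [ "${1:-}" = "run" ]; then
    snapshot
fi
```

Called with "run" from a cron entry that fires every five minutes, each invocation leaves one timestamped file, so the state just before and just after an event can be compared.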

The system is still up at the moment, so the lor might
have nothing to do with the crashes/hangs/whatever.

I have the feeling that polling does increase the uptime,
but I'm not sure yet.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-15 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 Robert Watson [EMAIL PROTECTED] wrote:
 
  On Wed, 28 Jun 2006, Fabian Keil wrote:
 
   I just got:
  
   Jun 28 23:01:19 tor kernel: lock order reversal:
   Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ 
   /usr/src/sys/kern/kern_event.c:1053
   Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ 
   /usr/src/sys/vm/
 
   Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185.
  
  Could you run vmstat -z, netstat -m, and vmstat -m please?
 
 I enabled polling three days ago and saw this lor two times
 since then. It may or may not be a coincidence.

 The system is still up at the moment, so the lor might
 have nothing to do with the crashes/hangs/whatever.

Actually I had to reset the box about two hours
ago; I just forgot and overlooked the few minutes
of downtime in the logs.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-15 Thread Fabian Keil

Peter Thoenen [EMAIL PROTECTED] wrote:

 Do you have pf running? If so, can you turn it off for a bit and see if
 you still crash.  On my box I was getting all sorts of witness kdb
 backtraces on pf and since turning pf off (maybe a week ago), haven't
 crashed yet.  Going to let it keep running unmetered for another 2
 weeks and see if I crash or not.

I'm running Tor jailed and use PF for NAT, port forwarding and filtering:
http://tor.fabiankeil.de/pf-stats/

So far I didn't see a single PF related complaint from witness,
but I'll try disabling PF in a few days anyway. At the moment
I'm still testing if enabling polling really increases the uptime.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-07 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 Fabian Keil [EMAIL PROTECTED] wrote:
 
  Robert Watson [EMAIL PROTECTED] wrote:
 
   It sounds like your serial console server may not know how to map
   SSH break signals into remote serial break signals.  Try
   ALT_BREAK_TO_DEBUGGER.  Here's the description from NOTES:
   
   # Solaris implements a new BREAK which is initiated by a character
   # sequence CR ~ ^b which is similar to a familiar pattern used on
   # Sun servers by the Remote Console.
   options ALT_BREAK_TO_DEBUGGER
  
  It took me several attempts to get the character sequence right,
  but yes, this one works. Thanks.
 
 Unfortunately it didn't work while the system was hanging
 this morning.

Since then I got one or two hangs a day and entering
the debugger never worked out, even if my console connection
was opened a few minutes before the hang.

I no longer think it has anything to do with the terminal
server, but assume the hang takes the console with it.

sio0 is running on acpi0, so I tried to disable acpi
to see if it changes anything, but the only change I
got was that fxp0 stopped working (it is up but only
produces timeout warnings).

I tried to partly disable acpi subsystems as
described in acpi(4), but either I got the
syntax wrong, or it just isn't working.

Can someone on this list confirm or deny if
something like debug.acpi.disabled=isa in
/boot/loader.conf makes sense?

That's how I understand the man page, but I don't see any
reaction. I also tried /etc/sysctl.conf (which probably
is parsed too late anyway) but I just got a message that the
sysctl does not exist.

sysctl debug.acpi indeed only shows:
debug.acpi.do_powerstate: 1
debug.acpi.acpi_ca_version: 0x20041119
debug.acpi.semaphore_debug: 0

so maybe I need some special acpi options
or it just doesn't work if acpi is loaded as a module,
but at least the man page has no such hints.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-03 Thread Fabian Keil

Dan Nelson [EMAIL PROTECTED] wrote:

 In the last episode (Jul 02), Robert Watson said:
  On Sun, 2 Jul 2006, Fabian Keil wrote:
  The ssh man page offers:
  
  |~B  Send a BREAK to the remote system (only useful for SSH
  |protocol version 2 and if the peer supports it).
  
  I am using ssh 2, but the only reaction I get is a new line.
  
  |FreeBSD/i386 (tor.fabiankeil.de) (ttyd0)
  |
  |login: ~B
 
 If you enter ~B and actually see a ~B printed to the screen, then ssh
 didn't process it because you didn't hit cr first.  So cr~B will
 tell ssh to send a break.

I am actually using cr~B and I don't see just ~B,
but ~B
. The tilde is printed after I release B, therefore I
guess it is working.
 
  It sounds like your serial console server may not know how to map
  SSH break signals into remote serial break signals.  Try
  ALT_BREAK_TO_DEBUGGER.  Here's the description from NOTES:
  
  # Solaris implements a new BREAK which is initiated by a character
  # sequence CR ~ ^b which is similar to a familiar pattern used on
  # Sun servers by the Remote Console.
  options ALT_BREAK_TO_DEBUGGER
 
 ... and if you're sshing to your terminal server, remember that ssh
 will eat that tilde (because you sent cr~ ), so you need to send
 cr~~^B to pass the right characters to FreeBSD.  Or change ssh's
 escape character with the -e flag.

cr~^b works for me, without touching any ssh settings.
As cr~. is still causing a disconnect, it doesn't look
like the escape character was changed either.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-03 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 Robert Watson [EMAIL PROTECTED] wrote:

  It sounds like your serial console server may not know how to map
  SSH break signals into remote serial break signals.  Try
  ALT_BREAK_TO_DEBUGGER.  Here's the description from NOTES:
  
  # Solaris implements a new BREAK which is initiated by a character
  # sequence CR ~ ^b which is similar to a familiar pattern used on
  # Sun servers by the Remote Console.
  options ALT_BREAK_TO_DEBUGGER
 
 It took me several attempts to get the character sequence right,
 but yes, this one works. Thanks.

Unfortunately it didn't work while the system was hanging
this morning. I wasn't logged in at the console before the
hang occurred, so it may be that the terminal server checked
the console for life signs, found none and did neither
connect nor print a warning (wild guess, I have no idea
if it does that).

It could also mean that I'm seeing the mysterious power off part
described in: http://www.freebsd.org/cgi/query-pr.cgi?pr=95180
but I have no way to tell the difference.

I will stay connected to the console until the system hangs
again to see if it changes anything.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-02 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Tue, 27 Jun 2006, Fabian Keil wrote:
 
  There was a request for Tor related problem reports a while ago,
  I couldn't find the message again, but I believe it was posted here.
 
 I'm very interested in tracking down this problem, but have had a lot
 of trouble getting reliable reports of problems -- i.e., ones where I
 could get any debugging information.  I had a similar conversation on
 these lines yesterday with Roger (Tor author) here at the WEIS
 conference.  If this is easily reproducible, I would like you to do
 the following:

 - Does the hang occur?  If so, use a serial break to get into DDB,
 see the above.

I previously had the serial console misconfigured and I'm still not
sure if the settings are correct now.

So far I put BOOT_COMCONSOLE_SPEED=57600 in /etc/make.conf,
options CONSPEED=57600 in the kernel and console=comconsole
in /boot/loader.conf. Kernel and bootblock were recompiled
and reinstalled. /boot.config contains the line:
-D -h -S57600 (speed setting through make.conf didn't work).
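
For reference, the settings listed above can be collected into one annotated fragment. The values are the ones given in the message; the flag descriptions follow boot(8) as far as I know. The kernel configuration line and the /boot.config line are not shell syntax, so they are shown as comments only.

```sh
# /etc/make.conf: serial speed compiled into the boot blocks
BOOT_COMCONSOLE_SPEED=57600

# Kernel configuration file (config(8) syntax, shown as a comment):
#   options CONSPEED=57600

# /boot/loader.conf: use the serial port as console
console=comconsole

# /boot.config (boot flags, shown as a comment):
#   -D -h -S57600
#   -D  dual console configuration (video and serial)
#   -h  use the serial console
#   -S  serial console speed
```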

The boot process now starts with:

PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
Booting from local disk...

1   Linux
2   FreeBSD
3   FreeBSD

Default: 2 

/boot.config: -DConsoles: internal video/keyboard  serial port  
BIOS drive C: is disk0
BIOS 639kB/523200kB available memory

FreeBSD/i386 bootstrap loader, Revision 1.1
[...]

After manually triggering a test panic through debug.kdb.enter
I could enter ddb and everything seemed to be working.

However today I got another hang and couldn't enter the debugger
by sending BREAK. It is the same BREAK ssh sends with ~B, right?

Even after rebooting, sending break didn't trigger a panic,
so either I'm sending the wrong BREAK, or my console settings
are still messed up. Any ideas?

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-02 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Sun, 2 Jul 2006, Fabian Keil wrote:

  After manually triggering a test panic through debug.kdb.enter I
  could enter ddb and everything seemed to be working.
 
  However today I got another hang and couldn't enter the debugger by
  sending BREAK. It is the same BREAK ssh sends with ~B, right?
 
  Even after rebooting, sending break didn't trigger a panic, so
  either I'm sending the wrong BREAK, or my console settings are
  still messed up. Any ideas?
 
 What serial software are you using to reach the console?

I use ssh to log in to a console server, hit enter and
am connected to the console. I have no idea what kind
of software is used between console server and console.

 Do you have options BREAK_TO_DEBUGGER compiled into your kernel?

Yes, together with the other options you suggested:

makeoptions DEBUG=-g
options DDB
#options KDB_UNATTENDED
options KDB
options BREAK_TO_DEBUGGER
options WITNESS
options WITNESS_SKIPSPIN
options INVARIANTS
options INVARIANT_SUPPORT

 The delivery mechanism for the break will depend on the software
 you're using...

The ssh man page offers:

|~B  Send a BREAK to the remote system (only useful for SSH protocol
|version 2 and if the peer supports it).

I am using ssh 2, but the only reaction I get is a new line.

|FreeBSD/i386 (tor.fabiankeil.de) (ttyd0)
|
|login: ~B
|

Maybe machdep.enable_panic_key would be another solution?
The description says "Enable panic via keypress
specified in kbdmap(5)"; I'm just not sure if console
input qualifies as a keypress.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-07-02 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Sun, 2 Jul 2006, Fabian Keil wrote:

  I am using ssh 2, but the only reaction I get is a new line.
 
  |FreeBSD/i386 (tor.fabiankeil.de) (ttyd0)
  |
  |login: ~B
  |
 
 It sounds like your serial console server may not know how to map SSH
 break signals into remote serial break signals.  Try
 ALT_BREAK_TO_DEBUGGER.  Here's the description from NOTES:
 
 # Solaris implements a new BREAK which is initiated by a character
 # sequence CR ~ ^b which is similar to a familiar pattern used on
 # Sun servers by the Remote Console.
 options ALT_BREAK_TO_DEBUGGER

It took me several attempts to get the character sequence right,
but yes, this one works. Thanks.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-06-29 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Thu, 29 Jun 2006, Fabian Keil wrote:
 
  I wish I could. The machine died before I read your message.
 
  I was logged in on the serial console running tail
  -f /var/log/messages. Last messages were:
 
  Jun 29 00:42:20 tor kernel: Memory modified after free
  0xc4275000(2048) val=a020c0de @ 0xc4275000 Jun 29 00:42:20 tor
  kernel: Memory modified after free 0xc4055800(2048) val=a020c0de @

  0xc432a000 Jun 29 00:42:24 tor kernel: ad0: TIMEOUT - WRITE_DMA
  retrying (1 retry left) LBA=34263674 Jun 29 00:42:24 tor kernel:
  Memory modified after free 0xc3dff800(2048) val=a020c0d
 
  Ctrl+Alt+ESC didn't trigger any reaction, so I caused a reset
  through the ISP's webinterface. Now the system appears to be hosed,
  at least FreeBSD never reaches the login:
 
  PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
  Booting from local disk...
 
  1   Linux
  2   FreeBSD
  3   FreeBSD
 
  Default: 2
 
  [nothing]

 The ATA error above is a bit distressing, as is the fact that it
 won't boot. Is [nothing] normally the FreeBSD boot loader rather
 than nothing?

The 1 Linux ... part already is the FreeBSD boot loader.
Normally it goes:

PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
Booting from local disk...

1   Linux
2   FreeBSD
3   FreeBSD

Default: 2 

FreeBSD/i386 (tor.fabiankeil.de) (ttyd0)

login:

 I would suggest running some hardware diagnostics to
 make sure we're dealing with reliable hardware before continuing so
 that we're not chasing both hardware and software problems, since you
 can't reliably debug software problems in the presence of hardware
 failures.

I'll see what the ports collection has to offer (running
smartmontools right now) but so far it's the only ATA message I got.

  Probably something which would be easy to resolve with keyboard
  access and a screen, but I think I'm forced to use the
  RecoveryManager. Unfortunately recovery means reinstalling the
  preconfigured GNU/Linux which I then can replace with FreeBSD
  again. If there ever was a core dump it will be gone, and so will
  be kernel.debug.

Lucky me. The RecoveryManager turned out to be a full-featured
PXE-booted GNU/Linux system. It allowed me to fetch and replace
/dev/ad0s2a (/) through ssh. The system is online again. 

After fsck -y /dev/ad0s3d (/usr) the whole tor jail is gone,
but the rest of this slice seems to be ok, including kernel.debug.

I can't fsck /var:
[EMAIL PROTECTED] ~]$ sudo fsck /dev/ad0s3d
** /dev/ad0s3d
** Last Mounted on /var
** Phase 1 - Check Blocks and Sizes
fsck_4.2bsd: cannot alloc 1082190976 bytes for inoinfo

but it can still be mounted. No core dump though.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-06-28 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Tue, 27 Jun 2006, Fabian Keil wrote:
 
  There was a request for Tor related problem reports a while ago,
  I couldn't find the message again, but I believe it was posted here.
 
 I'm very interested in tracking down this problem, but have had a lot
 of trouble getting reliable reports of problems -- i.e., ones where I
 could get any debugging information.  I had a similar conversation on
 these lines yesterday with Roger (Tor author) here at the WEIS
 conference.  If this is easily reproducible, I would like you to do
 the following:
 
 - Compile in options DDB, options KDB, options BREAK_TO_DEBUGGER,
 options WITNESS, options WITNESS_SKIPSPIN, options INVARIANTS, options
INVARIANT_SUPPORT.
 
 - Make sure to have a kernel with debugging symbols for the kernel.
 
 - Turn on core dumps.

Done. I expect to get a chance to test the settings in the next 24 hours.
 
 The above debugging options will have a significant performance
 impact, and may or may not affect the probability of the race or
 deadlock being exercised. The first question is:
 
 - Are there any warnings on the console from WITNESS or other
 debugging options?  If so, please copy/paste them into an e-mail for
 me.

So far the logs show nothing unusual, but I
noticed that the ssh connection gets unresponsive
from time to time.

I did a few pings with interesting results:

[EMAIL PROTECTED] ~]$ ping 10.0.0.1 | grep 'time=[^0]'
64 bytes from 10.0.0.1: icmp_seq=25 ttl=64 time=1.104 ms
64 bytes from 10.0.0.1: icmp_seq=61 ttl=64 time=2.983 ms
64 bytes from 10.0.0.1: icmp_seq=167 ttl=64 time=1.112 ms
64 bytes from 10.0.0.1: icmp_seq=189 ttl=64 time=1.653 ms
64 bytes from 10.0.0.1: icmp_seq=222 ttl=64 time=1.748 ms
64 bytes from 10.0.0.1: icmp_seq=291 ttl=64 time=1.058 ms
64 bytes from 10.0.0.1: icmp_seq=334 ttl=64 time=1.020 ms
64 bytes from 10.0.0.1: icmp_seq=337 ttl=64 time=1.967 ms
64 bytes from 10.0.0.1: icmp_seq=562 ttl=64 time=1.027 ms
64 bytes from 10.0.0.1: icmp_seq=586 ttl=64 time=1.230 ms
[EMAIL PROTECTED] ~]$ ping tor.fabiankeil.de | grep 'time=[^0]'
64 bytes from 81.169.155.246: icmp_seq=70 ttl=64 time=1.920 ms
64 bytes from 81.169.155.246: icmp_seq=79 ttl=64 time=1.587 ms
64 bytes from 81.169.155.246: icmp_seq=402 ttl=64 time=1.062 ms
[EMAIL PROTECTED] ~]$ ping localhost | grep 'time=[^0]'
64 bytes from 127.0.0.1: icmp_seq=142 ttl=64 time=1.142 ms
64 bytes from 127.0.0.1: icmp_seq=497 ttl=64 time=1.227 ms
64 bytes from 127.0.0.1: icmp_seq=627 ttl=64 time=1.181 ms

10.0.0.1 is on lo1 and 81.169.155.246 is on fxp0; both
are filtered with pf. lo0 is skipped. The pings were run
locally while tor was running; the usual ping response times
are below 0.2 ms.
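
The grep pattern 'time=[^0]' keeps every reply whose round-trip time does not start with the digit 0, which for this output format means 1 ms or more. The same filter with an adjustable threshold could be sketched like this, assuming the ping output format shown above; slow_pings is a hypothetical helper name, not an existing tool.

```sh
# Print only ping replies whose round-trip time is at or above a
# threshold in milliseconds (default 1.0). slow_pings is a hypothetical
# helper; it assumes a "time=<number> ms" field as in the output above.
slow_pings() {
    awk -v limit="${1:-1.0}" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^time=/) {
                    t = substr($i, 6) + 0   # numeric part after "time="
                    if (t >= limit) print
                }
            }
        }'
}
```

Usage would be something like: ping 10.0.0.1 | slow_pings 1.0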

I get even more obscene ping times if I ping
from home, but my net connection isn't the best.
I'd appreciate if someone with a reliable net
connection could confirm the weirdness.

Thanks for your time, Robert, I hope to have real
information by tomorrow.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-06-28 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 - Are there any warnings on the console from WITNESS or other
 debugging options?

I just got:

Jun 28 23:01:19 tor kernel: lock order reversal:
Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue) @ 
/usr/src/sys/kern/kern_event.c:1053
Jun 28 23:01:19 tor kernel: 2nd 0xc1043144 system map (system map) @ 
/usr/src/sys/vm/vm_map.c:2390
Jun 28 23:01:20 tor kernel: KDB: stack backtrace:
Jun 28 23:01:20 tor kernel: 
kdb_backtrace(0,,c0711af0,c0713440,c06db624) at kdb_backtrace+0x29
Jun 28 23:01:20 tor kernel: witness_checkorder(c1043144,9,c06b90a8,956) at 
witness_checkorder+0x578
Jun 28 23:01:20 tor kernel: _mtx_lock_flags(c1043144,0,c06b90a8,956) at 
_mtx_lock_flags+0x5b
Jun 28 23:01:20 tor kernel: _vm_map_lock(c10430c0,c06b90a8,956) at 
_vm_map_lock+0x26
Jun 28 23:01:20 tor kernel: 
vm_map_remove(c10430c0,c3bc6000,c3bc8000,d6f55b30,c0623361) at 
vm_map_remove+0x1f
Jun 28 23:01:20 tor kernel: kmem_free(c10430c0,c3bc6000,2000,d6f55b48,c062524f) 
at kmem_free+0x25
Jun 28 23:01:20 tor kernel: page_free(c3bc6000,2000,22,2000,d6f55b60) at 
page_free+0x29
Jun 28 23:01:20 tor kernel: uma_large_free(c3ba5140) at uma_large_free+0x7b
Jun 28 23:01:20 tor kernel: free(c3bc6000,c06d8980,c3bc6000,c483,1400) at 
free+0xc5
Jun 28 23:01:20 tor kernel: kqueue_expand(c3795000,c06d8a40,500,0) at 
kqueue_expand+0xd7
Jun 28 23:01:20 tor kernel: kqueue_register(c3795000,d6f55bf4,c3a8f480,1,0) at 
kqueue_register+0x1b8
Jun 28 23:01:20 tor kernel: kern_kevent(c3a8f480,3,19,200,d6f55cc8) at 
kern_kevent+0xc9
Jun 28 23:01:20 tor kernel: kevent(c3a8f480,d6f55d04,6,2,212) at kevent+0x55
Jun 28 23:01:20 tor kernel: syscall(2824003b,80e003b,bfbf003b,cb87000,80d5020) 
at syscall+0x22f
Jun 28 23:01:20 tor kernel: Xint0x80_syscall() at Xint0x80_syscall+0x1f
Jun 28 23:01:20 tor kernel: --- syscall (363, FreeBSD ELF32, kevent), eip = 
0x282cc4af, esp = 0xbfbfe9fc, ebp = 0xbfbfea48 ---

Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185.

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-06-28 Thread Fabian Keil

Robert Watson [EMAIL PROTECTED] wrote:

 On Wed, 28 Jun 2006, Fabian Keil wrote:
 
  Robert Watson [EMAIL PROTECTED] wrote:
 
  - Are there any warnings on the console from WITNESS or other
  debugging options?
 
  I just got:
 
  Jun 28 23:01:19 tor kernel: lock order reversal:
  Jun 28 23:01:19 tor kernel: 1st 0xc3795000 kqueue (kqueue)

  Looks similar to http://sources.zabbadoz.net/freebsd/lor.html#185.
 
 Could you run vmstat -z, netstat -m, and vmstat -m please?

I wish I could. The machine died before I read your message.

I was logged in on the serial console running tail -f /var/log/messages.
Last messages were:

Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4275000(2048) 
val=a020c0de @ 0xc4275000
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4055800(2048) 
val=a020c0de @ 0xc4055800
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc4ca(2048) 
val=a020c0de @ 0xc4ca
Jun 29 00:42:20 tor kernel: Memory modified after free 0xc39ef000(2048) 
val=a020c0de @ 0xc39ef000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc4bd7000(2048) 
val=a020c0de @ 0xc4bd7000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3c8a000(2048) 
val=a020c0de @ 0xc3c8a000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc33bd000(2048) 
val=a020c0de @ 0xc33bd000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3f1d000(2048) 
val=a020c0de @ 0xc3f1d000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc45dc800(2048) 
val=a020c0de @ 0xc45dc800
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc429e000(2048) 
val=a020c0de @ 0xc429e000
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3aef800(2048) 
val=a020c0de @ 0xc3aef800
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc432a000(2048) 
val=a020c0de @ 0xc432a000
Jun 29 00:42:24 tor kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) 
LBA=34263674
Jun 29 00:42:24 tor kernel: Memory modified after free 0xc3dff800(2048) 
val=a020c0d

Ctrl+Alt+ESC didn't trigger any reaction, so I caused a reset through
the ISP's web interface. Now the system appears to be hosed; at least
FreeBSD never reaches the login:
   
PXELINUX 3.11 2005-09-02  Copyright (C) 1994-2005 H. Peter Anvin
Booting from local disk...

1   Linux
2   FreeBSD
3   FreeBSD

Default: 2 

[nothing]

Probably something which would be easy to resolve with
keyboard access and a screen, but I think I'm forced to use
the RecoveryManager. Unfortunately recovery means reinstalling
the preconfigured GNU/Linux which I then can replace with FreeBSD
again. If there ever was a core dump it will be gone, and so will
kernel.debug.

On the bright side you can choose the OS to go with.
Should I use Current to see if the problem still exists?

Fabian
-- 
http://www.fabiankeil.de/



FreeBSD 6.1 Tor issues (Once More, with Feeling)

2006-06-27 Thread Fabian Keil

There was a request for Tor-related problem reports
a while ago. I couldn't find the message again, but I
believe it was posted here.

Last week I installed:
FreeBSD tor.fabiankeil.de 6.1-RELEASE-p2 FreeBSD
6.1-RELEASE-p2 #0: Fri Jun 23 20:06:57 CEST 2006
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/BIGSLEEP  i386.

At the moment it is only acting as a Tor node:
http://serifos.eecs.harvard.edu/cgi-bin/desc.pl?q=zwiebelsuppe
tor-devel (maintainer CC'd) is running jailed in a geli image;
ntpd, named, cron and sshd are running in the host system
and that's about it. There is no mail or web server and nearly no
traffic besides that caused by Tor.

I started Tor Friday night and had to reset the box three times
since then. The server just suddenly stops responding, the logs
stop as well, therefore I assume it either panics or hangs.

I only have remote access, a serial console is available,
but it becomes unresponsive as well. I didn't configure DDB yet,
so maybe that is to be expected?

cron creates some stats every five minutes. A few minutes
before a hang this morning the load was:

last pid:  7996;  load averages:  0.40,  0.37,  0.36    up 0+18:38:25  05:55:02
83 processes:  2 running, 66 sleeping, 15 waiting
CPU states: 21.3% user,  0.0% nice, 17.8% system, 20.2% interrupt, 40.7% idle
Mem: 100M Active, 157M Inact, 102M Wired, 12K Cache, 60M Buf, 134M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
   11 root        1 171   52     0K     8K RUN    857:30 53.61% idle
   12 root        1 -44 -163     0K     8K WAIT    45:22  6.54% swi1: net
   23 root        1 -68 -187     0K     8K WAIT    14:48  2.83% irq12: fxp0 fxp1
 7973 root        1  96    0  2264K  1544K RUN      0:00  0.51% top
   13 root        1 -32 -151     0K     8K WAIT     5:49  0.10% swi4: clock sio
   33 root        1 171   52     0K     8K pgzero   0:02  0.10% pagezero
    3 root        1  -8    0     0K     8K -        0:16  0.05% g_up
 1586 _tor       14  20    0    99M 97912K kserel 188:36  0.00% tor
   15 root        1 -16    0     0K     8K -        1:01  0.00% yarrow
 1443 root        1  -8    0     0K     8K geli:w   0:49  0.00% g_eli[0] md0
    4 root        1  -8    0     0K     8K -        0:21  0.00% g_down
   35 root        1  20    0     0K     8K syncer   0:17  0.00% syncer
 1439 root        1  -8    0     0K     8K mdwait   0:13  0.00% md0
   24 root        1 -64 -183     0K     8K WAIT     0:08  0.00% irq14: ata0
    2 root        1  -8    0     0K     8K -        0:07  0.00% g_event
   42 root        1 -16    0     0K     8K -        0:06  0.00% schedcpu
  453 root        1  96    0  2920K  1752K select   0:05  0.00% ntpd
  256 _pflogd     1 -58    0  1548K  1216K bpf      0:05  0.00% pflog

pfctl -si:
Status: Enabled for 0 days 18:37:52           Debug: Urgent

Hostid: 0x1ec3da6b

Interface Stats for fxp0              IPv4             IPv6
  Bytes In                     25077859159                0
  Bytes Out                    27498863362                0
  Packets In
    Passed                        36192760                0
    Blocked                          32213                0
  Packets Out
    Passed                        36871432                0
    Blocked                            265                0

State Table                          Total             Rate
  current entries                     5290
  searches                        73567507         1096.8/s
  inserts                           600068            8.9/s
  removals                          594778            8.9/s
Counters
  match                             752600           11.2/s
  bad-offset                             0            0.0/s
  fragment                             102            0.0/s
  short                                  0            0.0/s
  normalize                              2            0.0/s
  memory                                68            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                              0            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     12655            0.2/s
  state-insert                           0            0.0/s
  state-limit                            0            0.0/s
  src-limit                              2            0.0/s
  synproxy

Today's traffic graph:
http://www.fabiankeil.de/blog-surrogat/2006/06/27/tor.fabiankeil.de-dritter-ausfall-24-stunden-durchsatz-statistik-595x337.png
(The hang around 14:00 happened while I was logged in doing a buildworld)

At the moment I'm building RELENG_6 with DDB to see if it changes anything
and if I can get a core dump, but so far the problem seems to be
similar to: http://www.freebsd.org/cgi/query-pr.cgi?pr=95180 (closed)
and http://freebsd.rambler.ru/bsdmail/freebsd-questions_2006/msg08692.html.

Is anyone on this

Re: GELI issues ? (Re: Increase in panics under 6.1)

2006-05-27 Thread Fabian Keil

Stanislaw Halik [EMAIL PROTECTED] wrote:

 On Thu, May 25, 2006, Fabian Keil wrote:
  Interestingly enough , i had some nasty issues todays on same
  laptop. I had  2 x 6 GB GELI vnodes, running mtree -K md5digest to
  compare contents. Disk IO was high as expected...but then it just
  died down (but the mtree hadnt finished). 
  (swap is also GELI)
 
  Any subsequent process trying to access the encrypted mount points
  simply stalled for as long as I cared to wait (10 minutes). The
  processes even stalled a shutdown -r. 
  I'm not sure if it's related, but I lately see this behaviour on
  NFS mounts if the server is not responding.
 
  Doing cd /mnt/mydeadnfsmount/[tab for autocompletion]
  is enough to render the current console unresponsive.
 
 Isn't that normal and desired for `hard' mounts?

Now that you mention it, I guess you're right.
I totally forgot that hard mounts are the default.

Fabian
-- 
http://www.fabiankeil.de/



Re: kmem leak in tmpmfs?

2006-05-26 Thread Fabian Keil

Iasen Kostov [EMAIL PROTECTED] wrote:

 On Thu, 2006-05-25 at 16:54 -0400, Kris Kennaway wrote:
  On Thu, May 25, 2006 at 06:01:30PM +0200, Arno J. Klaassen wrote:

   I get a very easy to reproduce panic on 6.1-STABLE :
   
   /etc/periodic/weekly/310.locate panics with
   
 panic: kmem_malloc(4096): kmem_map too small: 335544320 total
   allocated
  
  It looks like you are using a malloc-backed md and you do not have
  enough RAM to handle the size.  Perhaps tmpmfs does not use swap
  backing, as it is supposed to?

   First of all if there is not enough kmem (not just plain ram
 I think) kernel should not allow disk creation in first place, second
 - I think (although there could be some ... reason for that) it's
 stupid way to say I don't have more kmem by panicing :). Better way
 will be just to fail disk operation of that FS with Disk is full or
 something like that. At home I tried to raise kmem like that:
 vm.kmem_size_max=1073741824 (I got 2G of RAM)
 (setting vm.kmem_size directly panices kernel at boot if I remember
 correctly).
 
 but for my surprise kernel panices at exact same allocated md disk
 space with the same panic as the original poster's. Is it possible
 that I should rise KVA_PAGES too ? And I don't think its documented
 anywhere (of course I've tried googling and it's always possible that
 I've missed something :). All this was on FreeBSD 6.0.

man mdconfig mentions the problem:

 malloc   Storage for this type of memory disk is allocated with
  malloc(9).  This limits the size to the malloc bucket
  limit in the kernel.  If the -o reserve option is not
  set, creating and filling a large malloc-backed memory
  disk is a very easy way to panic a system.

Use a swap-backed disk and the problem will disappear.
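
As a rough sketch of what that could look like: mdmfs creates a
swap-backed disk unless -M is given, and mdconfig does the same with
-t swap. The size and mount point below are illustrative assumptions,
not values from the original report, and the helpers only print the
commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch only: print the commands for a swap-backed memory disk.
# The size and mount point arguments are illustrative assumptions.

# mdmfs uses swap backing by default; adding -M would switch to malloc(9).
swap_md_mdmfs_cmd() {
    size="$1"
    mountpoint="$2"
    echo "mdmfs -s $size md $mountpoint"
}

# Lower-level alternative: attach a swap-backed md device with mdconfig.
swap_md_mdconfig_cmd() {
    size="$1"
    echo "mdconfig -a -t swap -s $size"
}

swap_md_mdmfs_cmd 512m /tmp
swap_md_mdconfig_cmd 512m
```

With mdconfig the resulting device still has to be given a file system
and mounted by hand; mdmfs does all of that in one step.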

Fabian
-- 
http://www.fabiankeil.de/



Re: GELI issues ? (Re: Increase in panics under 6.1)

2006-05-25 Thread Fabian Keil

Norberto Meijome [EMAIL PROTECTED] wrote:

 On Tue, 23 May 2006 22:01:16 -0400
 Kris Kennaway [EMAIL PROTECTED] wrote:
 
  So what is the traceback?
  
  See the developers handbook for more information.
 
 doh! yes, i'll get  onto this as soon as I can.
 
 Interestingly enough , i had some nasty issues todays on same laptop.
 I had  2 x 6 GB GELI vnodes, running mtree -K md5digest to compare
 contents. Disk IO was high as expected...but then it just died down
 (but the mtree hadnt finished). 
 (swap is also GELI)
 
 Any subsequent process trying to access the encrypted mount points
 simply stalled for as long as I cared to wait (10 minutes). The
 processes even stalled a shutdown -r. 

I'm not sure if it's related, but I lately see this behaviour on
NFS mounts if the server is not responding.

Doing cd /mnt/mydeadnfsmount/[tab for autocompletion]
is enough to render the current console unresponsive.

Fabian
-- 
http://www.fabiankeil.de/



Re: kmem leak in tmpmfs?

2006-05-25 Thread Fabian Keil

Arno J. Klaassen [EMAIL PROTECTED] wrote:

 Hello,
 
 I get a very easy to reproduce panic on 6.1-STABLE :
 
 /etc/periodic/weekly/310.locate panics with
 
   panic: kmem_malloc(4096): kmem_map too small: 335544320 total
 allocated

 This box has nothing particular, apart from maybe a large number
 of stamp-file based test-databases (with a lot of zero-sized
 files named .key=value).
 Producing this bug is easy :
 
  - set tmpmfs=YES and set tmpsize greater than around 220m
  - start /etc/periodic/weekly/310.locate (and nothing else!)
  - wait two-three hours and bang
 
 Last test is with tmpfs=1024m and I monitored df -h /tmp and
 vmstat -zm every minute; when the system panics, last output is :
 
   FilesystemSizeUsed   Avail Capacity  Mounted on
   /dev/md0  989M219M691M24%/var/tmp
 
   vmstat -zm | fgrep md0
   md0: 512,0,  453257, 15,   453437
 
 I'm quite not an expert, but looks to me as if md0 use stays
 almost 100% in kmem and is never swapped (as it is supposed to do
 by default according to the man-page).

The rc script has different defaults than mdmfs:

[EMAIL PROTECTED] ~ $grep tmpmfs_flags /etc/defaults/rc.conf 
tmpmfs_flags="-S -M"	# Extra mdmfs options for the mfs /tmp

You probably want to ditch the -M.
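
A minimal sketch of the override in /etc/rc.conf, assuming the rest of
the tmpmfs setup stays as it is. The size is only an example value;
rc.conf is read by sh, so these are plain variable assignments.

```shell
# /etc/rc.conf fragment (sketch; the size is an illustrative assumption)
tmpmfs="YES"        # mount a memory file system on /tmp at boot
tmpsize="512m"      # size handed to mdmfs
tmpmfs_flags="-S"   # the default flags minus -M, so the disk is swap-backed
```

Without -M, mdmfs should fall back to its own default of a swap-backed
disk, which is the behaviour the man page describes.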

Fabian
-- 
http://www.fabiankeil.de/



Loading geom_eli in loader.conf disables psm0

2006-05-23 Thread Fabian Keil

To encrypt my home slice with geli I followed
17.16.2 Disk Encryption with geli:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-encrypting.html#AEN26326

As I prefer to have my home directory available after boot,
I additionally added:

geli_devices="ad0s1"
geli_ad0s1_flags="-k /root/ad0s1.key"

to rc.conf and rebooted.

geli worked, my mouse no longer did. psm0 got lost:

--- dmesg-geli-enabled-in-loader.conf.txt Mon May 22 18:17:23 2006
+++ dmesg-without-geli-enabled-in-loader.conf.txt Mon May 22 18:21:33 2006
[...]
@@ -76,7 +76,9 @@
 atkbd0: AT Keyboard irq 1 on atkbdc0
 kbd0 at atkbd0
 atkbd0: [GIANT-LOCKED]
-acpi_ibm0: IBM ThinkPad ACPI Extras irq 12 on acpi0
+psm0: PS/2 Mouse irq 12 on atkbdc0
+psm0: [GIANT-LOCKED]
+psm0: model Generic PS/2 mouse, device ID 0
 sio0: 16550A-compatible COM port port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
 sio0: type 16550A
 ppc0: Standard parallel printer port port 0x3bc-0x3be irq 7 on acpi0
@@ -88,12 +90,13 @@
 sio1: type 16550A
 battery0: ACPI Control Method Battery on acpi0
 acpi_acad0: AC Adapter on acpi0
+acpi_ibm0: IBM ThinkPad ACPI Extras on acpi0
 pmtimer0 on isa0
[...]

After I removed 'geom_eli_load=YES' from loader.conf and
rebooted, psm0 was back and my mouse started to work again.

I saw no geli regression either; I assume geom_eli.ko
is loaded on demand by geli's rc script.

[EMAIL PROTECTED] ~ $kldstat
Id Refs Address    Size     Name
 1   25 0xc0400000 41309c   kernel
 2    1 0xc0814000 b880     unionfs.ko
 3    1 0xc0820000 5760     if_tap.ko
 4    1 0xc0826000 565c     snd_ich.ko
 5    2 0xc082c000 258d4    sound.ko
 6    1 0xc0852000 43f4     acpi_video.ko
 7    3 0xc0857000 62fdc    acpi.ko
 8    1 0xc08ba000 21dac    radeon.ko
 9    2 0xc08dc000 10d80    drm.ko
10    1 0xc08ed000 4c88     acpi_ibm.ko
11    3 0xc08f2000 215cc    wlan.ko
12    1 0xc0914000 2ea0     wlan_wep.ko
13    1 0xc0917000 eec8     if_iwiNG.ko
14    3 0xc0926000 2e60     firmware.ko
15    1 0xc0929000 300fc    iwi_bss.ko
16    1 0xc095a000 9500     cpufreq.ko
17    1 0xc35a2000 b000     geom_eli.ko
18    1 0xc35c1000 19000    crypto.ko
19    1 0xc35ad000 a000     zlib.ko

My /boot/loader.conf:
loader_logo=beastie
loader_color=YES
autoboot_delay=1
hw.ata.atapi_dma=1
radeon_load=YES
acpi_video_load=YES
acpi_ibm_load=YES
wlan_load=YES
wlan_wep_load=YES
if_iwiNG_load=YES
iwi_bss_load=YES
cpufreq_load=YES
snd_ich_load=YES
if_tap_load=YES
unionfs_load=YES
#geom_eli_load=YES
hw.psm.synaptics_support=1

[EMAIL PROTECTED] ~ $uname -a
FreeBSD TP51.local 6.1-STABLE FreeBSD 6.1-STABLE #30: Mon May 22
15:52:13 CEST 2006 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THINKPAD  i386

Fabian
-- 
http://www.fabiankeil.de/



Configuring FreeBSD 4.9 on a new system (was: FreeBSD Newbie...)

2006-05-03 Thread Fabian Keil

John Dworske [EMAIL PROTECTED] wrote:

Help me...yeah...OK...so here it goes...I am brand new to
FreeBSD...installed OS onto a box from
a set of floppies I got off the net...
 
Last login: Wed May  3 14:37:21 2006 from 10.10.20.20
Copyright (c) 1980, 1983, 1986, 1988, 1990, 1991, 1993, 1994
The Regents of the University of California.  All rights
reserved.
 
FreeBSD 4.9-RELEASE (GENERIC) #0: Mon Oct 27 17:51:09 GMT 2003
 
Wondering what I need to update my system to make sure it has
everything I need to do work...like want to setup a slave DNS server
and apache webserver for starters...

How about replacing it with FreeBSD 6.1 first?

Bind is part of the base system, and Apache part of the ports
collection. While this is true for FreeBSD 4.9 as well, you
shouldn't use such an old release unless you have a reason.

Fabian
-- 
http://www.fabiankeil.de/



Re: devfs.conf and pass0

2006-04-13 Thread Fabian Keil

JoaoBR [EMAIL PROTECTED] wrote:

 seems on recent releng_6 (RC1) the permissions set to pass0 
 within /etc/devfs.conf are not applied anymore and need to be set
 manual in order getting acd0 available to users 

Works for me on FreeBSD 6.1-RC #1: Sun Apr  9 20:07:42 CEST 2006.

Did you by any chance just forget to add a newline after
your pass0 line?
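
A quick way to check for that, as a sketch: command substitution
strips a trailing newline, so an empty result from "tail -c 1" on a
non-empty file means the last byte is a newline. The helper name and
the sample own/perm entries are made up for illustration; on the real
system the file to test would be /etc/devfs.conf.

```shell
#!/bin/sh
# Sketch: report whether a file ends with a newline character.
# Command substitution strips a trailing newline, so an empty result from
# "tail -c 1" on a non-empty file means the last byte was a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}

# Demonstration on a scratch file; the entries are illustrative assumptions.
demo=$(mktemp)
printf 'own\tpass0\troot:operator\nperm\tpass0\t0660' > "$demo"   # no final newline
if ends_with_newline "$demo"; then echo "ok"; else echo "missing final newline"; fi

printf '\n' >> "$demo"                                            # add it
if ends_with_newline "$demo"; then echo "ok"; else echo "missing final newline"; fi
rm -f "$demo"
```

The first check should report the missing newline and the second one
should pass.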

Fabian
-- 
http://www.fabiankeil.de/



Re: devfs.conf and pass0

2006-04-13 Thread Fabian Keil

JoaoBR [EMAIL PROTECTED] wrote:

 On Thursday 13 April 2006 09:28, Fabian Keil wrote:
  JoaoBR [EMAIL PROTECTED] wrote:
   seems on recent releng_6 (RC1) the permissions set to pass0
   within /etc/devfs.conf are not applied anymore and need to be set
   manual in order getting acd0 available to users
 
  Works for me on FreeBSD 6.1-RC #1: Sun Apr  9 20:07:42 CEST 2006.
 
  Did you by any chance just forget to add a newline after
  your pass0 line?

 nooo there are others below and the last is an empty line
 
 the permissions are set as before to acd0 and cd0 but not to pass0
 
 I cvsuped yesterday
 6.1-RC FreeBSD 6.1-RC #3: Wed Apr 12 18:15:55 BRT 2006
 
 seems there was a change in devfs.h yesterday
 or any other idea?

I cvsuped a few minutes ago and didn't see any devfs changes.

I'm now running FreeBSD 6.1-RC #0: Thu Apr 13 17:01:11 CEST 2006
and all rules in /etc/devfs.conf still apply. 

Fabian
-- 
http://www.fabiankeil.de/



Re: truss problems

2006-04-10 Thread Fabian Keil

Jonas Wolz [EMAIL PROTECTED] wrote:

 while trying to get the gnash CVS version to work I noticed that on
 my system (FreeBSD 6.0-RELEASE) truss obviously has problems tracing
 firefox: truss prints somewhat random error messages and traces
 only some of the system calls firefox makes (opening a local file
 doesn't show up, for example).
 
 The output looks like that (I can provide the truss log if somebody
 is interested):
 [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: PIOCWAIT top of loop: Input/output error
 truss: get_struct 0x0: Bad address
 [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: Cannot malloc 1081891232 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: cannot open /proc/0/mem: No such file or directory
 truss: cannot open /proc/0/mem: No such file or directory
 truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: PIOCWAIT top of loop: Input/output error
 truss: PIOCCONT: Input/output error
 truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: PIOCWAIT top of loop: Input/output error
 truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: cannot open /proc/0/mem: No such file or directory
 truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: PIOCWAIT top of loop: Input/output error
 truss: PIOCCONT: Input/output error
 truss: cannot open /proc/0/mem: No such file or directory
 truss: Cannot malloc 1162889024 bytes for pollfd array: Cannot
 allocate memory [EMAIL PROTECTED]:/tmp$ truss -f -o ff.log firefox
 truss: PIOCWAIT top of loop: Input/output error
 truss: get_struct 0x0: Bad address
 [EMAIL PROTECTED]:/tmp$ 

 Can someone else also reproduce this problem/is this a known bug or
 is just something broken on my system?
 If you need more details please let me know.

I can't reproduce exactly the same problem on
FreeBSD TP51.local 6.1-RC FreeBSD 6.1-RC #1: Sun Apr  9 20:07:42 CEST 2006
but I get a different problem with truss and Firefox.

If I run truss -f firefox it seems to get stuck after a while.

 1274: mmap(0x0,36864,(0x3)PROT_READ|PROT_WRITE,(0x1002)MAP_ANON|MAP_PRIVATE,-1,0x0) = 689876992 (0
 1274: kse_release(0x8064fa0)= 0 (0x0)
 1274: kse_release(0x8064fa0)= 0 (0x0)
 1274: kse_release(0x8064fac)= 0 (0x0)
 1274: kse_release(0x8064fa0)= 383 (0x17f)
 1274: kse_release(0x8064fa0)= 383 (0x17f)
 1274: kse_release(0x8064fa0)= 0 (0x0)
 1274: kse_release(0x8064fa0)= 383 (0x17f)
 1274: kse_release(0x8064fa0)= 383 (0x17f)
 1274: kse_release(0x8064fa0)= 0 (0x0)
^C 1259: wait4(0x,0xbfbfe9d8,0x2,0x0)ERR#4 'Interrupted system call'
 1266: wait4(0x,0xbfbfe728,0x2,0x0)  ERR#4 'Interrupted system call'
 1274: kse_release(0x8064fa0)= 383 (0x17f)

truss firefox seems to work.

If I attach truss to a running Firefox I get:

[EMAIL PROTECTED] ~ $truss -f -p 1440
 1440: (null)()  = 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fac)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
 1440: kse_release(0x8064fa0)= 0 (0x0)
truss: Cannot malloc -67210816 bytes for pollfd array: Cannot allocate memory

Fabian
-- 
http://www.fabiankeil.de/



Re: How can I install a driver?

2006-04-10 Thread Fabian Keil

Yousef Raffah [EMAIL PROTECTED] wrote:

 I'm having an issue as I'm a newbie in installing/configuring the
 marvell driver for FreeBSD.
 
 A quick search in the mailing lists shows:
 http://www.freebsd.org/cgi/getmsg.cgi?fetch=2601224+2604070
 +/usr/local/www/db/text/2006/freebsd-questions/20060402.freebsd-questions
 
 but I have no clue how I can bypass the second step, which is
 installing the if_myk.ko to /boot/kernel
 
 I have tried to cp if_yk.ko /boot/kernel/
 
 but that didn't bring anything new in ifconfing!

Try:
kldxref /boot/kernel
kldload if_yk

Fabian
-- 
http://www.fabiankeil.de/



Re: Prism wi support in 6.x - or alternative card

2006-04-09 Thread Fabian Keil

Brian Candler [EMAIL PROTECTED] wrote:

 I Hvae an IBM Thinkpad X30 with a miniPCI wireless card:
 
 wi0: Intersil Prism2.5 mem 0xf800-0xf8000fff irq 11 at device
 2.0 on pci1 wi0: using RF:PRISM2.5 MAC:ISL3874A(Mini-PCI)
 wi0: Intersil Firmware: Primary (1.1.0), Station (1.4.9)
 wi0: Ethernet address: 00:05:3c:09:7e:9d
 wi0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
 
 I have found it to be flaky under FreeBSD 5.4. It's OK for occasional
 use but when under heavy load, e.g. 'unison' syncing to another
 machine, it locks up:
 
 Mar 27 21:10:00 thinkdog kernel: wi0: timeout in wi_cmd 0x010b; event
 status 0xa000 Mar 27 21:10:00 thinkdog kernel: wi0: xmit failed
 Mar 27 21:10:04 thinkdog kernel: wi0: timeout in wi_cmd 0x0021; event
 status 0xa000 Mar 27 21:10:09 thinkdog kernel: wi0: wi_cmd: busy bit
 won't clear.
 
 At this point the only solution is to unload and reload the if_wi
 module.
 
 So my questions are:
 
 1. Is support for this hardware significantly improved in 6.X?

I don't think so. I have a wi card which worked fine in 5.4, but shows
the symptoms above on 6.x.
 
 2. If I were to buy another miniPCI card to replace it, what's the
current recommendation?

Something which works with ath.

Fabian
-- 
http://www.fabiankeil.de/



Re: 6.0-REL problems with ISA ed0, FFS corruption and ancient hardware

2006-03-19 Thread Fabian Keil

Matt Emmerton [EMAIL PROTECTED] wrote:

 I recently upgraded a 4.11-REL machine to 6.0-REL and have run into
 some snags.  While the installation from CD went fine, after
 configuring and enabling my ed0 NIC, bad things start to happen.
 
 FWIW, this machine is an ancient (hardware circa 1991, BIOS circa
 1994) dual-Pentium 133 MHz machine, with EISA/PCI and onboard SCSI.

At least it has lots of memory; last week I installed FreeBSD
6.1-PRERELEASE on a P90 with 16MB RAM.

 So far I can reliably reproduce two panics, one appears to be a ed
 driver bug (based on reports of similar panics with different NICs,
 notably nge) and one is a filesystem corruption problem.
 
 Here's the process that I go through to reliably reproduce both
 problems. 1) Boot machine in multi-user mode
 2) After ifconfig ed0, machine panics with a trap 12 in ithread_loop.
 3) In debugger, reset (or panic to get vmcore)
 4) Reboot in multi-user mode, but set hint.ed.0.disabled=1 in the
 boot loader (to avoid ifconifg panic)
 5) Root filesystem is fsckd; all other filesystems are scheduled for
 background fsck
 6) Encounter panic ffs_valloc: dup alloc
 7) In debugger, reset (or panic to get vmcore)

Did you try to do a foreground fsck in single user mode?
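
In case it helps, a sketch of how the foreground check could be forced
at every boot. This is not from the original report; background_fsck
is the rc.conf knob I would expect to control it, and rc.conf is read
by sh, so it is a plain assignment.

```shell
# /etc/rc.conf fragment (sketch, not taken from the original thread)
background_fsck="NO"   # run fsck in the foreground during boot

# For a one-off manual check instead: boot into single-user mode and run
#   fsck -y
# before the file systems are mounted read-write.
```

If the dup alloc panic goes away after a full foreground pass, the
background fsck was probably running on a file system that needed more
repair than it can do.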

Fabian
-- 
http://www.fabiankeil.de/



Re: wpa_supplicant with NDIS-wrapped wireless card and WPA-PSK reboots 6.1-pre

2006-03-01 Thread Fabian Keil

Carlos Amengual [EMAIL PROTECTED] wrote:

 My system is a 6.1-PRERELEASE as of yesterday afternoon, but the same 
 happened with a RELENG_6 as of a month ago.
 
 I set up a D-Link AirPlus DWL-520+ wireless PCI adapter in an old 
 server, and NDISwrapped it (got an AIRPLUS_SYS.ko).
 
 When running wpa_supplicant -dd -indis0 -Dndis 
 -c/etc/wpa_supplicant.conf, the system reboots after printing:
 
 
 Initializing interface 'ndis0' conf '/etc/wpa_supplicant.conf' driver
 'ndis' Configuration file '/etc/wpa_supplicant.conf' -
 '/etc/wpa_supplicant.conf' Reading configuration file
 '/etc/wpa_supplicant.conf' ctrl_interface='/var/run/wpa_supplicant'
 ctrl_interface_group=0 (from group name 'wheel')
 Line: 6 - start of a new network block
 ssid - hexdump_ascii(len=11):
  47 4e 43 57 49 52 45 4c 45 53 53  GNCWIRELESS
 scan_ssid=1 (0x1)
 key_mgmt: 0x2
 PSK (ASCII passphrase) - hexdump_ascii(len=26): [REMOVED]
 PSK (from passphrase) - hexdump(len=32): [REMOVED]
 Priority group 0
id=0 ssid='GNCWIRELESS'
 Initializing interface (2) 'ndis0'
 EAPOL: SUPP_PAE entering state DISCONNECTED
 EAPOL: KEY_RX entering state NO_KEY_RECEIVE
 EAPOL: SUPP_BE entering state INITIALIZE
 EAP: EAP entering state DISABLED
 EAPOL: External notification - portEnabled=0
 EAPOL: External notification - portValid=0
 NDIS: 1 adapter names found
 NDIS: 1 adapter descriptions found
 NDIS: 0 - ndis0 - ndis0
 NDIS: Adapter description prefix 'ndis0'
 ndis_get_oid: oid=0xd010122 len (512) failed
 NDIS: verifying driver WPA capability
 NDIS: WPA key management supported
 NDIS: WPA-PSK key management supported
 ndis_set_oid: oid=0xd01011b len (4) failed
 NDIS: Failed to set OID_802_11_ENCRYPTION_STATUS (6)
 NDIS: TKIP encryption supported
 NDIS: driver supports WPA
 NDIS: driver capabilities: key_mgmt 0x5 enc 0x4 auth 0x3
 Own MAC address: **:**:**:**:**:**
 wpa_driver_ndis_set_wpa: enabled=1
 ndis_get_oid: oid=0xd010101 len (6) failed

 My /etc/wpa_supplicant.conf:
 
 ctrl_interface=/var/run/wpa_supplicant
 ctrl_interface_group=wheel
 #
 # home network; allow all valid ciphers
 network={
 ssid=GNCWIRELESS
 scan_ssid=1
 key_mgmt=WPA-PSK
 psk=**
 }

Does it make a difference if you additionally put the
bssid in /etc/wpa_supplicant.conf?

Since I upgraded from RELENG_5 to RELENG_6
I have to use both the ssid and the bssid
to get ndis0 to associate.

I only use WEP encryption and don't know if a failed
attempt to associate with wpa_supplicant can cause
a reboot, but it's worth a try.

You should also check if you can associate to
the (unencrypted) network with ifconfig by hand.

Fabian
-- 
http://www.fabiankeil.de/



Re: device atapicam - causes huge slowdown

2006-02-24 Thread Fabian Keil

Adam Retter [EMAIL PROTECTED] wrote:

 FreeBSD funkalicious.home.dom 6.1-PRERELEASE FreeBSD 6.1-PRERELEASE #8:
 Thu Feb 23 23:24:57 GMT 2006
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/funkalicious  i386
 
 I have a fairly straight-forward kernel config (see below) I think, yet
 if I enable device atapicam, and buildkernel and installkernel and
 reboot, the system starts up fine until it get's to finding disks and
 then it goes incredibly slowly, takes about 5 minutes to get to
 harvesting interupts and so on and so on, I think it will eventually
 get to the login prompt, but I havent been tolerant to wait that long
 15 minutes.

 If I dont use device atapicam the system is perfect, but I could
 really do with enabling it, for CD/DVD writting purposes...

If you leave device atapicam out of the kernel config, you can
kldload atapicam.ko later. You could try that to see if it makes
a difference.
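
Roughly like this, as a sketch. It assumes atapicam.ko was built along
with the kernel. The run helper and the DRY_RUN variable are made up
for this example: by default the commands are only printed, and
DRY_RUN=0 would execute them as root.

```shell
#!/bin/sh
# Sketch: load atapicam at run time instead of compiling it into the kernel.
# Dry-run by default; set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi
}

run kldload atapicam      # attach the ATAPI devices to CAM
run camcontrol devlist    # list the devices CAM knows about
```

If the boot is fast without the device in the kernel and the slowdown
only appears after the kldload, that would at least narrow it down to
the probing done by atapicam.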

Fabian
-- 
http://www.fabiankeil.de/



Re: top doesn't show any Process in idle-Mode

2006-01-30 Thread Fabian Keil

Michael Schuh [EMAIL PROTECTED] wrote:

 i use top mostly in idle-mode.
 # top return i
 or
 # top -I
 
 Under releng_6 (stable p4) and the older versions,
 i think down to releng_5, doesn't show a running process.

By default top doesn't show system processes.

If you run top -I and no process is shown, it means one
of the system processes is running. Probably idle.

Try top -I -S.

Fabian
-- 
http://www.fabiankeil.de/



Re: Best release for IBM laptop R51

2006-01-22 Thread Fabian Keil

Graham North [EMAIL PROTECTED] wrote:

 I am planning to load FreeBSD as a dual boot on new IBM laptop.
 The model is an R51 which comes with:
 Radeon 7500 - video
 Intel Pro/1000 NT Mobile
 Intel Pro/Wireless 2200BG
 Integrated Audio
 Intel 82802 UltraATA
 Can anyone tell me whether the above hardware is all supported and 
 stable in FreeBSD.

I had RELENG_5 installed on my ThinkPad R51 UN0K6GE until two weeks
ago when I switched to RELENG_6.
 
It was stable with 5.4 and is stable now. I just updated
to get some of the new features.

 Should I therefore download 6.0-Release and then just cvsup and
 rebuild?? Is this a better option than using 5.4 at this point?

I'd skip 5.4. It was good, but 6.0 is even better.

Fabian
-- 
http://www.fabiankeil.de/



[Fixed] Re: ndis0 does not associate since update to RELENG_6

2006-01-18 Thread Fabian Keil

[EMAIL PROTECTED] (Bill Paul) wrote:

  Is there a way I can provide more information?
 
 You haven't said yet what manufacturer/model your access point is.

It's a Netgear WGT624 (hardware version V3H1, firmware version
V1.1.125_1.1.1GR).

I tried to associate ndis0 with wi0 in hostap mode and got the same
results.

 You also haven't said what Windows driver version you're using, but
 you need to cheat a bit to figure that out. I usually do:
 
 % strings -e l foo.sys

 Near the end of the output, there should be a bunch of version
 information, including the vendor name of whoever built the driver
 (in this case Intel). You might try downloading the latest driver
 from Intel. (They have a generic one for their Centrino wireless
 devices.)

The old driver which was shipped with the Laptop:

StringFileInfo
040904B0
Comments
NDIS 5 Miniport Driver for Win2000
CompanyName
Intel
 Corporation
FileDescription
Intel
 Wireless LAN Driver
FileVersion
8010-28 Driver
InternalName
w22n50.SYS
LegalCopyright
Copyright 
 Intel
 Corporation 2004
OriginalFilename
w22n50.SYS
ProductName
Intel
 Wireless LAN Adapter
VarFileInfo
Translation

The new one I downloaded from Intel today:

StringFileInfo
040904B0
Comments
NDIS 5.1 Miniport Driver
CompanyName
Intel
 Corporation
FileDescription
Intel
 Wireless LAN Driver
FileVersion
9003-9 Driver
InternalName
w29n51.SYS
LegalCopyright
Copyright 
 Intel
 Corporation 2004
OriginalFilename
w29n51.SYS
ProductName
Intel
 Wireless LAN Adapter
VarFileInfo
Translation
 
 You also haven't said what sort of laptop this is. Wouldn't hurt to
 know that either.

IBM ThinkPad R51 UN0K6GE.

 Unfortunately, this is the sort of thing that can only be debugged
 with the system sitting in front of me. I can't do it by remote
 control, and I can't know exactly what information to ask you. I have
 to experiment, and I can't do that from here.
 
 You should turn WEP off completely, make sure the AP is set for open
 authentication mode, and try getting it to authenticate without WEP
 first. It's one less variable to worry about. Try using the following:
 
 # ifconfig ndis0 ssid "" up
 
 # ifconfig ndis0 ssid <yourssid> bssid <BSSID of your AP> up

Specifying the bssid is the solution.

ifconfig ndis0 ssid ec60bfg3b4 bssid BSSID wepkey 1:0xWEPKEY \
    deftxkey 1 wepmode on up

Works with the new and the old driver and with both APs.
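
To make this survive a reboot, the same arguments could go into
/etc/rc.conf. This is only a sketch: the BSSID and the WEP key below
are placeholders I made up, and it assumes the ndis module is already
loaded by the time the interface is configured.

```shell
# /etc/rc.conf fragment (sketch). The BSSID and WEP key are placeholders
# (assumptions), to be replaced with the access point's real values.
ifconfig_ndis0="inet 192.168.0.32 netmask 255.255.255.0 ssid ec60bfg3b4 bssid 00:11:22:33:44:55 wepkey 1:0x0123456789abcdef0123456789 deftxkey 1 wepmode on"
```

The important part is only that the bssid is given in addition to the
ssid; the remaining arguments are the ones from the command line above.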

[EMAIL PROTECTED] ~ $ifconfig ndis0
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet6 fe ... 7500%ndis0 prefixlen 64 scopeid 0x3 
inet 192.168.0.32 netmask 0xffffff00 broadcast 192.168.0.255
ether 00 ... 00
media: IEEE 802.11 Wireless Ethernet autoselect (OFDM/54Mbps)
status: associated
ssid ec60bfg3b4 channel 11 bssid 00:...
authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax
100 protmode CTS

Thanks for your time Bill.

Fabian
-- 
http://www.fabiankeil.de/



ndis0 does not associate since update to RELENG_6

2006-01-17 Thread Fabian Keil

I fail to get the following device working since my update
from RELENG_5 to RELENG_6 a few days ago:

[EMAIL PROTECTED]:2:0: class=0x028000 card=0x27128086 chip=0x42208086 rev=0x05
hdr=0x00 vendor   = 'Intel Corporation'
device   = 'PRO/Wireless 2200BG Network Connection'
class= network

It worked fine with 5.4 and was recognised as

ndis0: Intel(R) PRO/Wireless 2200BG Network Connection mem
0xc0214000-0xc0214fff irq 11 at device 2.0 on pci
ndis0: NDIS API version: 5.0
ndis0: Ethernet address: 00: ... :00
ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 5.5Mbps 11Mbps
ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps

On 6.0 it is still recognised, but the last two lines are missing.

I'm using GENERIC; the driver module was generated with ndisgen
out of w22n51.inf and w22n50.sys. This is the combination I always used.

ndis0 can scan for access points, but can't associate with or without
WEP encryption.

[EMAIL PROTECTED] ~ $ifconfig ndis0 list scan
SSID            BSSID              CHAN RATE  S:N   INT CAPS
ec60bfg3b4      00: ... :a8          11  54M 149:0  100 EP   ??? ??? ??? ??? ??? ??? WME

[EMAIL PROTECTED] ~ $ifconfig ndis0
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.32 netmask 0xffffff00 broadcast 192.168.0.255
inet6 fe80: ... :7500%ndis0 prefixlen 64 scopeid 0x5 
ether 00: ... :00
media: IEEE 802.11 Wireless Ethernet autoselect
status: no carrier
ssid  channel 1
authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax
100 protmode CTS

I found a similar problem which should be fixed in current,
but I don't know if the changes already hit stable.  
http://freebsd.rambler.ru/bsdmail/freebsd-current_2005/msg11802.html

My problem is not exactly the same though, I have no trouble setting
the bssid. 

Additionally I can't set the mode to 11g:

[EMAIL PROTECTED] ~ #ifconfig ndis0 mode 11g
ifconfig: SIOCSIFMEDIA (media): Invalid argument

mode 11b is accepted but only leads to (DS/1Mbps).

I can associate to the access point with ath0 and wi0 (at least for a
short time). 

Is anybody else using this device with FreeBSD 6.0?

Fabian
-- 
http://www.fabiankeil.de/



Re: ndis0 does not associate since update to RELENG_6

2006-01-17 Thread Fabian Keil

Fabian Keil [EMAIL PROTECTED] wrote:

 I fail to get the following device working since my update
 from RELENG_5 to RELENG_6 a few days ago:
 
 [EMAIL PROTECTED]:2:0: class=0x028000 card=0x27128086 chip=0x42208086
 rev=0x05 hdr=0x00 vendor   = 'Intel Corporation'
 device   = 'PRO/Wireless 2200BG Network Connection'
 class= network

I found a work around. It still works with /usr/ports/net/iwi-firmware/:

[EMAIL PROTECTED] ~ $ifconfig iwi0
iwi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
inet6 fe ... :7500%iwi0 prefixlen 64 scopeid 0x5 
ether 00: ... :00
media: IEEE 802.11 Wireless Ethernet autoselect mode 11g
(OFDM/48Mbps) status: associated
ssid ec60bfg3b4 channel 11 bssid 00:...:a8
authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit txpowmax
100 protmode CTS bintval 100

Fabian
-- 
http://www.fabiankeil.de/



Re: ndis0 does not associate since update to RELENG_6

2006-01-17 Thread Fabian Keil

Parv [EMAIL PROTECTED] wrote:

 in message [EMAIL PROTECTED],
 wrote Fabian Keil thusly...
 
  Fabian Keil [EMAIL PROTECTED] wrote:
  
   I fail to get the following device working since my update
   from RELENG_5 to RELENG_6 a few days ago:
   
   [EMAIL PROTECTED]:2:0: class=0x028000 card=0x27128086 chip=0x42208086
   rev=0x05 hdr=0x00 vendor   = 'Intel Corporation'
   device   = 'PRO/Wireless 2200BG Network Connection'
   class= network
  
  I found a work around. It still works
  with /usr/ports/net/iwi-firmware/:
 
 I also found the same about ndis driver.  I was not even able to
 assign a ssid, mode, or a channel to a ndis0 interface.
 
 At least net/iwi-firmware works w/ WPA (even if the interface causes
 freeze after waking up from long sleep on IBM Thinkpad T42;

Did you try to load and unload if_iwi.ko in /etc/rc.resume and
/etc/rc.suspend?
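
A minimal sketch of that suggestion, assuming the module is named if_iwi as in this thread. The helper name iwi_power_hook is made up for illustration; it only prints the command each hook would run, so it is safe to try on any machine.

```shell
#!/bin/sh
# Sketch only: print the module command for a power event.
# Assumption: the driver module is named if_iwi, as in the message above.
# iwi_power_hook is a hypothetical helper, not part of FreeBSD.
iwi_power_hook() {
    case $1 in
        suspend) echo "kldunload if_iwi" ;;   # unload before the laptop sleeps
        resume)  echo "kldload if_iwi" ;;     # reload after wake-up
        *)       echo "usage: iwi_power_hook suspend|resume" >&2; return 1 ;;
    esac
}

# Dry run: show what each hook would execute.
iwi_power_hook suspend
iwi_power_hook resume
```

In practice the printed command would go into /etc/rc.suspend or /etc/rc.resume respectively, and the interface would probably have to be reconfigured with ifconfig after the reload.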
 
 BTW, when i read that you found a work around, i was expecting a
 work around to make ndis work.

I'm sorry for my misleading wording then. Of course it's just a
work around to get the PRO/Wireless 2200BG working at all.

It's just that I had forgotten about the existence of iwi.
For the last few days I had been using em0 to connect my laptop to the
network. Getting if_iwi to work after my initial posting
was a relief.

Fabian
-- 
http://www.fabiankeil.de/



Re: ndis0 does not associate since update to RELENG_6

2006-01-17 Thread Fabian Keil

Parv [EMAIL PROTECTED] wrote:

 in message [EMAIL PROTECTED],
 wrote Fabian Keil thusly...
 
  Fabian Keil [EMAIL PROTECTED] wrote:
  
   I fail to get the following device working since my update
   from RELENG_5 to RELENG_6 a few days ago:
   
   [EMAIL PROTECTED]:2:0: class=0x028000 card=0x27128086 chip=0x42208086
   rev=0x05 hdr=0x00 vendor   = 'Intel Corporation'
   device   = 'PRO/Wireless 2200BG Network Connection'
   class= network

 I also found the same about ndis driver.  I was not even able to
 assign a ssid, mode, or a channel to a ndis0 interface.

I forgot to confirm that I can't assign the ssid or channel either.

Fabian
-- 
http://www.fabiankeil.de/



Re: ndis0 does not associate since update to RELENG_6

2006-01-17 Thread Fabian Keil

[EMAIL PROTECTED] (Bill Paul) wrote:

  I fail to get the following device working since my update
  from RELENG_5 to RELENG_6 a few days ago:
  
  [EMAIL PROTECTED]:2:0: class=0x028000 card=0x27128086 chip=0x42208086
  rev=0x05 hdr=0x00 vendor   = 'Intel Corporation'
  device   = 'PRO/Wireless 2200BG Network Connection'
  class= network
  
  It worked fine with 5.4 and was recognised as
  
  ndis0: Intel(R) PRO/Wireless 2200BG Network Connection mem
  0xc0214000-0xc0214fff irq 11 at device 2.0 on pci
  ndis0: NDIS API version: 5.0
  ndis0: Ethernet address: 00: ... :00
  ndis0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 5.5Mbps 11Mbps
  ndis0: 11g rates: 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps
  54Mbps
  
  On 6.0 it is still recognised, but the last two lines are missing.
 
 That's normal.
  
  I'm using GENERIC, the driver module was generated with ndisgen
  out of w22n51.inf and w22n50.sys. This is the combination I always
  used.
  
  ndis0 can scan for access points, but can't associate with or
  without WEP encryption.
 
 What command do you type to try to get it to associate?

kldload wlan_wep.ko
kldload w22n50_sys.ko

ifconfig ndis0 ssid ec60bfg3b4 wepkey 1:0xhexkey \
deftxkey 1 wepmode on
ifconfig ndis0 inet 192.168.0.32 up
 
  [EMAIL PROTECTED] ~ $ifconfig ndis0 list scan
  SSIDBSSID  CHAN RATE  S:N   INT CAPS
  ec60bfg3b4  00: ... :a8   11   54M 149:0   100
  EP   ??? ??? ??? ??? ??? ??? WME
  
  [EMAIL PROTECTED] ~ $ifconfig ndis0
  ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  inet 192.168.0.32 netmask 0xff00 broadcast 192.168.0.255
  inet6 fe80: ... :7500%ndis0 prefixlen 64 scopeid 0x5 
  ether 00: ... :00
  media: IEEE 802.11 Wireless Ethernet autoselect
  status: no carrier
  ssid  channel 1
  authmode OPEN privacy ON deftxkey 1 wepkey 1:104-bit
  txpowmax 100 protmode CTS
  
  I found a similar problem which should be fixed in current,
  but I don't know if the changes already hit stable.  
  http://freebsd.rambler.ru/bsdmail/freebsd-current_2005/msg11802.html
  
  My problem is not exactly the same though, I have no trouble setting
  the bssid. 
 
 You should be able to do:
 
 # ifconfig ndis0 ssid ec60bfg3b4 wepmode on wepkey 0123456789123 up

It was my experience that ifconfig on 6.0 will not choose the first
key by default. I always have to add deftxkey 1.

I can't use your exact command because I know my wepkey only in
hexadecimal.

But if I disable WEP in the access point and use
ifconfig ndis0 ssid ec60bfg3b4 up
it fails to associate (or even to set the ssid) as well:

[EMAIL PROTECTED] ~ #ifconfig ndis0   
ndis0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet6 fe ... 7500%ndis0 prefixlen 64 scopeid 0x3 
ether 00: ... :00
media: IEEE 802.11 Wireless Ethernet autoselect
status: no carrier
ssid  channel 1
authmode OPEN privacy OFF txpowmax 100 protmode CTS
 
 You don't state what command you actually use. You should have
 specified it in your e-mail. Note that usually the WEP key has to be
 either 5 or 13 characters.

You're right, sorry.

I use the hexadecimal notation and my key is correctly recognised as
104-bit.
 
ifconfig ndis0 ssid ec60bfg3b4 wepkey 1:0xhexkey \
deftxkey 1 wepmode on
ifconfig ndis0 inet 192.168.0.32 up 

The two commands above work for iwi0, wi0 and ath0. 

I use the same shell script I used on 5.4. The only
change I made was adding deftxkey 1 which wasn't
needed before. 
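
On the key length discussed above: a 5- or 13-character key corresponds to 40 or 104 bits, which in hexadecimal notation is 10 or 26 digits after the 0x prefix. The following is only a sketch for checking a hex key before handing it to ifconfig. The function name wep_bits and the sample key are made up for illustration.

```shell
#!/bin/sh
# Sketch: report the WEP key size implied by a hexadecimal key.
# wep_bits is a hypothetical helper; the sample key below is not a real key.
wep_bits() {
    key=${1#0x}                      # strip an optional 0x prefix
    case $key in
        ''|*[!0-9a-fA-F]*) echo invalid; return 1 ;;
    esac
    case ${#key} in
        10) echo 40 ;;               # 5 bytes
        26) echo 104 ;;              # 13 bytes
        *)  echo invalid; return 1 ;;
    esac
}

wep_bits 0x0123456789abcdef0123456789
```

A key that reports "invalid" here has the wrong number of hex digits for either WEP size.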

  Is anybody else using this device with FreeBSD 6.0?
 
 I've tested the 2200BG myself with the NDISulator 6.0 and I've been
 able to get it to associate with 11g networks. I don't know what's
 wrong in your case.

Is there a way I can provide more information?

Fabian
-- 
http://www.fabiankeil.de/



Re: wi0 unreliable on FreeBSD 6.0

2006-01-16 Thread Fabian Keil

Kevin Oberman [EMAIL PROTECTED] wrote:

  Date: Sun, 15 Jan 2006 22:32:08 +0100
  From: Fabian Keil [EMAIL PROTECTED]

  Since the update from RELENG_5 to RELENG_6 a few days ago I have
  trouble with the wireless network.

  This card worked fine with FreeBSD 5.4:  

  wi0: T-Sinus 130card at port 0x4000-0x403f irq 11 function 0
  config 1 on pccard0
  wi0: using RF:PRISM2.5 MAC:ISL3873
  wi0: Intersil Firmware: Primary (1.0.4), Station (1.2.0)

 I notice that your firmware is pretty old. I am running Primary
 (1.1.1), Station (1.7.4) and don't seem to be having any serious
 problems. I'd suggest updating and see if that fixes things.

Thanks for the tip. Today I failed to get the right firmware files,
but I'll try again tomorrow.

Fabian
-- 
http://www.fabiankeil.de/


wi0 unreliable on FreeBSD 6.0

2006-01-15 Thread Fabian Keil

Since the update from RELENG_5 to RELENG_6 a few days ago I have
trouble with the wireless network.

This card worked fine with FreeBSD 5.4:  

wi0: T-Sinus 130card at port 0x4000-0x403f irq 11 function 0 config 1
on pccard0
wi0: using RF:PRISM2.5 MAC:ISL3873
wi0: Intersil Firmware: Primary (1.0.4), Station (1.2.0)

But it only works with very low traffic on FreeBSD 6.0.
I can use it to check my emails and to flood ping for a while:

--- 192.168.0.1 ping statistics ---
58577 packets transmitted, 57031 packets received, 2% packet loss
round-trip min/avg/max/stddev = 2.934/700.012/1169.922/327.406 ms

But as soon as I open firefox, which then tries to get some RSS feeds,
I lose the connection. If I have firefox already open I can sometimes
get the first half of a small web page, but only sometimes.

The ifconfig wi0 output is then shortened to:

wi0: flags=8807<UP,BROADCAST,DEBUG,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.51 netmask 0xff00 broadcast 192.168.0.255
inet6 fe80::230:f1ff:fe66:d97e%wi0 prefixlen 64 scopeid 0x3 
ether 00:30:f1:66:d9:7e

instead of

wi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 192.168.0.51 netmask 0xff00 broadcast 192.168.0.255
inet6 fe80::230:f1ff:fe66:d97e%wi0 prefixlen 64 scopeid 0x3 
ether 00:30:f1:66:d9:7e
media: IEEE 802.11 Wireless Ethernet autoselect (DS/2Mbps)
status: associated
ssid ec60bfg3b4 channel 11 bssid 00:14:6c:1b:62:a8
stationname FreeBSD WaveLAN/IEEE node
authmode OPEN privacy MIXED deftxkey 1 wepkey 1:104-bit
txpowmax 100

After ifconfig wi0 debug dmesg says:

wi0: timeout in wi_cmd 0x0002; event status 0x8008
wi0: timeout in wi_cmd 0x; event status 0x8008
wi0: wi_cmd: busy bit won't clear.
wi0: init failed
wi0: failed to allocate 2372 bytes on NIC
wi0: tx buffer allocation failed (error 12)
wi0: interface not running
wi0: link state changed to DOWN

If I unload if_wi and wlan_wep, remove the card, put it in again
and reload if_wi and wlan_wep, I can reconfigure the card and
ping some more.

I use wlan and wlan_wep as modules, my setup works fine
with an Atheros-based card.

I saw the mails saying that wi0 is regarded as old technology and
therefore will not be enhanced to support WPA in the near future,
but it should still work as reliably as on 5.4, right?

Fabian
-- 
http://www.fabiankeil.de/



Re: NFS UDP mounts on RELENG_6?

2005-12-18 Thread Fabian Keil

Oliver Brandmueller [EMAIL PROTECTED] wrote:

 On Fri, Dec 16, 2005 at 04:30:31PM +0100, Fabian Keil wrote:
  Oliver Brandmueller [EMAIL PROTECTED] wrote:
  
   I'm experiencing problems when trying to mount NFS filesystems
   from a RELENG_6 server (FreeBSD hudson 6.0-STABLE FreeBSD
   6.0-STABLE #0: Wed Dec 14 16:59:55 CET 2005 
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6  i386)
   to either 5.4-STABLE or 6-STABLE clients. mounting works fine,
   but afterwards the access to the filesystem on the client stalls.
   As soon as I mount the FS with a TCP mount everything works as
   expected.
   
   The mounts worked fine on UDP when the server was 5.4-STABLE.
   There is just a plain GigE switch involved, no firewalls or
   routing.
   
   Anyone else experiencing those problems or having an idea?
  
  I just copied some files (200 MB) from a NFS Server running
  
  FreeBSD africanqueen.local 6.0-STABLE FreeBSD 6.0-STABLE
  #5: Thu Dec 15 19:31:12 CET 2005
  [EMAIL PROTECTED]:/usr/obj/usr/src/sys/AFRICANQUEEN i386
  
  without problems. My client runs FreeBSD 5.4, I use GigE as well,
  but no switch.
 
 Which kind GigE Interface do you use?

Client:
[EMAIL PROTECTED] ~ $pciconf -lv| grep em0 -A 2
[EMAIL PROTECTED]:1:0:   class=0x02 card=0x05491014 chip=0x101e8086 rev=0x03
hdr=0x00 vendor   = 'Intel Corporation'
device   = '82540EP Gigabit Ethernet Controller (Mobile)'

Server:
[EMAIL PROTECTED] ~ $pciconf -lv| grep re[01] -A 2
[EMAIL PROTECTED]:9:0:   class=0x02 card=0x816910ec chip=0x816910ec rev=0x10
hdr=0x00 vendor   = 'Realtek Semiconductor'
device   = 'RTL8169 Gigabit Ethernet Adapter'
--
[EMAIL PROTECTED]:10:0:  class=0x02 card=0x601b182d chip=0x816910ec rev=0x10
hdr=0x00 vendor   = 'Realtek Semiconductor'
device   = 'RTL8169 Gigabit Ethernet Adapter'

re0 is made by Vivanco, re1 is a Sitecom card.

Fabian
-- 
http://www.fabiankeil.de/



Re: NFS UDP mounts on RELENG_6?

2005-12-16 Thread Fabian Keil

Oliver Brandmueller [EMAIL PROTECTED] wrote:

 I'm experiencing problems when trying to mount NFS filesystems from a
 RELENG_6 server (FreeBSD hudson 6.0-STABLE FreeBSD 6.0-STABLE
#0: Wed Dec 14 16:59:55 CET 2005 
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/NFS-32-FBSD6  i386)
 to either 5.4-STABLE or 6-STABLE clients. mounting works fine, but 
 afterwards the access to the filesystem on the client stalls. As soon
 as I mount the FS with a TCP mount everything works as expected.
 
 The mounts worked fine on UDP when the server was 5.4-STABLE. There
 is just a plain GigE switch involved, no firewalls or routing.
 
 Anyone else experiencing those problems or having an idea?

I just copied some files (200 MB) from an NFS server running

FreeBSD africanqueen.local 6.0-STABLE FreeBSD 6.0-STABLE
#5: Thu Dec 15 19:31:12 CET 2005
[EMAIL PROTECTED]:/usr/obj/usr/src/sys/AFRICANQUEEN i386

without problems. My client runs FreeBSD 5.4, I use GigE as well,
but no switch.
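
For anyone wanting to compare the two transports, here is a rough sketch that only prints the commands. It assumes UDP is the default transport of mount_nfs on these releases and that -T selects TCP. The export path /export and the mount point /mnt/nfs are placeholders, and nfs_mount_cmd is a made-up helper name.

```shell
#!/bin/sh
# Sketch: build a mount_nfs command line for either transport.
# Assumptions: /export and /mnt/nfs are placeholder names; UDP is taken
# to be the default transport, -T switches to TCP.
nfs_mount_cmd() {
    transport=$1; remote=$2; mountpoint=$3
    case $transport in
        tcp) echo "mount_nfs -T $remote $mountpoint" ;;
        udp) echo "mount_nfs $remote $mountpoint" ;;
        *)   echo "unknown transport: $transport" >&2; return 1 ;;
    esac
}

# Dry run: print both variants for the server named in the report.
nfs_mount_cmd tcp hudson:/export /mnt/nfs
nfs_mount_cmd udp hudson:/export /mnt/nfs
```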

Fabian
-- 
http://www.fabiankeil.de/



Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated [solved]

2005-12-15 Thread Fabian Keil

Kris Kennaway [EMAIL PROTECTED] wrote:

 On Wed, Dec 14, 2005 at 05:32:34PM +0100, Fabian Keil wrote:
 
  I guess you're right. I can fill a 256MB swap-backed disk without
  panic and without swapping.
 
 FYI, this is documented in the manpage.

I think the panic potential should be mentioned in md(4) as well.

I used a script I hadn't written myself. The commands in it worked,
and after the panic I only read md(4). Of course mdconfig(8) is
mentioned twice there, but I didn't think I needed more information
about it.

Fabian
-- 
http://www.fabiankeil.de/



Re: Slightly OT, getting errors from members on this list

2005-12-15 Thread Fabian Keil

Morten A. Middelthon [EMAIL PROTECTED] wrote:

 I just got this message after posting to freebsd-stable@freebsd.org:

 Subject: Blogger post failed
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Date: Thu, 15 Dec 2005 05:32:36 -0800 (PST)

 Blogger does not accept multipart/signed files.

 Error code: 7.774C07

You are not alone. 
http://freebsd.rambler.ru/bsdmail/freebsd-stable_2005/msg08530.html

 Quite annoying.

And to be allowed to complain, you need a blogger account:
http://www.blogger.com/problem.g

Do no evil my ass.

Fabian
-- 
http://www.fabiankeil.de/


FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated

2005-12-14 Thread Fabian Keil

I triggered a few reproducible panics on FreeBSD 6.0-STABLE.

I created a ramdisk with:
 
/sbin/mdconfig -a -t malloc -s 256M -u 10
/sbin/newfs -U /dev/md10
/sbin/mount /dev/md10 /mnt/ramdisk

The system has avail memory = 515932160 (492 MB)
and 1GB swap space.

While copying to /mnt/ramdisk through ftp localhost
I got:

[EMAIL PROTECTED] ~/crashdump #kgdb kernel-GENERIC.debug vmcore.3
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
[...]
Unread portion of the kernel message buffer:
panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated
Uptime: 2m57s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 
303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc063a4ee in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063a784 in panic (fmt=0xc0880846 kmem_malloc(%ld): kmem_map too small: 
%ld total allocated)
at /usr/src/sys/kern/kern_shutdown.c:555
#3  0xc07a44bd in kmem_malloc (map=0xc10430c0, size=16384, flags=1026) at 
/usr/src/sys/vm/vm_kern.c:299
#4  0xc079c0c6 in page_alloc (zone=0x0, bytes=16384, pflag=0x0, wait=1026) at 
/usr/src/sys/vm/uma_core.c:958
#5  0xc079e41f in uma_large_malloc (size=16384, wait=1026) at 
/usr/src/sys/vm/uma_core.c:2702
#6  0xc0630085 in malloc (size=16384, mtp=0xc08ffe40, flags=1026) at 
/usr/src/sys/kern/kern_malloc.c:329
#7  0xc078365e in softdep_disk_io_initiation (bp=0xcd899658) at 
/usr/src/sys/ufs/ffs/ffs_softdep.c:3630
#8  0xc078b1fe in ffs_geom_strategy (bo=0xc3593e90, bp=0xcd899658) at buf.h:422
#9  0xc0796869 in ufs_strategy (ap=0x0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:1926
#10 0xc081c645 in VOP_STRATEGY_APV (vop=0xc09012a0, a=0xdd93ec0c) at 
vnode_if.c:1796
#11 0xc06841d0 in bufstrategy (bo=0xc35f7720, bp=0x0) at vnode_if.h:928
#12 0xc067eda8 in bufwrite (bp=0xcd899658) at buf.h:415
#13 0xc067f397 in bawrite (bp=0x0) at buf.h:399
#14 0xc078b53d in ffs_syncvnode (vp=0xc35f7660, waitfor=1) at 
/usr/src/sys/ufs/ffs/ffs_vnops.c:256
#15 0xc078b28e in ffs_fsync (ap=0xdd93ecc0) at 
/usr/src/sys/ufs/ffs/ffs_vnops.c:179
#16 0xc081c05c in VOP_FSYNC_APV (vop=0x0, a=0x0) at vnode_if.c:1020
#17 0xc0698278 in fsync (td=0xc3460d80, uap=0x0) at vnode_if.h:537
#18 0xc080b6eb in syscall (frame=
  {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 64, tf_esi = 134572032, 
tf_ebp = -1077940680, tf_isp = -5775079 
96, tf_ebx = 134561920, tf_edx = 1, tf_ecx = 6, tf_eax = 95, tf_trapno = 0, 
tf_err = 2, tf_eip = 672366947, tf_cs = 
 51, tf_eflags = 662, tf_esp = -1077945572, tf_ss = 59}) at 
/usr/src/sys/i386/i386/trap.c:981
#19 0xc07fa57f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)


By simply copying to /mnt/ramdisk with cp I got:

[EMAIL PROTECTED] ~/crashdump #kgdb kernel-GENERIC.debug vmcore.4
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
[...]
Unread portion of the kernel message buffer:
g_vfs_done():md10[WRITE(offset=206372864, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206503936, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206635008, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206766080, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=206897152, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207028224, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207159296, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207290368, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207421440, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207552512, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207683584, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207814656, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=207945728, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208076800, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208207872, length=131072)]error = 28
g_vfs_done():md10[WRITE(offset=208338944, length=131072)]error = 28
panic: kmem_malloc(4096): kmem_map too small: 172728320 total allocated
Uptime: 11m23s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (158 pages) ... ok
  chunk 1: 511MB (130800 pages) 495 479 463 447 431 415 399 383 367 351 335 319 
303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
#0  doadump () at pcpu.h:165
#1  0xc063a4ee in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xc063a784 in panic (fmt=0xc0880846 kmem_malloc(%ld): kmem_map too

Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated

2005-12-14 Thread Fabian Keil

Gleb Smirnoff [EMAIL PROTECTED] wrote:

 On Wed, Dec 14, 2005 at 01:25:30PM +0100, Fabian Keil wrote:
 F I triggered a few reproducible panics on FreeBSD 6.0-STABLE.
 F 
 F I created a ramdisk with:
 F  
 F /sbin/mdconfig -a -t malloc -s 256M -u 10
 F /sbin/newfs -U /dev/md10
 F /sbin/mount /dev/md10 /mnt/ramdisk
 F 
 F The system has avail memory = 515932160 (492 MB)
 F and 1GB swap space.
 F 
 F While copying to /mnt/ramdisk through ftp localhost
 F I got:
 
 This usually exposes some memory leak in kernel. Can you please do the
 following - copy some amount of data to /mnt/ramdisk through ftp
 localhost, and cancel the operation before it panics.
 
 Then run vmstat -m and vmstat -z, to determine what kind of memory
 allocation is leaking.
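
The sampling requested here can also run in the background during the copy, so that the last sample before a panic survives on disk. This is only a sketch: the function name sample and the log paths are invented for illustration, and vmstat -m / vmstat -z are the commands named in the request.

```shell
#!/bin/sh
# Sketch: append timestamped output of a command to a log file N times.
# sample is a hypothetical helper; the log file names below are placeholders.
sample() {
    n=$1; interval=$2; log=$3; shift 3
    i=0
    while [ "$i" -lt "$n" ]; do
        { date; "$@"; } >> "$log" 2>&1   # timestamp, then the command output
        sync                             # push the log to disk before a possible panic
        i=$((i + 1))
        sleep "$interval"
    done
}

# On the affected machine one might start, before the copy:
#   sample 600 1 /var/tmp/vmstat-m.log vmstat -m &
#   sample 600 1 /var/tmp/vmstat-z.log vmstat -z &
```

Comparing the last few samples should show which malloc type or UMA zone keeps growing.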

I had loops with vmstat -m and vmstat -z in the background while
copying to /mnt/ramdisk. The last output before the panic was:

 Type InUse MemUse HighUse Requests  Size(s)
DEVFS22 1K   -   23  16,128
pfs_nodes20 3K   -   20  128
 GEOM   18926K   -  858
16,32,64,128,256,512,1024,2048,4096
   isadev17 2K   -   17  64
  ATA DMA 4 1K   -4  128
 cdev27 4K   -   27  128
AR driver 0 0K   -   11  512,2048
   ACD driver 3 6K   -3  2048
file desc   12046K   - 1611  16,32,256,512,2048
sigio 2 1K   -3  32
 kenv96 7K   -   97  16,32,64,4096
   kqueue 0 0K   -   62  256,1024
proc-args43 2K   -  797  16,32,64,128
   zombie 0 0K   -  907  128
  ithread48 5K   -   49  64,128
   KTRACE   10013K   -  100  128
  CAM SIM 1 1K   -1  64
   linker68 3K   -   99  16,32,256
  CAM XPT10 1K   -   17  16,64,512
lockf 3 1K   -3  64
   devbuf  1346  3177K   - 1816
16,32,64,128,256,512,1024,2048,4096
temp16   171K   - 6266
16,32,64,128,256,512,1024,2048,4096
   ip6opt 1 1K   -  1  128
   ip6ndp 6 1K   -7  64,128
   module   37124K   -  371  64,128
 mtx_pool 1 8K   -1  
 pgrp36 3K   -  623  64
  session29 4K   -   47  128
 proc 2 4K   -2  2048
  subproc   209   413K   - 1116  256,4096
 cred35 5K   - 4132  128
   plimit18 5K   -  400  256
  uidinfo 4 1K   -   20  32,512
   sysctl 0 0K   -  619  16,32,64
sysctloid  256777K   - 2567  16,32,64
sysctltmp 0 0K   -  280  16,32,128
 umtx   120 8K   -  120  64
 SWAP 2   141K   -2  64
  bus   95938K   - 3599  16,32,64,128,1024
   bus-sc5727K   - 1537
16,32,64,128,256,512,1024,2048,4096
  devstat1837K   -   18  16,4096
 eventhandler37 3K   -   37  32,128
 kobj   248   496K   -  299  2048
  MD disk   294 7K   -  294  16,2048
   MD sectors   293  1172K   -  293  4096
 rman   14910K   -  570  16,64
 sbuf 0 0K   -  440
16,32,64,128,256,512,1024,2048,4096
 sleep queues   121 4K   -  121  32
taskqueue 6 1K   -6  128
   turnstiles   121 8K   -  121  64
   Unitno 7 1K   -9  16,64
 ioctlops 0 0K   - 2757  16,32,64,256,512,1024,4096
  iov 0 0K   -  487  16,64,128
  msg 425K   -4  1024,4096
  sem 4 7K   -4  512,1024,4096
  shm 112K   -1  
 ttys  1228   174K   - 3223  128,1024
 ptys 3 1K   -3  128
 mbuf_tag 0 0K   -6  32,64
   soname 6 1K   -  735  16,32,128
  pcb29 5K   -   81  16,32,64,2048
   BIO buffer 0 0K   -   99  2048
 vfscache 1   256K   -1  
cluster_save buffer 0 0K   -   19  32,64
  Export Host 1 1K   -2  256
 VFS hash 1   128K   -1  
   vnodes 1 1K   -1  128
mount   13012K   -  641  16,32,64,128,512,1024,2048
   CAM periph 1 1K   -1  128
  BPF 4 1K   -4  64
ifnet 5 5K   -5  256,1024
   ifaddr4010K   -   40  16,32,64,256,512,2048
  ether_multi40 2K   -   46  16,32,64
clone 416K   -4  4096
   arpcom 2 1K

Re: FreeBSD 6.0 panic: kmem_malloc(16384): kmem_map too small: 172728320 total allocated [solved]

2005-12-14 Thread Fabian Keil

Scott Long [EMAIL PROTECTED] wrote:

 Gleb Smirnoff wrote:
 
  On Wed, Dec 14, 2005 at 01:25:30PM +0100, Fabian Keil wrote:
  F I triggered a few reproducible panics on FreeBSD 6.0-STABLE.
  F 
  F I created a ramdisk with:
  F  
  F /sbin/mdconfig -a -t malloc -s 256M -u 10
  F /sbin/newfs -U /dev/md10
  F /sbin/mount /dev/md10 /mnt/ramdisk
  F 
  F The system has avail memory = 515932160 (492 MB)
  F and 1GB swap space.
  F 
  F While copying to /mnt/ramdisk through ftp localhost
  F I got:
  
  This usually exposes some memory leak in kernel. Can you please do
  the following - copy some amount of data to /mnt/ramdisk through ftp
  localhost, and cancel the operation before it panics.
  
  Then run vmstat -m and vmstat -z, to determine what kind of memory
  allocation is leaking.
  
  
 
 While it can mean a memory leak in the kernel, I don't think that's
 the case here. On i386, only 320MB can be allocated to kernel malloc
 memory. Much of this space can get consumed with vnodes and other
 filesystem structures, so trying to allocate 256MB to a ramdisk is
 likely putting you over the max. I'd suggest instead to use a
 swap-back disk. It doesn't necessarily mean that the ramdisk pages
 will live in swap, it just means that they will get managed directly
 in the bufcache, eliminating the 320MB restriction.

I guess you're right. I can fill a 256MB swap-backed disk without panic 
and without swapping.
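
For reference, the swap-backed variant differs from the original commands only in the -t argument. The sketch below reuses the size, unit number and mount point from the first report and merely prints the commands. The helper name md_swap_cmds is made up for illustration.

```shell
#!/bin/sh
# Sketch: print the commands for a swap-backed md(4) disk.
# Size, unit and mount point are taken from the original report;
# md_swap_cmds is a hypothetical helper.
md_swap_cmds() {
    size=$1; unit=$2; mnt=$3
    echo "/sbin/mdconfig -a -t swap -s ${size} -u ${unit}"   # swap instead of malloc
    echo "/sbin/newfs -U /dev/md${unit}"
    echo "/sbin/mount /dev/md${unit} ${mnt}"
}

md_swap_cmds 256M 10 /mnt/ramdisk
```

Piping the output to sh as root would actually create and mount the disk.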

Before ftp localhost:
last pid:   652;  load averages:  0.02,  0.09,  0.07    up 0+00:07:16  17:12:05
37 processes:  1 running, 36 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.4% interrupt, 99.6% idle
Mem: 11M Active, 12M Inact, 18M Wired, 11M Buf, 453M Free
Swap: 999M Total, 999M Free

After ftp localhost:
last pid:   666;  load averages:  0.20,  0.12,  0.08    up 0+00:09:05  17:13:54
36 processes:  1 running, 35 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.4% interrupt, 99.6% idle
Mem: 244M Active, 150M Inact, 73M Wired, 27M Cache, 60M Buf, 984K Free
Swap: 999M Total, 999M Free

After removal of the swap-backed disk:
last pid:   690;  load averages:  0.00,  0.01,  0.03    up 0+00:17:53  17:22:42
34 processes:  1 running, 33 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 100% idle
Mem: 15M Active, 76M Inact, 43M Wired, 13M Cache, 60M Buf, 347M Free
Swap: 999M Total, 999M Free

Thanks for your time Gleb and Scott.

Fabian
-- 
http://www.fabiankeil.de/



Missing wep_wlan at 5.4 (was: Atheros (ath0) no RX traffic)

2005-11-08 Thread Fabian Keil

Sam Leffler [EMAIL PROTECTED] wrote:

 Stephen Montgomery-Smith wrote:
  Richard Arends wrote:

  Today I upgraded my laptop from 5-STABLE to 6-STABLE. After the
  upgrade, my wireless is not working anymore.
  
  
  You are doing better than me.  I try this:
  ifconfig ath0 wepkey 12345
  and get
  ifconfig: SIOCS80211: Invalid argument
  
  (Actually maybe that is happening to you as well, but since you are 
  setting ifconfig_ath0 from within rc.conf, you might be missing
  this error message as it flies by in your start up.)
  
  I get this error on other wireless cards as well.
 
 kldload wlan_wep

For a few days now I have been getting ifconfig: SIOCS80211: Invalid argument
while trying to set up WEP with up-to-date ndis stuff on 5.4.

ATM I use an older ndis build which still works.

wlan_wep seems to exist in 6.0 only:
http://fxr.watson.org/fxr/source/modules/wlan_wep/?v=RELENG54
http://fxr.watson.org/fxr/source/modules/wlan_wep/?v=RELENG6

Is there some secret I don't know about?

Fabian
-- 
http://www.fabiankeil.de/



Re: ndisgen intended to be the only way to generate ndis?

2005-07-06 Thread Fabian Keil

Daniel O'Connor [EMAIL PROTECTED] wrote:

 On Tue, 5 Jul 2005 22:49, Fabian Keil wrote:
  AFAIK, nobody has announced that the old way is dead,
  therefore I would like to know if the breakage is intentional
  and if it is, if there's a technical reason why these methods
  can no longer coexist.
 
 The old way built the .sys and .inf files into a .ko along with if_ndis 
 code.
 
 In the new way you build the .sys and .inf files into a .ko without any 
 other code. When you load it, it pulls in if_ndis which then reads the 
 wrapped .sys and .inf file you loaded.
 
 You can't build things the old way any more because the if_ndis code no 
 longer expects to be linked to a .sys file.

Thanks for pointing this out.
I have missed this design change completely.

 I suggest the best approach would be to submit improved documentation for the 
 ndiscvt man page (and a new ndisgen page) along with some handbook changes. 
 It would also be fairly trivial to modify ndisgen to take some arguments.

Agreed.

Fabian
-- 
http://www.fabiankeil.de/



ndisgen intended to be the only way to generate ndis?

2005-07-05 Thread Fabian Keil

Hi all,

as you have probably noticed, the number of mails about
problems with compiling ndis has increased in the last
four weeks.

The old way to compile ndis was to go to
/usr/src/sys/modules/if_ndis/, use ndiscvt to create a header
file containing the Windows driver and run make; make install.
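
As a reminder of what that looked like, here is a sketch that only prints the steps. It assumes the header name ndis_driver_data.h and the ndiscvt options -i, -s and -o as they were documented for this procedure. The driver file paths are placeholders and old_ndis_build_cmds is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print the steps of the old if_ndis build procedure.
# Assumptions: ndis_driver_data.h is the expected header name;
# the .inf/.sys paths are placeholders for the Windows driver files.
old_ndis_build_cmds() {
    inf=$1; sys=$2
    echo "cd /usr/src/sys/modules/if_ndis"
    echo "ndiscvt -i $inf -s $sys -o ndis_driver_data.h"
    echo "make"
    echo "make install"
}

old_ndis_build_cmds /path/to/driver.inf /path/to/driver.sys
```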

It was fast and well documented in the handbook and on
the web in general.

Later Bill Paul wrote /usr/sbin/ndisgen to automate these steps.

ndisgen is an interactive shell script; it is user friendly
and describes what it's doing. However, using it is slower
than the old way was. You can't use shell auto completion to
specify the location of the driver's .sys and .inf files, and some
steps are repeated each time you recompile ndis, even if they
aren't needed.

ATM the existence of ndisgen is poorly documented.
It's not mentioned in the handbook, not in the man pages
and seldom appears on other websites. If you don't read
the mailing lists or the cvs logs, you probably won't know
about it.  

For a while the new and the old way coexisted and everybody
was happy. About four weeks ago, the old way stopped working
for many (all?) people. You can still build and kldload the
needed modules without error, but they will not work.

Most of the time (every time?) ndisgen still does.

AFAIK, nobody has announced that the old way is dead,
therefore I would like to know if the breakage is intentional
and if it is, if there's a technical reason why these methods
can no longer coexist.

Mark A-J. Raught wrote on freebsd-mobile yesterday: 
I prefer the old way, but as long as it works I'll suffer
through the wizard feel.

So do I, I guess we're not alone.

Fabian
-- 
http://www.fabiankeil.de/



Fw: 5.4-STABLE panic: kernel trap 12 with interrupts disabled

2005-05-08 Thread Fabian Keil

Hi list,

forwarding to freebsd-stable (probably the right place anyway),
since I got no further responses on freebsd-questions.

Subhro [EMAIL PROTECTED] wrote:

 On 5/5/2005 19:43, Fabian Keil wrote:

 the day before yesterday I experienced my first
 panic on 5.4-STABLE. Built and cvsup'ed last
 Friday. My system is a ThinkPad R51
 
 I did nothing spectacular, after boot I:
 
 logged in as user
 cdrecord -scanbus (which didn't work as I hadn't yet set it suid)
 su
 chmod +x for cdrecord and readcd (meant was +g ;-)
 exit
 cdrecord -scanbus (didn't yet work ;-)
 su
 cdrecord -scanbus (did work)
 readcd dev=2,0,0 -factor meshpoints=100 f=./file
 exit
 
 Then I moved the laptop and plugged in the AC/DC adapter.
 
 whoami brought me:
 
 Kernel trap 12 with interrupts disabled
 Fatal trap 12: page fault while in kernel mode
 fault virtual address= 0xa94d06c
 fault code  = supervisor read, page not present
 instruction pointer = 0x8:0xc053cbe5
 stack pointer = 0x10:0xe669f98c
 frame pointer= 0x10:0xe669f990
 code segment   = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, def32 1, gran 1
 processor eflags= resume, IOPL = 0
 current process = 601 (whoami)
 trap number = 12
 panic: page fault
 
 I saved the dump manually with savecore and then tried
 to follow:
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/faq/advanced.html#KERNEL-PANIC-TROUBLESHOOTING
 
 [EMAIL PROTECTED] ~ $nm -n /boot/kernel/kernel | grep c053cb
 c053cb4c T init_turnstiles
 c053cbc9 t init_turnstile0
 c053cbd8 t turnstile_setowner
 
 My kernel contains makeoptions DEBUG=-g,  however
 I don't have the file /sys/compile/KERNELCONFIG/kernel.debug
 and thus wasn't able to do
 % gdb -k /sys/compile/KERNELCONFIG/kernel.debug /var/crash/vmcore.0

It turned out that I was just looking in the wrong place: kernel.debug
was at /usr/obj/usr/src/sys/THINKPAD/kernel.debug.

The Developers' Handbook section at
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html
fits better and contains a pointer to kgdb.

 [EMAIL PROTECTED] ~ $cat info.0 
 Dump header from device /dev/ad0s3b
   Architecture: i386
   Architecture Version: 16777216
   Dump Length: 536215552B (511 MB)
   Blocksize: 512
   Dumptime: Tue May  3 20:18:11 2005
   Hostname: r51.local
   Magic: FreeBSD Kernel Dump
   Version String: FreeBSD 5.4-STABLE #6: Sat Apr 30 14:57:04 CEST 2005
 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/THINKPAD
   Panic String: page fault
   Dump Parity: 1084811848
   Bounds: 0
   Dump Status: good
 
 The kernel was built the new way.
 I was not able to reproduce the panic.
 
 Is there anything else I can do?

 It would be great to have a look at the core. Can you put it up 
 somewhere on the WEB? Also if you are not running a GENERIC kernel then 
 let us have a look at the config file.

[EMAIL PROTECTED] ~ $ls -lh|grep core
-rw---   1 fk  wheel   511M May  3 20:38 vmcore.0
-rw---   1 fk  wheel   354M May  5 19:11 vmcore.0.gz

I don't have that much web space available.
However, the following looks interesting:

[EMAIL PROTECTED] ~ $kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: 
Undefined symbol ps_pglobal_lookup]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".
#0  doadump () at pcpu.h:160
160 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:160
#1  0xc0519e76 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:410
#2  0xc051a1a7 in panic (fmt=0xc06bafe5 %s) at 
/usr/src/sys/kern/kern_shutdown.c:566
#3  0xc0693758 in trap_fatal (frame=0xe669f94c, eva=0) at 
/usr/src/sys/i386/i386/trap.c:809
#4  0xc0692dca in trap (frame=
  {tf_fs = 24, tf_es = 16, tf_ds = 16, tf_edi = -429261692, tf_esi = 
-1043159552, tf_ebp = -429262448, tf_isp = -429262472, tf_ebx = -1043640192, 
tf_edx = -1043640192, tf_ecx = 177524736, tf_eax = 177524736, tf_trapno = 12, 
tf_err = 0, tf_eip = -1068250139, tf_cs = 8, tf_eflags = 65539, tf_esp = 
-1043159552, tf_ss = -429262416})
at /usr/src/sys/i386/i386/trap.c:247
#5  0xc0681baa in calltrap () at /usr/src/sys/i386/i386/exception.s:140
#6  0x0018 in ?? ()
#7  0x0010 in ?? ()
#8  0x0010 in ?? ()
#9  0xe669fc84 in ?? ()
#10 0xc1d2a600 in ?? ()
#11 0xe669f990 in ?? ()
#12 0xe669f978 in ?? ()
#13 0xc1cb5080 in ?? ()
#14 0xc1cb5080 in ?? ()
#15 0x0a94d000 in ?? ()
#16 0x0a94d000 in ?? ()
#17 0x000c in ?? ()
#18 0x in ?? ()
#19 0xc053cbe5 in turnstile_setowner (ts=0xc1cb5080, owner=0x0) at 
/usr/src/sys/kern/subr_turnstile.c:367
#20
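
kgdb prints the trap frame fields as signed decimal integers, which makes them hard to compare with the hexadecimal addresses in the panic message. The following minimal sketch converts a few of them; the helper name to_u32_hex is made up for illustration, and the values are copied from frame #4 above.

```python
def to_u32_hex(value):
    """Format a signed 32-bit integer as an unsigned 8-digit hex string."""
    return f"0x{value & 0xFFFFFFFF:08x}"


# Selected fields from the trap frame shown in frame #4.
FRAME = {
    "tf_eip": -1068250139,
    "tf_ebp": -429262448,
    "tf_ebx": -1043640192,
    "tf_eax": 177524736,
}

for name, value in FRAME.items():
    print(f"{name} = {to_u32_hex(value)}")

# Fault virtual address from the panic message, relative to tf_eax.
FAULT_ADDRESS = 0xA94D06C
print(hex(FAULT_ADDRESS - FRAME["tf_eax"]))
```

The converted values line up with the rest of the report: tf_eip is 0xc053cbe5 (the instruction pointer), tf_ebp is 0xe669f990 (the frame pointer), tf_ebx is 0xc1cb5080 (the ts argument in frame #19), and tf_eax is 0x0a94d000. The fault address is 0x6c bytes above tf_eax, which might mean the code read a field at that offset through a bad pointer, but that is only a guess.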
