[OpenAFS] Early Announcement: 3rd European AFS Kerberos Conference 2010
Dear AFS Kerberos lovers!

We are pleased to announce the final schedule for the 3rd European AFS Kerberos Conference 2010. The conference will take place in Pilsen, Czech Republic, from September 13 to September 15, 2010. Further details will follow and can be found at http://afs2010.civ.zcu.cz

Please book your time in advance, and feel free to contact us with any further questions or suggestions!

The Organizers (JML)
afs2...@civ.zcu.cz
[OpenAFS] Windows XP event log warning entries with IDs 4117 and 4133
Greetings,

Does anybody know what the Windows XP event log warning entries with IDs 4117 and 4133 mean? I am experiencing problems with the OpenAFS client v1.5.26 for Windows: it sometimes hangs for about two minutes. During that time, event log warning entries with IDs 4117 and 4133 are created, with no meaningful description. I have failed to find anything about these events by googling, so I would be very glad if someone has an idea.

Thank you very much.
Michal Svamberg
Re: [OpenAFS] vos dump has 700 second timeout if vlserver is down
OK, I removed the last line from my /etc/openafs/server/CellServDB (server sauron):

  >zcu.cz                 # University of West Bohemia, Czech Republic
  147.228.52.10           #oknos.zcu.cz
  147.228.52.17           #nic.zcu.cz
  147.228.10.18           #sauron.zcu.cz

That helped with 'vos dump'; it now works better. But 'vos release <volume> -localauth' is now faulty:

  vos rel common.etc.xen -v -localauth
  Could not lock the VLDB entry for the volume 876072271.
  u: not synchronization site (should work on sync site)
  Error in vos release command.
  u: not synchronization site (should work on sync site)

What is wrong?

Michal Svamberg

On Thu, Aug 14, 2008 at 11:58 AM, Hartmut Reuter [EMAIL PROTECTED] wrote:
> Michal Svamberg wrote:
>> Hi, I have 3 vlservers. When one of these servers is down, 'vos dump'
>> waits for a long time. The timeout is defined in the function
>> DumpVolume() in volser/vos.c:
>>
>>   rx_SetRxDeadTime(60 * 10);
>>
>> With this parameter, the timeout is exactly 700 seconds (measured with
>> wireshark). Changing the parameter to 10 * 10 leads to a timeout of
>> 112 seconds. In the attachment I send the wireshark dump of the
>> communication of 'vos dump' with the vlserver (147.228.10.17 is down).
>> Why do other OpenAFS commands have a smaller timeout (approx. 12
>> seconds)?
>
> Because when the old (non-pthreaded) volserver asked the fileserver for
> a volume, it hung in the read on the socket without a chance to serve
> rx requests.
>
>> Why does 'vos dump' have such a big timeout? Is there any option to
>> change it?
>
> If you know one of the vlservers is dead, take it out of the CellServDB
> on the machine where you do the vos dump.
>
>> I have big problems when one vlserver is down and I am creating dumps
>> of thousands of volumes. I use bacula for creating backups. Thanks for
>> responses. Michal Svamberg
>
> --
> Hartmut Reuter                  e-mail [EMAIL PROTECTED]
> phone +49-89-3299-1328          fax +49-89-3299-1301
> RZG (Rechenzentrum Garching)    web http://www.rzg.mpg.de/~hwr
> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> Institut fuer Plasmaphysik (IPP)
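The "not synchronization site" error means vos reached a vlserver that is not the current ubik coordinator; after a CellServDB change the remaining servers may still be renegotiating quorum. One way to see which server currently is the sync site is to query each vlserver with udebug. A sketch, using the hostnames from the CellServDB above (7003 is the standard vlserver port):

  # Ask each remaining vlserver for its ubik status; the coordinator
  # answers "I am sync site", the others report their sync host.
  for host in oknos.zcu.cz nic.zcu.cz; do
      echo "== $host =="
      udebug $host 7003 | egrep -i 'sync site|sync host'
  done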
[OpenAFS] vos dump has 700 second timeout if vlserver is down
Hi,

I have 3 vlservers. When one of these servers is down, 'vos dump' waits for a long time. The timeout is defined in the function DumpVolume() in volser/vos.c:

  rx_SetRxDeadTime(60 * 10);

With this parameter, the timeout is exactly 700 seconds (measured with wireshark). Changing the parameter to 10 * 10 leads to a timeout of 112 seconds. In the attachment I send the wireshark dump of the communication of 'vos dump' with the vlserver (147.228.10.17 is down).

Why do other OpenAFS commands have a smaller timeout (approx. 12 seconds)? Why does 'vos dump' have such a big timeout? Is there any option to change it? I have big problems when one vlserver is down and I am creating dumps of thousands of volumes. I use bacula for creating backups.

Thanks for responses.
Michal Svamberg

[Attachment: vlserv_down (binary data)]
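As far as I know there is no runtime option for this in this vos version; the dead time is hard-coded in DumpVolume(). A minimal sketch of the local rebuild implied by the 10 * 10 experiment above (the sed pattern and build steps are illustrative, not an official knob):

  # Shrink the Rx dead time used by 'vos dump' from 600 s to 100 s,
  # then rebuild vos from the OpenAFS source tree.
  sed -i 's/rx_SetRxDeadTime(60 \* 10)/rx_SetRxDeadTime(10 * 10)/' src/volser/vos.c
  ./configure && make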
[OpenAFS] SIGSEGV on aklog, pts or vos commands at process.c:213
Hello,

I have a problem on some computers (Debian Etch + etch-backports). Older versions of OpenAFS (1.4.2, 1.4.4 and 1.4.6) don't have this problem. Installed versions:

  # dpkg -l | grep openafs | awk '{print $2 "\t\t\t" $3}'
  libopenafs-dev                  1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-client                  1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-dbg                     1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-doc                     1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-fileserver              1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-krb5                    1.4.7~pre3.dfsg1-1~bpo40+1
  openafs-modules-2.6.22-4-686    1.4.7~pre3.dfsg1-1~bpo40+1+2.6.22-6~bpo40+2
  openafs-modules-source          1.4.7~pre3.dfsg1-1~bpo40+1

  # uname -a
  Linux listik.zcu.cz 2.6.22-4-686 #1 SMP Tue Feb 12 16:29:32 UTC 2008 i686 GNU/Linux

  $ gdb --quiet
  (gdb) file /usr/bin/pts
  Reading symbols from /usr/bin/pts...Reading symbols from /usr/lib/debug/usr/bin/pts...done.
  Using host libthread_db library "/usr/lib/debug/libthread_db.so.1".
  done.
  (gdb) run mem svamberg
  Starting program: /usr/bin/pts mem svamberg

  Program received signal SIGSEGV, Segmentation fault.
  savecontext (ep=0x8076140 <Create_Process_Part2>, savearea=0x80c59c4, sp=0xb7cf700c "...") at ./process.c:213
  213             (*EP) ();
  (gdb) l
  208                 jmpBuffer[LWP_FP] = ptr_mangle((jmp_buf_type) sp);
  209     #endif
  210             longjmp(jmp_tmp, 1);
  211             break;
  212         case 1:
  213             (*EP) ();
  214             assert(0);          /* never returns */
  215             break;
  216         default:
  217             perror("Error in setjmp1\n");
  (gdb) bt
  #0  savecontext (ep=0x80754a0 <Create_Process_Part2>, savearea=0x80bde44, sp=0xb7caa00c "...") at ./process.c:213
  #1  0x080757e7 in LWP_CreateProcess (ep=0x80766f0 <IOMGR>, stacksize=<value optimized out>, priority=0, parm=0x0, name=0x807f01d "IO MANAGER", pid=0x80921c8) at ./lwp.c:409
  #2  0x080766e6 in IOMGR_Initialize () at ./iomgr.c:820
  #3  0x08074ae4 in rxi_InitializeThreadSupport () at rx_lwp.c:117
  #4  0x0806d791 in rx_InitHost (host=0, port=0) at rx.c:403
  #5  0x0806d9d9 in rx_Init (port=0) at rx.c:540
  #6  0x0804dcf4 in pr_Initialize (secLevel=0, confDir=0x8083040 "/etc/openafs", cell=0xbfb25696 "zcu.cz") at ptuser.c:166
  #7  0x0804b1aa in auth_to_cell (context=0x809a058, cell=<value optimized out>, realm=0x0) at aklog_main.c:720
  #8  0x0804c472 in aklog (argc=1, argv=0xbfb2fbe4) at aklog_main.c:1381
  #9  0x0804a0c2 in main (argc=Cannot access memory at address 0xf951e550) at aklog.c:18
  #10 0xb7db6450 in __libc_start_main (main=0x804a0a0 <main>, argc=1, ubp_av=0xbfb2fbe4, init=0x807b5b0 <__libc_csu_init>, fini=0x807b560 <__libc_csu_fini>, rtld_fini=0xb7fb0dc0 <_dl_fini>, stack_end=0xbfb2fbdc) at libc-start.c:222

The same SIGSEGV occurs when running aklog or vos. What's wrong? Thanks.

Michal Svamberg
Re: [OpenAFS] SIGSEGV on aklog, pts or vos commands at process.c:213
Hello,

2008/5/16 Marc Dionne [EMAIL PROTECTED]:
> Did you build the package yourself? There's probably something going on
> at the configure stage that didn't enable ucontext. For instance, is
> HAVE_UCONTEXT_H defined in src/config/afsconfig.h?

This is the debian/etch-backports build. I now fetched the sources from etch-backports and compiled them myself; after recompilation everything works without the SIGSEGV.

  # apt-get source openafs/etch-backports
  # cd openafs-1.4.7~pre3.dfsg1
  # dpkg-buildpackage
  ...
  # head -n 18 config.log
  This file contains any messages produced by compilers while
  running configure, to aid debugging if configure makes a mistake.

  It was created by configure, which was
  generated by GNU Autoconf 2.61.  Invocation command line was

    $ configure --with-afs-sysname=i386_linux26 --disable-kernel-module --prefix=/usr --mandir=${prefix}/share/man --sysconfdir=/etc --libexecdir=/usr/lib --localstatedir=/var/lib --with-krb5-conf=/usr/bin/krb5-config --enable-supergroups --enable-largefile-fileserver --enable-bos-new-config --enable-debug --enable-lwp-debug --build i486-linux-gnu

  ## --------- ##
  ## Platform. ##
  ## --------- ##

  hostname = listik.zcu.cz
  uname -m = i686
  uname -r = 2.6.22-4-686
  uname -s = Linux
  uname -v = #1 SMP Tue Feb 12 16:29:32 UTC 2008

  # cat src/config/afsconfig.h | grep -i context
  /* Define to 1 if you have the <ucontext.h> header file. */
  #define HAVE_UCONTEXT_H 1

  # cd ..; dpkg -i openafs-client*.deb openafs-krb5*.deb

No SIGSEGV on the pts, aklog or vos commands now. This is probably a Debian-specific bug. Thanks, Marc.
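For anyone hitting the same crash, the configure result can be checked before installing; this just restates Marc's HAVE_UCONTEXT_H hint as a one-liner:

  # In the unpacked source tree, after configure has run:
  grep HAVE_UCONTEXT_H src/config/afsconfig.h
  # Expect: #define HAVE_UCONTEXT_H 1
  # If the define is missing, LWP falls back to the setjmp/longjmp
  # pointer-mangling path that segfaulted in savecontext() above.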
[OpenAFS] repeated message: Delete longest inactive host
Hello,

An AFS fileserver (OpenAFS 1.4.1 built 2006-05-05) was slowly going into meltdown. After raising the fileserver's debug level to 1 (kill -TSTP), it writes log messages like:

---cut---
Fri Nov 3 08:26:50 2006 [16] GSS: First looking for timed out call backs via CleanupCallBacks
Fri Nov 3 08:26:50 2006 [16] GSS: Try harder for longest inactive host cnt= 1
Fri Nov 3 08:26:50 2006 [16] GSS: Try harder for longest inactive host cnt= 2
Fri Nov 3 08:26:50 2006 [16] GSS: Delete longest inactive host 147.228.53.104
... AND REPEATING THE SAME LINES ...
---cut---

Within twenty seconds the fileserver produces 50 MB of log (the same lines as above). I tried making a dump, but the fileserver only creates empty files, and it writes to FileLog:

---cut---
Fri Nov 3 08:32:16 2006 Created client dump /etc/openafs/server-local/client.dump
Fri Nov 3 08:32:16 2006 Vice was last started at Fri Oct 20 08:32:54 2006
Fri Nov 3 08:32:16 2006 Large vnode cache, 600 entries, 20301 allocs, 124772344 gets (3140082 reads), 2808362 writes
Fri Nov 3 08:32:16 2006 Small vnode cache, 600 entries, 304607 allocs, 82648006 gets (17546492 reads), 2655653 writes
Fri Nov 3 08:32:16 2006 Volume header cache, 600 entries, 125656518 gets, 584083 replacements
Fri Nov 3 08:32:16 2006 Partition /vicepa: 303787844 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     239651500 free blocks
Fri Nov 3 08:32:16 2006 Partition /vicepb: 292960332 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     257618224 free blocks
Fri Nov 3 08:32:16 2006 Partition /vicepc: 292960332 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     255503768 free blocks
Fri Nov 3 08:32:16 2006 With 120 directory buffers; 10025532 reads resulted in 212317 read I/Os
Fri Nov 3 08:32:16 2006 Total Client entries = 462, blocks = 265; Host entries = 150, blocks = 1
Fri Nov 3 08:32:16 2006 There are 462 connections, process size 135544
Fri Nov 3 08:32:16 2006 There are 150 workstations, 20 are active (req in 15 mins), 1 marked down
Fri Nov 3 08:32:16 2006 Shutting down file server at Fri Nov 3 08:32:16 2006
Fri Nov 3 08:32:16 2006 Vice was last started at Fri Oct 20 08:32:54 2006
Fri Nov 3 08:32:16 2006 Large vnode cache, 600 entries, 20301 allocs, 124772371 gets (3140090 reads), 2808362 writes
Fri Nov 3 08:32:16 2006 Small vnode cache, 600 entries, 304607 allocs, 82648008 gets (17546493 reads), 2655653 writes
Fri Nov 3 08:32:16 2006 Volume header cache, 600 entries, 125656545 gets, 584083 replacements
Fri Nov 3 08:32:16 2006 Partition /vicepa: 303787844 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     239651500 free blocks
Fri Nov 3 08:32:16 2006 Partition /vicepb: 292960332 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     257618224 free blocks
Fri Nov 3 08:32:16 2006 Partition /vicepc: 292960332 available 1K blocks (minfree=0),
Fri Nov 3 08:32:16 2006     255503768 free blocks
Fri Nov 3 08:32:16 2006 With 120 directory buffers; 10025532 reads resulted in 212317 read I/Os
Fri Nov 3 08:32:16 2006 Total Client entries = 463, blocks = 265; Host entries = 150, blocks = 1
Fri Nov 3 08:32:16 2006 There are 463 connections, process size 135544
Fri Nov 3 08:32:16 2006 There are 150 workstations, 20 are active (req in 15 mins), 1 marked down
Fri Nov 3 08:32:16 2006 VShutdown: shutting down on-line volumes...
---cut---

These lines were written to FileLog after the fileserver shutdown (the log lines show the same time as the shutdown).
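For reference, the fileserver's debug level is driven by signals, and it is worth resetting it once the excerpt is captured, since at level 1 this server produced 50 MB of log per 20 seconds. A sketch, assuming the usual OpenAFS fileserver signal conventions (pidof's target name may differ per packaging):

  kill -TSTP $(pidof fileserver)   # raise the debug level one step
  kill -HUP  $(pidof fileserver)   # reset the debug level back to 0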
Thanks for any ideas,
Michal Svamberg
Re: [OpenAFS] many packets marked as rx_ignoreAckedPacket, and meltdown
Hi,

Thanks for the link. The problem is that the clients have the same UUID because they have the same SID. This can be seen in hosts.dump (produced with kill -XCPU pid_of_fileserver) near the lines containing the string 'lock:', for example:

---cut---
ip:360de493 port:7001 hidx:251 cbid:16297 lock: last:1159945605 active:1159940686 down:0 del:0 cons:0 cldel:32
 hpfailed:0 hcpsCall:1159943657 hcps [ -211]
 [ 330de493 3a0de493 370de493 360de493 430de493 440de493 3e0de493 420de493 3d0de493 470de493 320de493 480de493 490de493 450de493 340de493 3f0de493 350de493 410de493 400de493 3c0de493]
 holds: 3bf69 slot/bit: 0/1
---cut---

The IP addresses of the misconfigured clients are in the line with 'hpfailed'. After reconfiguring all affected stations, the meltdown does not appear any more. (A sketch of the dump-and-scan step follows at the end of this message.)

I have a question about this problem: would you consider a new option for the maximum number of clients with the same UUID that can connect to a fileserver? Or writing a warning message to FileLog (without debug)? In my opinion it is not good that clients are able to shut down a server.

Thanks for the answer,
Michal Svamberg

On 10/10/06, Derrick J Brashear [EMAIL PROTECTED] wrote:
> On Tue, 10 Oct 2006, Michal Svamberg wrote:
>> We upgraded the file servers to 1.4.1 (built 2006-05-05) but it did
>> not solve the meltdown.
>
> Get a backtrace when the fileserver is not responding. On a whim, you
> might also try this patch:
> http://grand.central.org/rt/Ticket/Display.html?id=19461
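As mentioned above, a sketch of the dump-and-scan step. The dump path is an assumption based on the Debian layout from the earlier FileLog excerpt (other builds write to /usr/afs/local), and the address decoding is how the dump prints host addresses on this little-endian box:

  # Dump the fileserver's host and client tables, then pull the lines
  # that carry 'hpfailed' and the hcps list of hosts sharing one entry.
  # The hex words encode client addresses in network byte order
  # (e.g. 360de493 -> 147.228.13.54).
  kill -XCPU $(pidof fileserver)
  grep 'hpfailed' /etc/openafs/server-local/hosts.dump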
Re: [OpenAFS] many packets marked as rx_ignoreAckedPacket, and meltdown
We upgraded the file servers to 1.4.1 (built 2006-05-05) but it did not solve the meltdown. The fileserver runs in large mode, and the meltdown behaves like this:

- first 10 min: the 12 idle threads become fully used; rxdebug reports only 2 idle threads, then 0, and wprocs counts up from zero
- over roughly the next 10 min: up to 300 processes waiting for a thread
- the fileserver clears wprocs (sends VBUSY to the clients?) but does not free the threads, and wprocs starts counting from zero again
- after the next 10 min the loop closes: wprocs counts up to 300 and is cleared again
- at any time: restarting the fileserver (via the bos command) returns it to normal running

The meltdown afflicts:

- user servers (RW + backup volumes)
- software servers (RW + RO + backup volumes)
- replication servers (RO volumes)

The upgrade also did not resolve my question about rx_ignoreAckedPacket. What packets are counted as rx_ignoreAckedPacket? I have tons of logs but I don't know what to search for in them. Do you have any ideas?

Thanks,
Michal Svamberg
[OpenAFS] many packets marked as rx_ignoreAckedPacket, and meltdown
Hello,

I don't know what rx_ignoreAckedPacket is. I have thousands of rx_ignoreAckedPacket per 15 seconds (up to 5); the number of calls is smaller (up to 1). Is it plausible to have ten times as many rx_ignoreAckedPacket as calls?

We have this infrastructure:

Fileservers (large mode):
  OpenAFS 1.3.81 built 2005-05-14 (debian/stable)

Windows and Linux clients from version 1.2 to 1.4, and for experimental use 1.5:
  OpenAFS 1.2.10 built 2005-04-06
  OpenAFS 1.3.82 built 2005-08-20
  OpenAFS 1.4.2fc4 built 2006-10-02
  OpenAFS 1.4.0101

Some fileservers sometimes go into a meltdown state (calls waiting for a thread) and I don't know the reason. Here is 'rxdebug -rxstats':

  Free packets: 935, packet reclaims: 1283, calls: 2197185, used FDs: 64
  not waiting for packets.
  201 calls waiting for a thread
  2 threads are idle
  rx stats: free packets 935, allocs 7046769, alloc-failures(rcv 0/0,send 0/0,ack 0)
     greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
     packets read: data 2220845 ack 3323232 busy 5 abort 5125 ackall 3 challenge 4 response 1098 debug 43944 params 0 unused 0 unused 0 unused 0 version 0
     other read counters: data 2220774, ack 3322547, dup 0 spurious 165 dally 5
     packets sent: data 2851035 ack 54295 busy 592 abort 72 ackall 0 challenge 1098 response 4 debug 0 params 0 unused 0 unused 0 unused 0 version 0
     other send counters: ack 54295, data 9546732 (not resends), resends 2908, pushed 0, ackedignored 3238665 (these should be small)
     sendFailed 0, fatalErrors 0
     Average rtt is 0.006, with 745772 samples
     Minimum rtt is 0.000, maximum is 60.235
     518 server connections, 676 client connections, 706 peer structs, 350 call structs, 0 free call structs

Thanks for any answer.
Michal Svamberg
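As far as I can tell from the Rx source, rx_ignoreAckedPacket (printed as 'ackedignored' above) counts packets that were queued for retransmission but were skipped because an ack for them had already arrived, so a large value points at heavy resend pressure rather than being a bug by itself. A sketch for watching it together with the thread counters during a meltdown (the hostname and the 15-second interval are illustrative; 7000 is the standard fileserver port):

  # Sample the fileserver every 15 s and track thread starvation and
  # the ackedignored counter from the same rxdebug output quoted above.
  while sleep 15; do
      date
      rxdebug fileserver.zcu.cz 7000 -rxstats |
          egrep 'waiting for a thread|threads are idle|ackedignored'
  done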