[OpenAFS] Early Announcement: 3rd European AFS Kerberos Conference 2010

2009-11-12 Thread Michal Svamberg
Dear AFS & Kerberos lovers!

we are pleased to announce the final schedule for the

   3rd European AFS & Kerberos Conference 2010

The conference will take place in Pilsen, Czech Republic, from September 13
to September 15, 2010. Further details will follow and can be found at

 http://afs2010.civ.zcu.cz

Please book your time in advance and feel free to contact us with any further
questions or suggestions!

The Organizers (JML)
afs2...@civ.zcu.cz


[OpenAFS] Windows XP event log warning entry with ID 4117 and 4133

2009-03-09 Thread Michal Svamberg
Greetings,
does anybody know what the Windows XP event log warning entries with IDs
4117 and 4133 mean? I am having problems with the OpenAFS client v1.5.26 for
Windows: it sometimes hangs for about two minutes. During that time, event log
warning entries with IDs 4117 and 4133 are created, with no meaningful
description. I have not been able to find anything about these events with
Google, so I would be very glad if someone has an idea.
Thank you very much.

Michal Svamberg


Re: [OpenAFS] vos dump has timeout 700 second if vlserver down

2008-09-08 Thread Michal Svamberg
OK,
I removed the last line from my /etc/openafs/server/CellServDB (server sauron):

>zcu.cz                  # University of West Bohemia, Czech Republic
147.228.52.10           #oknos.zcu.cz
147.228.52.17           #nic.zcu.cz
147.228.10.18           #sauron.zcu.cz

Using 'vos dump' now works better, but 'vos release volume -localauth' fails:

vos rel common.etc.xen -v -localauth
Could not lock the VLDB entry for the volume 876072271.
u: not synchronization site (should work on sync site)
Error in vos release command.
u: not synchronization site (should work on sync site)

What is wrong?
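A quick way to check which dbserver currently holds the VLDB sync site is
udebug. A rough sketch, assuming 7003 is the vlserver's ubik port and using
the hosts from the CellServDB above:

# ask each dbserver for its ubik state; exactly one should report "I am sync site"
for h in oknos.zcu.cz nic.zcu.cz sauron.zcu.cz; do
    echo "== $h =="
    udebug "$h" 7003 | grep -i 'sync site'
done

'vos release' has to be able to reach whichever server reports itself as the
sync site.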

Michal Svamberg

On Thu, Aug 14, 2008 at 11:58 AM, Hartmut Reuter [EMAIL PROTECTED] wrote:
> Michal Svamberg wrote:
>
>> Hi,
>> I have 3 vlservers. When one of these servers is down, 'vos dump'
>> waits for a long time.
>> The timeout is defined in the function DumpVolume() in volser/vos.c:
>>     rx_SetRxDeadTime(60 * 10);
>> With this parameter, the timeout is exactly 700 seconds (measured with wireshark).
>> Changing the parameter to 10 * 10 leads to a timeout of 112 seconds.
>>
>> In the attachment, I send the wireshark dump of the communication of
>> 'vos dump' with the vlserver (147.228.10.17 is down).
>>
>> Why do other openafs commands have a smaller timeout (approx. 12 seconds)?
>
> Because when the old (non-pthreaded) volserver asked the fileserver for a
> volume it hung in the read on the socket without a chance to serve
> rx requests.
>
>> Why does 'vos dump' have such a big timeout?
>> Is there any option to change it?
>
> If you know one of the vlservers is dead, take it out of the CellServDB on
> the machine where you do the vos dump.
>
>> I have big problems when one vlserver is down and I am creating dumps
>> of thousands of volumes.
>> I use bacula for creating backups.
>>
>> Thanks for responses.
>> Michal Svamberg
>
> --
> -----------------------------------------------------------------
> Hartmut Reuter                  e-mail  [EMAIL PROTECTED]
>                                 phone   +49-89-3299-1328
>                                 fax     +49-89-3299-1301
> RZG (Rechenzentrum Garching)    web     http://www.rzg.mpg.de/~hwr
> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> Institut fuer Plasmaphysik (IPP)
> -----------------------------------------------------------------



[OpenAFS] vos dump has timeout 700 second if vlserver down

2008-08-14 Thread Michal Svamberg
Hi,
I have 3 vlservers. When one of these servers is down, 'vos dump'
waits for a long time.
The timeout is defined in the function DumpVolume() in volser/vos.c:
    rx_SetRxDeadTime(60 * 10);
With this parameter, the timeout is exactly 700 seconds (measured with wireshark).
Changing the parameter to 10 * 10 leads to a timeout of 112 seconds.

In the attachment, I send the wireshark dump of the communication of
'vos dump' with the vlserver (147.228.10.17 is down).

Why do other openafs commands have a smaller timeout (approx. 12 seconds)?
Why does 'vos dump' have such a big timeout?
Is there any option to change it?

I have big problems when one vlserver is down and I am creating dumps
of thousands of volumes.
I use bacula for creating backups.
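Until the dead server is taken out of CellServDB everywhere, one workaround on
the backup side is to cap the wall-clock time of each dump. A rough sketch;
the volume name, target path and the use of GNU coreutils' timeout are only
assumptions for illustration:

# give each dump two minutes instead of waiting out the ~700 second Rx dead time
vol=common.etc.xen
timeout 120 vos dump -id "$vol" -time 0 -file "/backup/$vol.dump" -localauth \
    || echo "dump of $vol failed or timed out" >&2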

Thanks for responses.
Michal Svamberg


vlserv_down
Description: Binary data


[OpenAFS] SIGSEGV on aklog, pts or vos commands at process.c:213

2008-05-16 Thread Michal Svamberg
Hello,
I have a problem on some computers (Debian Etch + etch-backports).
Older versions of openafs (1.4.2, 1.4.4 and 1.4.6) don't have this problem.

Installed version:
# dpkg -l | grep openafs | awk '{print $2 "\t\t\t" $3}'
libopenafs-dev  1.4.7~pre3.dfsg1-1~bpo40+1
openafs-client  1.4.7~pre3.dfsg1-1~bpo40+1
openafs-dbg 1.4.7~pre3.dfsg1-1~bpo40+1
openafs-doc 1.4.7~pre3.dfsg1-1~bpo40+1
openafs-fileserver  1.4.7~pre3.dfsg1-1~bpo40+1
openafs-krb51.4.7~pre3.dfsg1-1~bpo40+1
openafs-modules-2.6.22-4-686
1.4.7~pre3.dfsg1-1~bpo40+1+2.6.22-6~bpo40+2
openafs-modules-source  1.4.7~pre3.dfsg1-1~bpo40+1

# uname -a
Linux listik.zcu.cz 2.6.22-4-686 #1 SMP Tue Feb 12 16:29:32 UTC 2008
i686 GNU/Linux

$ gdb --quiet
(gdb) file /usr/bin/pts
Reading symbols from /usr/bin/pts...Reading symbols from /usr/lib/debug/usr/bin/pts...done.
Using host libthread_db library "/usr/lib/debug/libthread_db.so.1".
done.
(gdb) run mem svamberg
Starting program: /usr/bin/pts mem svamberg

Program received signal SIGSEGV, Segmentation fault.
savecontext (ep=0x8076140 <Create_Process_Part2>, savearea=0x80c59c4,
    sp=0xb7cf700c "üýţ˙") at ./process.c:213
213         (*EP) ();
(gdb) l
208         jmpBuffer[LWP_FP] = ptr_mangle((jmp_buf_type) sp);
209     #endif
210         longjmp(jmp_tmp, 1);
211         break;
212     case 1:
213         (*EP) ();
214         assert(0);  /* never returns */
215         break;
216     default:
217         perror("Error in setjmp1\n");
(gdb) bt
#0  savecontext (ep=0x80754a0 <Create_Process_Part2>, savearea=0x80bde44,
    sp=0xb7caa00c "üýţ˙") at ./process.c:213
#1  0x080757e7 in LWP_CreateProcess (ep=0x80766f0 <IOMGR>,
    stacksize=<value optimized out>, priority=0, parm=0x0,
    name=0x807f01d "IO MANAGER", pid=0x80921c8) at ./lwp.c:409
#2  0x080766e6 in IOMGR_Initialize () at ./iomgr.c:820
#3  0x08074ae4 in rxi_InitializeThreadSupport () at rx_lwp.c:117
#4  0x0806d791 in rx_InitHost (host=0, port=0) at rx.c:403
#5  0x0806d9d9 in rx_Init (port=0) at rx.c:540
#6  0x0804dcf4 in pr_Initialize (secLevel=0, confDir=0x8083040 "/etc/openafs",
    cell=0xbfb25696 "zcu.cz") at ptuser.c:166
#7  0x0804b1aa in auth_to_cell (context=0x809a058, cell=<value optimized out>,
    realm=0x0) at aklog_main.c:720
#8  0x0804c472 in aklog (argc=1, argv=0xbfb2fbe4) at aklog_main.c:1381
#9  0x0804a0c2 in main (argc=Cannot access memory at address 0xf951e550
) at aklog.c:18
#10 0xb7db6450 in __libc_start_main (main=0x804a0a0 <main>, argc=1,
    ubp_av=0xbfb2fbe4, init=0x807b5b0 <__libc_csu_init>,
    fini=0x807b560 <__libc_csu_fini>, rtld_fini=0xb7fb0dc0 <_dl_fini>,
    stack_end=0xbfb2fbdc) at libc-start.c:222

The same SIGSEGV occurs when running aklog or vos. What's wrong?

Thanks.
Michal Svamberg


Re: [OpenAFS] SIGSEGV on aklog, pts or vos commands at process.c:213

2008-05-16 Thread Michal Svamberg
Hello,

2008/5/16 Marc Dionne [EMAIL PROTECTED]:
> Did you build the package yourself?  There's probably something going
> on at the configure stage that didn't enable ucontext.  For instance,
> is HAVE_UCONTEXT_H defined in src/config/afsconfig.h?

This is the Debian etch-backports build.

Now I have fetched the sources from etch-backports and compiled them myself.
After recompilation everything works, without the SIGSEGV.

# apt-get source openafs/etch-backports
# cd openafs-1.4.7~pre3.dfsg1
# dpkg-buildpackage
...
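One step is implied above: the build dependencies have to be installed before
dpkg-buildpackage will run. Something like the following, assuming a deb-src
line for etch-backports is present in sources.list:

# as root: pull in the packages needed to build openafs from source
apt-get build-dep openafs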

# head -n 18 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by configure, which was
generated by GNU Autoconf 2.61.  Invocation command line was

  $ configure --with-afs-sysname=i386_linux26 --disable-kernel-module
      --prefix=/usr --mandir=${prefix}/share/man --sysconfdir=/etc
      --libexecdir=/usr/lib --localstatedir=/var/lib
      --with-krb5-conf=/usr/bin/krb5-config --enable-supergroups
      --enable-largefile-fileserver --enable-bos-new-config --enable-debug
      --enable-lwp-debug --build i486-linux-gnu

## --------- ##
## Platform. ##
## --------- ##

hostname = listik.zcu.cz
uname -m = i686
uname -r = 2.6.22-4-686
uname -s = Linux
uname -v = #1 SMP Tue Feb 12 16:29:32 UTC 2008

# cat src/config/afsconfig.h  | grep -i context
/* Define to 1 if you have the ucontext.h header file. */
#define HAVE_UCONTEXT_H 1


# cd ..; dpkg -i openafs-client*.deb openafs-krb5*.deb

No SIGSEGV on the pts, aklog or vos commands now. This is probably a
Debian-specific bug.

Thanks Marc.


[OpenAFS] repeated message: Delete longest inactive host

2006-11-05 Thread Michal Svamberg

Hello,

An AFS fileserver (OpenAFS 1.4.1, built 2006-05-05) was slowly going into
meltdown. After raising the fileserver's debug level to 1 (kill -TSTP), it
writes log messages like this:

---cut---
Fri Nov  3 08:26:50 2006 [16] GSS: First looking for timed out call
backs via CleanupCallBacks
Fri Nov  3 08:26:50 2006 [16] GSS: Try harder for longest inactive host cnt= 1
Fri Nov  3 08:26:50 2006 [16] GSS: Try harder for longest inactive host cnt= 2
Fri Nov  3 08:26:50 2006 [16] GSS: Delete longest inactive host 147.228.53.104
... AND REPEATING THE SAME LINES ...
---cut---

Within twenty seconds the fileserver produces 50 MB of log (the same lines
as above). I tried making a dump, but the fileserver created only empty
files, and it wrote this to FileLog:

---cut---
Fri Nov  3 08:32:16 2006 Created client dump
/etc/openafs/server-local/client.dump
Fri Nov  3 08:32:16 2006 Vice was last started at Fri Oct 20 08:32:54 2006

Fri Nov  3 08:32:16 2006 Large vnode cache, 600 entries, 20301 allocs,
124772344 gets (3140082 reads), 2808362 writes
Fri Nov  3 08:32:16 2006 Small vnode cache,600 entries, 304607 allocs,
82648006 gets (17546492 reads), 2655653 writes
Fri Nov  3 08:32:16 2006 Volume header cache, 600 entries, 125656518
gets, 584083 replacements
Fri Nov  3 08:32:16 2006 Partition /vicepa: 303787844 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 239651500 free blocks
Fri Nov  3 08:32:16 2006 Partition /vicepb: 292960332 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 257618224 free blocks
Fri Nov  3 08:32:16 2006 Partition /vicepc: 292960332 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 255503768 free blocks
Fri Nov  3 08:32:16 2006 With 120 directory buffers; 10025532 reads
resulted in 212317 read I/Os
Fri Nov  3 08:32:16 2006 Total Client entries = 462, blocks = 265;
Host entries = 150, blocks = 1
Fri Nov  3 08:32:16 2006 There are 462 connections, process size 135544
Fri Nov  3 08:32:16 2006 There are 150 workstations, 20 are active
(req in < 15 mins), 1 marked down
Fri Nov  3 08:32:16 2006 Shutting down file server at Fri Nov  3 08:32:16 2006
Fri Nov  3 08:32:16 2006 Vice was last started at Fri Oct 20 08:32:54 2006

Fri Nov  3 08:32:16 2006 Large vnode cache, 600 entries, 20301 allocs,
124772371 gets (3140090 reads), 2808362 writes
Fri Nov  3 08:32:16 2006 Small vnode cache,600 entries, 304607 allocs,
82648008 gets (17546493 reads), 2655653 writes
Fri Nov  3 08:32:16 2006 Volume header cache, 600 entries, 125656545
gets, 584083 replacements
Fri Nov  3 08:32:16 2006 Partition /vicepa: 303787844 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 239651500 free blocks
Fri Nov  3 08:32:16 2006 Partition /vicepb: 292960332 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 257618224 free blocks
Fri Nov  3 08:32:16 2006 Partition /vicepc: 292960332 available 1K
blocks (minfree=0), Fri Nov  3 08:32:16 2006 255503768 free blocks
Fri Nov  3 08:32:16 2006 With 120 directory buffers; 10025532 reads
resulted in 212317 read I/Os
Fri Nov  3 08:32:16 2006 Total Client entries = 463, blocks = 265;
Host entries = 150, blocks = 1
Fri Nov  3 08:32:16 2006 There are 463 connections, process size 135544
Fri Nov  3 08:32:16 2006 There are 150 workstations, 20 are active
(req in < 15 mins), 1 marked down
Fri Nov  3 08:32:16 2006 VShutdown:  shutting down on-line volumes...
---cut---

These lines were written to FileLog after the fileserver shutdown (the
timestamps in the log are the same as the shutdown time).
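For reference, the signals used above, as a minimal sketch (assuming the
fileserver PID can be found with pidof):

pid=$(pidof fileserver)
kill -TSTP "$pid"   # raise the fileserver debug level by one (SIGHUP resets it)
kill -XCPU "$pid"   # ask the fileserver to write its dump files (e.g. client.dump above)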

Thanks for any ideas,
Michal Svamberg


Re: [OpenAFS] many packet are as rx_ignoreAckedPacket and meltdown

2006-11-05 Thread Michal Svamberg

Hi,
thanks for the link. The problem is that the clients have the same UUID
because they have the same SID. The problem can be seen in hosts.dump
(kill -XCPU pid_of_fileserver), near the lines containing the string
'lock:', for example:
---cut---
ip:360de493 port:7001 hidx:251 cbid:16297 lock: last:1159945605 active:1
159940686 down:0 del:0 cons:0 cldel:32
hpfailed:0 hcpsCall:1159943657 hcps [ -211] [ 330de493 3a0de493 370de49
3 360de493 430de493 440de493 3e0de493 420de493 3d0de493 470de493 320de493 480de4
93 490de493 450de493 340de493 3f0de493 350de493 410de493 400de493 3c0de493] hold
s: 3bf69 slot/bit: 0/1
---cut---

The IP addresses of the misconfigured clients are on the line with
'hpfailed'. After reconfiguring all the affected stations, the meltdown
does not appear any more.
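A sketch of how the affected addresses can be pulled out of the dump; the dump
path below is the one our fileservers use and may differ elsewhere:

kill -XCPU "$(pidof fileserver)"                      # write a fresh hosts.dump
grep hpfailed /etc/openafs/server-local/hosts.dump    # lines listing the suspect hosts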

I have a question about this problem: would you consider adding an option
limiting the maximum number of clients with the same UUID that can connect to
a fileserver? Or at least writing a warning message to FileLog (without
debugging enabled)? In my opinion it is not good that clients are able to
take a server down.

Thanks for the answer,
Michal Svamberg.

On 10/10/06, Derrick J Brashear [EMAIL PROTECTED] wrote:

> On Tue, 10 Oct 2006, Michal Svamberg wrote:
>
>> We upgraded the file servers to 1.4.1 (built 2006-05-05) but it did not
>> solve the meltdown.
>
> get a backtrace when the fileserver is not responding.
>
> on a whim, you might also try this patch:
> http://grand.central.org/rt/Ticket/Display.html?id=19461



Re: [OpenAFS] many packet are as rx_ignoreAckedPacket and meltdown

2006-10-10 Thread Michal Svamberg

We upgraded the file servers to 1.4.1 (built 2006-05-05) but it did not
solve the meltdown.

The fileservers run in large mode. The meltdown behaves like this:
- first 10 min: 12 threads are fully used, only 2 idle threads remain
  (numbers taken from rxdebug; see the sketch below)
- then wprocs (calls waiting for a thread) starts counting up from zero
- over the next ~10 min: up to 300 calls end up waiting for a thread
- the fileserver clears wprocs (sends VBUSY to the clients?) but does not
  free the threads, and wprocs starts counting from zero again
- after another ~10 min the loop closes: wprocs climbs to 300 and is cleared again
- at some point we restart the fileserver (via bos) to get back to normal operation

The meltdown hits:
- user servers (RW + backup volumes)
- software servers (RW + RO + backup volumes)
- replication servers (RO volumes)

The upgrade also did not answer the question about rx_ignoreAckedPacket:
which packets are counted as rx_ignoreAckedPacket?

I have tons of logs but I don't know what to search for in them.
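How we sample the thread and call numbers mentioned above, as a minimal sketch
(the host name is a placeholder; 7000 is the standard fileserver port):

# print the idle-thread and waiting-call counters every 15 seconds
while sleep 15; do
    date
    rxdebug afs1.zcu.cz 7000 -noconns | egrep 'waiting for a thread|threads are idle'
done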

Do you have any ideas?

Thanks, Michal Svamberg.


[OpenAFS] many packet are as rx_ignoreAckedPacket and meltdown

2006-10-06 Thread Michal Svamberg

Hello,
I don't know what rx_ignoreAckedPacket is. I see thousands of
rx_ignoreAckedPacket (up to 5) per 15 seconds on the fileserver. The number
of calls is smaller (up to 1). Is it possible for rx_ignoreAckedPacket to be
ten times the number of calls?

We have this infrastructure:
Fileservers (large mode): OpenAFS 1.3.81 built  2005-05-14 (debian/stable)

Windows and Linux clients from version 1.2 to 1.4, and for experimental use 1.5:
OpenAFS 1.2.10 built  2005-04-06
OpenAFS 1.3.82 built  2005-08-20
OpenAFS 1.4.2fc4 built  2006-10-02
OpenAFS1.4.0101

Some of the fileservers sometimes go into a meltdown state (calls waiting
for a thread) and we don't know the reason. Here is 'rxdebug -rxstats':

Free packets: 935, packet reclaims: 1283, calls: 2197185, used FDs: 64
not waiting for packets.
201 calls waiting for a thread
2 threads are idle
rx stats: free packets 935, allocs 7046769, alloc-failures(rcv
0/0,send 0/0,ack 0)
  greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers
0, selects 0, sendSelects 0
  packets read: data 2220845 ack 3323232 busy 5 abort 5125 ackall 3 challenge 4
response 1098 debug 43944 params 0 unused 0 unused 0 unused 0 version 0
  other read counters: data 2220774, ack 3322547, dup 0 spurious 165 dally 5
  packets sent: data 2851035 ack 54295 busy 592 abort 72 ackall 0 challenge 109
8 response 4 debug 0 params 0 unused 0 unused 0 unused 0 version 0
  other send counters: ack 54295, data 9546732 (not resends), resends 2908, pus
hed 0, ackedignored 3238665
   (these should be small) sendFailed 0, fatalErrors 0
  Average rtt is 0.006, with 745772 samples
  Minimum rtt is 0.000, maximum is 60.235
  518 server connections, 676 client connections, 706 peer structs, 350 call st
ructs, 0 free call structs
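To see whether this counter keeps growing between meltdowns, the same rxdebug
call can simply be repeated and the relevant counters pulled out (the host
name is a placeholder):

# compare the acked-but-ignored counter with the total call count
rxdebug afs1.zcu.cz 7000 -rxstats | egrep 'ackedignored|calls:'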

Thanks for any answer.

Michal Svamberg