Re: [Lustre-discuss] fseeks on lustre

2010-04-14 Thread Ronald K Long
Andreas - Here is a snippet of the strace output. _llseek(3, 2097152, [2097152], SEEK_SET) = 0 _llseek(3, 2097152, [2097152], SEEK_SET) = 0 _llseek(3, 2097152, [2097152], SEEK_SET) = 0 _llseek(3, 2097152, [2097152], SEEK_SET) = 0 _llseek(3, 2097152, [2097152], SEEK_SET) = 0 _llseek(3, 2097152,

Re: [Lustre-discuss] fseeks on lustre

2010-04-14 Thread Brian J. Murrell
On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote: Andreas - Here is a snippet of the strace output. read(3, \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0 \0\0..., 2097152) = 2097152 As Andreas suspected, your application is doing 2MB reads every time. Does it really

Re: [Lustre-discuss] Unable to activate inactive OSTs

2010-04-14 Thread Dan
Chris, I've not upgraded or changed configuration. Running RHEL 4 w/ Lustre 1.6.7.2. An OSS crashed and some OSTs show a fail to recover on the MDT but the OSS looks fine, interesting? There are countless pages of errors - here is a good sample of what I'm seeing. Apr 11 04:04:19 gto

[Lustre-discuss] Multiply claimed blocks

2010-04-14 Thread Dan
Hi, After an OSS crashed I ran fsck and all but one OST returned quickly after fixing a few errors. It's been duplicated multiply claimed blocks for a few days now. Seems it's a very slow and CPU bound operation. Are there other ways to fix or replace this OST? I'm running on RHEL 4 w/ latest

Re: [Lustre-discuss] Lustre module not loading on client mount

2010-04-14 Thread Michael Robbert
Kit, I thought that it may be a timing issue, but I added mount commands to rc.local and it didn't help. The odd thing is that it does seem to work on subsequent reboots. I haven't done extensive testing to see if that works all the time or not. The other odd thing is that if the FSs don't

Re: [Lustre-discuss] Lustre module not loading on client mount

2010-04-14 Thread Nathan Dauchy
Michael Robbert wrote: Kit, I thought that it may be a timing issue, but I added mount commands to rc.local and it didn't help. Robert, I'm not sure of the root cause of your mount problems, but we were also hitting a timing problem when mounting file systems over Infiniband at boot time.

Re: [Lustre-discuss] fseeks on lustre

2010-04-14 Thread Ronald K Long
We've narrowed down the problem quite a bit. The problematic code snippet is not actually doing any reads or writes; it's just doing a massive number of fseek() operations within a couple of nested loops. (Note: The production code is doing some I/O, but this snippet was narrowed down to the
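
For anyone trying to reproduce the pattern described above, here is a minimal sketch in C. It is not the poster's code: the function names, the record layout, and the loop bounds are hypothetical. It rests on the assumption (not confirmed in the preview text here) that the C library sizes its default stdio buffer from the st_blksize the filesystem reports, and that a repositioning fseek() on a read stream may refill that whole buffer, which would be consistent with the 2097152-byte reads in the strace output earlier in the thread. Passing a non-zero bufsize shrinks the stream buffer with setvbuf() so the two cases can be compared under strace.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Preferred I/O block size the filesystem reports for `path`, or -1 on error.
 * Assumption: the C library derives its default stdio buffer size from this. */
long preferred_blksize(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Hypothetical stand-in for the nested fseek() loops: seeks to the start of
 * each of rows * cols fixed-size records without reading or writing.
 * If bufsize > 0, the stream buffer is set to that size before any I/O;
 * if bufsize == 0, the library default is left in place.
 * Returns the number of successful seeks, or -1 on setup failure. */
long seek_only(const char *path, size_t bufsize,
               long rows, long cols, long reclen)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *iobuf = NULL;
    if (bufsize > 0) {
        iobuf = malloc(bufsize);
        if (iobuf == NULL || setvbuf(fp, iobuf, _IOFBF, bufsize) != 0) {
            fclose(fp);
            free(iobuf);
            return -1;
        }
    }

    long done = 0;
    for (long r = 0; r < rows; r++) {
        for (long c = 0; c < cols; c++) {
            long off = (r * cols + c) * reclen;
            if (fseek(fp, off, SEEK_SET) == 0)
                done++;
        }
    }

    fclose(fp);   /* close the stream before freeing the buffer it was using */
    free(iobuf);
    return done;
}
```

Calling seek_only() once with bufsize 0 and once with something small such as 4096, each under strace, should show whether the read size per seek follows the buffer size on a given system. Whether a read is issued at all for each fseek() depends on the C library implementation.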

Re: [Lustre-discuss] Multiply claimed blocks

2010-04-14 Thread Brian J. Murrell
On Wed, 2010-04-14 at 09:59 -0700, Dan wrote: Hi, Hi, It's been duplicated multiply claimed blocks for a few days now. Are you saying that an fsck has been running for a few days, complaining all that time about multiply claimed blocks? That seems a long time. How big is the OST? Do the

Re: [Lustre-discuss] fseeks on lustre

2010-04-14 Thread Andreas Dilger
On 2010-04-14, at 11:08, Ronald K Long wrote: We've narrowed down the problem quite a bit. The problematic code snippet is not actually doing any reads or writes; it's just doing a massive number of fseek() operations within a couple of nested loops. (Note: The production code is doing

Re: [Lustre-discuss] Multiply claimed blocks

2010-04-14 Thread Andreas Dilger
On 2010-04-14, at 09:59, Dan wrote: After an OSS crashed I ran fsck and all but one OST returned quickly after fixing a few errors. It's been duplicated multiply claimed blocks for a few days now. Seems it's a very slow and CPU bound operation. Are there other ways to fix or replace this

Re: [Lustre-discuss] RDMA limitation?

2010-04-14 Thread Jiahua
Sorry to send it again! Can anyone help? Jiahua On Tue, Apr 13, 2010 at 10:45 PM, Jiahua jia...@gmail.com wrote: Thanks for your answers! More questions: * Do you only lock for writes? What if I only read? Do you still lock even for simultaneous reads? * Is the limitation system wide or

Re: [Lustre-discuss] Lustre module not loading on client mount

2010-04-14 Thread Kit Westneat
Hey Mike, That's pretty odd, it looks like the o2ib module has a symbol mismatch with the ofed driver. I'm surprised it works at all...can you send the dmesg output after modprobe lustre + mounting, as well as the lctl list_nids output? Thanks, Kit On 4/14/2010 1:42 PM, Michael Robbert