Andreas - Here is a snippet of the strace output.
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
_llseek(3, 2097152, [2097152], SEEK_SET) = 0
_llseek(3, 2097152,
On Wed, 2010-04-14 at 07:08 -0500, Ronald K Long wrote:
Andreas - Here is a snippet of the strace output.
read(3, \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0..., 2097152) = 2097152
As Andreas suspected, your application is doing 2MB reads every time.
Does it really
Chris,
I've not upgraded or changed configuration. Running RHEL 4 w/ Lustre
1.6.7.2. An OSS crashed and some OSTs show a fail to recover on the
MDT but the OSS looks fine, interesting? There are countless pages of
errors - here is a good sample of what I'm seeing.
Apr 11 04:04:19 gto
Hi,
After an OSS crashed I ran fsck and all but one OST returned quickly
after fixing a few errors. It's been duplicated multiply claimed blocks
for a few days now. Seems it's a very slow and CPU bound operation.
Are there other ways to fix or replace this OST?
I'm running on RHEL 4 w/ latest
Kit,
I thought that it may be a timing issue, but I added mount commands to rc.local
and it didn't help. The odd thing is that it does seem to work on subsequent
reboots. I haven't done extensive testing to see if that works all the time or
not. The other odd thing is that if the FSs don't
Michael Robbert wrote:
Kit,
I thought that it may be a timing issue, but I added mount commands to
rc.local and it didn't help.
Robert,
I'm not sure of the root cause of your mount problems, but we were also
hitting a timing problem when mounting file systems over Infiniband at
boot time.
We've narrowed down the problem quite a bit.
The problematic code snippet is not actually doing any reads or writes;
it's just doing a massive number of fseek() operations within a couple
of nested loops. (Note: The production code is doing some I/O, but this
snippet was narrowed down to the
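For readers following along, here is a minimal sketch of the access pattern described above: fseek() called inside two nested loops on a stdio stream. It is not the poster's code. The names seek_loop, use_small_buffer and demo, the 4096-byte buffer and the 512-byte stride are all made up for illustration. The sketch assumes the 2 MB reads in the strace output come from stdio refilling a buffer whose size is taken from the file system's preferred I/O size; if that assumption holds, giving the stream a smaller caller-owned buffer with setvbuf() is one thing to try.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: call fseek() inside two nested loops and
 * return how many of the seeks succeeded. */
static long seek_loop(FILE *fp, int outer, int inner, long stride)
{
    long ok = 0;
    for (int i = 0; i < outer; i++) {
        for (int j = 0; j < inner; j++) {
            /* A seek may make stdio drop its buffer, so the next
             * read has to refill the whole buffer. */
            if (fseek(fp, (long)j * stride, SEEK_SET) == 0)
                ok++;
        }
    }
    return ok;
}

/* Hypothetical helper: make the stream fully buffered using a
 * caller-owned buffer. Must be called before any other operation on
 * the stream, and buf must stay valid until the stream is closed. */
static int use_small_buffer(FILE *fp, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    return setvbuf(fp, buf, _IOFBF, size);
}

/* Hypothetical demo: temporary file, 4096-byte buffer, 3 x 4 seeks.
 * Returns the number of successful seeks, or -1 on setup failure. */
static long demo(void)
{
    static char buf[4096];   /* outlives the stream */
    char block[4096];
    long ok;

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (use_small_buffer(fp, buf, sizeof buf) != 0) {
        fclose(fp);
        return -1;
    }
    memset(block, 0, sizeof block);
    if (fwrite(block, 1, sizeof block, fp) != sizeof block) {
        fclose(fp);
        return -1;
    }
    ok = seek_loop(fp, 3, 4, 512);
    fclose(fp);
    return ok;
}
```

Running the real loop under strace with and without the setvbuf() call would show whether the size of the read() calls follows the stdio buffer size.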
On Wed, 2010-04-14 at 09:59 -0700, Dan wrote:
Hi,
Hi,
It's been duplicated multiply claimed blocks
for a few days now.
Are you saying that an fsck has been running for a few days, complaining
all that time about multiply claimed blocks? That seems a long time.
How big is the OST? Do the
On 2010-04-14, at 11:08, Ronald K Long wrote:
We've narrowed down the problem quite a bit.
The problematic code snippet is not actually doing any reads or writes;
it's just doing a massive number of fseek() operations within a couple
of nested loops. (Note: The production code is doing
On 2010-04-14, at 09:59, Dan wrote:
After an OSS crashed I ran fsck and all but one OST returned quickly
after fixing a few errors. It's been duplicated multiply claimed blocks
for a few days now. Seems it's a very slow and CPU bound operation.
Are there other ways to fix or replace this
Sorry to send it again! Can anyone help?
Jiahua
On Tue, Apr 13, 2010 at 10:45 PM, Jiahua jia...@gmail.com wrote:
Thanks for your answers! More questions:
* Do you only lock for writes? What if I only read? Do you still lock
even for simultaneous reads?
* Is the limitation system wide or
Hey Mike,
That's pretty odd, it looks like the o2ib module has a symbol mismatch
with the ofed driver. I'm surprised it works at all...can you send the
dmesg output after modprobe lustre + mounting, as well as the lctl
list_nids output?
Thanks,
Kit
On 4/14/2010 1:42 PM, Michael Robbert