I think I've discovered that the problem is the OFED Roll that I'm using. When
a node is first built, it recompiles the OFED modules for the current kernel.
I'm still deciphering the actual sequence of events, but I think that I need to
add a reboot at the end of the process.
Mike
On Apr
Kit,
I thought that it might be a timing issue, but I added mount commands to rc.local
and it didn't help. The odd thing is that it does seem to work on subsequent
reboots. I haven't done extensive testing to see if that works all the time or
not. The other odd thing is that if the FSs don't
Michael Robbert wrote:
Kit,
I thought that it might be a timing issue, but I added mount commands to
rc.local and it didn't help.
Robert,
I'm not sure of the root cause of your mount problems, but we were also
hitting a timing problem when mounting file systems over InfiniBand at
boot time.
Hey Mike,
That's pretty odd; it looks like the o2ib module has a symbol mismatch
with the OFED driver. I'm surprised it works at all. Can you send the
dmesg output after modprobe lustre + mounting, as well as the lctl
list_nids output?
Thanks,
Kit
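
For anyone following along, the diagnostics requested above could be gathered
into a single file with something like the sketch below. This is only an
illustration: the device string, mount point, and log path are hypothetical
placeholders, and the helper names (run_and_log, collect_lustre_diag) are made
up for this example.

```shell
#!/bin/sh
# Sketch: gather modprobe/mount/dmesg/lctl output into one log file.
# LUSTRE_DEV, LUSTRE_MNT and DIAG_LOG are hypothetical placeholders;
# substitute the values for your own site before running.
LUSTRE_DEV="${LUSTRE_DEV:-mgsnode@o2ib:/lustre}"
LUSTRE_MNT="${LUSTRE_MNT:-/mnt/lustre}"
DIAG_LOG="${DIAG_LOG:-/tmp/lustre-diag.log}"

# Append the command line, its stdout/stderr, and its exit status to DIAG_LOG.
run_and_log() {
    printf '### %s\n' "$*" >> "$DIAG_LOG"
    "$@" >> "$DIAG_LOG" 2>&1
    rc=$?
    printf '### exit status: %s\n' "$rc" >> "$DIAG_LOG"
    return "$rc"
}

# Run the steps in the order requested; failures are recorded, not fatal.
collect_lustre_diag() {
    run_and_log modprobe lustre
    run_and_log mount -t lustre "$LUSTRE_DEV" "$LUSTRE_MNT"
    run_and_log dmesg
    run_and_log lctl list_nids
}

# To use: run as root on the affected client and call collect_lustre_diag,
# then attach the file named by DIAG_LOG to your reply.
```

Recording the exit status next to each command makes it easier to see whether
the modprobe or the mount was the step that failed.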
On 4/14/2010 1:42 PM, Michael Robbert
Hey Mike,
Are there any messages in dmesg on boot? I've occasionally seen IB take a
second to actually start. If that's the case, you might need to add
mounts to rc.local, or try to get openibd to start earlier.
- Kit
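
A rough sketch of the rc.local approach suggested above: wait until LNET
reports an o2ib NID, then mount. It assumes the Lustre file systems are listed
in /etc/fstab and that the lustre/lnet modules are already loaded; the retry
and have_o2ib_nid helpers and the 30 x 2 second budget are hypothetical choices
for illustration, not something from this thread.

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep between attempts, but not after the last one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Succeeds once "lctl list_nids" shows at least one NID on an o2ib network.
have_o2ib_nid() {
    lctl list_nids 2>/dev/null | grep -q '@o2ib'
}

# In rc.local (assumption: Lustre entries exist in /etc/fstab):
# retry 30 2 have_o2ib_nid && mount -a -t lustre
```

This only helps if the failure really is a startup race; it will not fix a
module or symbol mismatch.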
On 4/12/2010 7:33 PM, Michael Robbert wrote:
I am trying to