On Tue, 2009-02-17 at 08:31 +0800, Wen Hao Wang wrote:
> OK, Doug:
>
> Thanks a lot for your detailed explanation! So if I do not want to
> reboot the machine, I need to run "chkconfig", "kudzu" and "openibd
> start".

Correct.

> Wen Hao Wang
> Email: [email protected]
>
> Doug Ledford <[email protected]> wrote on 2009-02-17 01:49:33:
>
> > On Mon, 2009-02-16 at 09:29 +0800, Wen Hao Wang wrote:
> > > Wen Hao Wang
> > >
> > > Software Engineer
> > > IBM China Software Development Laboratory
> > > Email: [email protected]
> > > Tel: 86-10-82451055
> > > Fax: 86-10-82782244 ext. 2312
> > > Address: 1/F, IBM ZGC Campus. Ring Building 28, ZhongGuanCun
> > > Software Park, No.8 Dong Bei Wang West Road, Haidian District,
> > > Beijing, 100193, P.R.China
> > >
> > > Doug Ledford <[email protected]> wrote on 2009-02-14 00:13:32:
> > >
> > > > On Fri, 2009-02-13 at 08:05 +0800, Wen Hao Wang wrote:
> > > > > Doug Ledford <[email protected]> wrote on 2009-02-12 21:20:30:
> > > > >
> > > > > > On Thu, 2009-02-12 at 13:20 +0200, Tziporet Koren wrote:
> > > > > > > Wen Hao Wang wrote:
> > > > > > > >
> > > > > > > > Hi all:
> > > > > > > >
> > > > > > > > I changed my blade OS to RHEL5.3 yesterday and installed
> > > > > > > > OFED (shipped in RHEL5.3 image) by "yum groupinstall".
> > > > > > > > Then I loaded some drivers and wrote network interface
> > > > > > > > configuration file ifcfg-ib0. ifup ib0 also succeeded.
> > > > > > > > But IB utilities report Connection timed out.
> > > > > > > >
> > > > > > > > [r...@xblade06 network-scripts]# sminfo
> > > > > > > > ibwarn: [32593] _do_madrpc: recv failed: Connection timed out
> > > > > > > > ibwarn: [32593] mad_rpc: _do_madrpc failed; dport (Lid 9)
> > > > > > > > sminfo: iberror: failed: query
> > > > > > > >
> > > > > > > > I had to reboot the blade and rerun "openibd start". Then
> > > > > > > > sminfo reported correct contents. I do not suppose this
> > > > > > > > reboot is required. Did I miss any configuration step?
> > > > > >
> > > > > > There was an unintentional bug in the rhel5.2 openibd init
> > > > > > script in that it automatically turned itself on during
> > > > > > install (generally, most init scripts should default to *not*
> > > > > > turning themselves on during install of the package, nor
> > > > > > should they start themselves during install of the
> > > > > > package...this is for security reasons, imagine if you
> > > > > > installed the bind name server on your box and it
> > > > > > automatically started up before you had a chance to configure
> > > > > > it). In rhel5.3 we fixed that bug. So,
> > > > >
> > > > > Yeah. I heard of this bug.
> > > > >
> > > > > > you may need to 'chkconfig --level 2345 openibd on' to make
> > > > > > sure openibd starts up each time. The error you list above is
> > > > > > consistent with not all of the kernel modules being loaded
> > > > > > when you tried to use the sminfo program.
> > > > >
> > > > > Even after reboot, service openibd is not started automatically.
> > > > > [r...@xblade06 ~]# chkconfig --list openibd
> > > > > openibd  0:off  1:off  2:off  3:off  4:off  5:off  6:off
> > > >
> > > > That's because you have to run the command I listed in my first
> > > > email to turn it on.
> > >
> > > I totally agree with this. But I am still confused why sminfo gave
> > > errors before reboot, or which steps I should take for the first
> > > OFED usage before reboot. As far as I can see, whether the service
> > > is added into system runlevel DB is not related to the sminfo
> > > error. Please correct me if that is not the case.
> >
> > It is related. The runlevel db is only consulted on boot up. If the
> > openibd service was not enabled at startup, then adding it to the
> > runlevel startup does *not* start it at that time. You have to both
> > add it to the runlevel startup and also start it manually if you
> > want things to work properly prior to reboot. The sminfo errors you
> > first posted are consistent with some of the modules not being
> > loaded, and it went away after you started the openibd service,
> > which is also consistent with the problem.
> >
> > > > > I agree with you that maybe some modules were not loaded. But
> > > > > what's that?
> > > > > Before reboot, I ran "/etc/init.d/openibd start" and
> > > > > "/etc/init.d/network restart". No error was reported.
> > > > > "openibd status" also looked good.
> > > >
> > > > Running start on a service does not enable that service at the
> > > > next reboot. You must specifically enable the service in order
> > > > for it to start automatically.
> > > >
> > > > > > > > Moreover, "openibd start" reports one warning message
> > > > > > > > about hwconf. Does anyone have comments about this?
> > > > > > > >
> > > > > > > > [r...@xblade07 ~]# /etc/init.d/openibd start
> > > > > > > > Loading OpenIB kernel modules:grep: /etc/sysconfig/hwconf: No such file or directory
> > > > > > > > [ OK ]
> > > > > >
> > > > > > Can you see if the kudzu package is installed on your machine?
> > > > > > The openib package uses this config file written by kudzu to
> > > > > > determine what hardware drivers to load. I suppose I should
> > > > > > put a specific requires in the rpm for that.
> > > > >
> > > > > kudzu is installed.
> > > > > [r...@xblade06 ~]# rpm -q kudzu
> > > > > kudzu-1.2.57.1.21-1
> > > >
> > > > Make sure kudzu has been run at least once then (it would appear
> > > > to be turned off on your machine or else /etc/sysconfig/hwconf
> > > > would exist). You can run it manually from the command line and
> > > > that should be sufficient for the openibd init script's needs.
> > >
> > > Yes. After kudzu created the file on my machine, the openibd script
> > > had no error this time. I want to know in my scenario, is
> > > "openibd restart" needed/required?
> >
> > It would probably be advisable, but only if you haven't rebooted
> > since running kudzu for the first time. If you've rebooted since
> > then, then it doesn't matter.
> >
> > > Many thanks!
> > >
> > > Wen Hao Wang
> > > Email: [email protected]
> > >
> > > > --
> > > > Doug Ledford <[email protected]>
> > > > GPG KeyID: CFBFF194
> > > > http://people.redhat.com/dledford
> > > >
> > > > Infiniband specific RPMs available at
> > > > http://people.redhat.com/dledford/Infiniband
> > > >
> > > > [Attachment "signature.asc" deleted by Wen Hao Wang/China/IBM]
> >
> > --
> > Doug Ledford <[email protected]>
> > GPG KeyID: CFBFF194
> > http://people.redhat.com/dledford
> >
> > Infiniband specific RPMs available at
> > http://people.redhat.com/dledford/Infiniband
> >
> > [Attachment "signature.asc" deleted by Wen Hao Wang/China/IBM]
>

-- 
Doug Ledford <[email protected]>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband
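
For reference, the no-reboot sequence confirmed in this thread (enable
the service, make sure kudzu has written its config file, start the
service, then check with sminfo) could be sketched roughly as below.
This is only an illustration, not anything shipped with OFED or RHEL:
the DRY_RUN and HWCONF variables and the function names are made up for
the example, and by default the script only prints what it would run.

```shell
#!/bin/sh
# Sketch only: bring up openibd on RHEL 5.3 without rebooting.
# DRY_RUN and HWCONF are illustrative knobs, not part of any OFED tool.
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands, 0 = execute (needs root)
HWCONF="${HWCONF:-/etc/sysconfig/hwconf}"  # file written by kudzu, read by the init script

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bring_up_openibd() {
    # 1. Enable the service for future boots; this alone does not start it.
    run chkconfig --level 2345 openibd on || return 1
    # 2. Run kudzu once if it has never written its hardware config file.
    if [ ! -e "$HWCONF" ]; then
        run kudzu || return 1
    fi
    # 3. Start the service now so the kernel modules get loaded.
    run /etc/init.d/openibd start || return 1
    # 4. Query the subnet manager to confirm the stack is working.
    run sminfo
}

bring_up_openibd
```

To actually execute the steps, run it as root with DRY_RUN=0. If openibd
was already started before kudzu created /etc/sysconfig/hwconf, the
advice above suggests using "openibd restart" in place of the start
step.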
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
