Paolo Supino wrote:


On Wed, Sep 3, 2008 at 5:52 PM, Marco Fretz <[EMAIL PROTECTED]> wrote:

    hi,

    we had the same problem with newer HP PCs and servers (Broadcom NICs).
    PXE works well on Broadcom, but the install does not. it doesn't matter
    whether you're using kickstart or a manual install.

    the problem was in CentOS 4.2. after updating the install environment
    to 4.5 the problem was gone... so it was a driver issue! the install
    kernel is not exactly the normal Linux kernel, i think.

    if anaconda just says that it cannot find the install image, etc., the
    system has no network connectivity at that point.

    hope this is helpful...

    bests
     marco

    Paolo Supino wrote:
    >
    >
    > On Tue, Sep 2, 2008 at 3:07 PM, Romeo Ninov <[EMAIL PROTECTED]> wrote:
    >
    >
    >
    >     Paolo Supino wrote:
    >
    >
    >
    >         On Tue, Sep 2, 2008 at 2:17 PM, Romeo Ninov <[EMAIL PROTECTED]> wrote:
    >
    >
    >
    >            Paolo Supino wrote:
    >
    >
    >
    >                On Tue, Sep 2, 2008 at 8:14 AM, nate <[EMAIL PROTECTED]> wrote:
    >
    >                   Paolo Supino wrote:
    >                   > Hi Nate
    >                   >
    >
    >                   > 3: After the error comes up I get the HTTP setup
    >                   > configuration screen with the source website (in IP)
    >                   > and CentOS directory as I entered them in the PXE
    >                   > configuration file and as they appear in the
    >                   > kickstart configuration file, and all I have to do is
    >                   > press the 'OK' button to continue the installation to
    >                   > a successful completion.
    >
    >                   If that's the case the next most likely culprit is
    >
    >                   > url --url http://192.168.11.1/source
    >
    >
    >                   Just because the PXE boot loader can download the
    >                   kickstart config does not mean that the installation
    >                   process will work with that NIC.
    >
    >                   Also, I've had lots of Broadcom systems not work with
    >                   kickstart over the years. It's not uncommon for newer
    >                   systems to have newer revs of the chipsets, and for
    >                   those revs not to be supported by the installer.
    >
    >                   But it sounds like in your case it does work, so I
    >                   would look at the url above, as it is likely the cause
    >                   of the problem. Check the http access logs on the
    >                   server for 404s and similar errors.
    >
    >                   nate
    >
    >                   _______________________________________________
    >                   CentOS mailing list
    >                   CentOS@centos.org
    >
    >
    >                   http://lists.centos.org/mailman/listinfo/centos
    >
    >
    >
    >                Hi Nate
    >
    >                 After figuring out what I was doing wrong (see previous
    >                reply ...) I started going through each of my systems in
    >                order to boot them and install CentOS 5.2 on each. For
    >                the most part it works, but only for the most part: once
    >                in a few boots (not machine specific) anaconda stops and
    >                either asks me which interface it needs to configure or
    >                fails to load 'stage2.img' from the web server on
    >                192.168.11.1 ... All cables are good cables, the network
    >                switch is a Cisco 3750G with no configuration, and all
    >                the NICs are Broadcom with firmware 3.8.9. Can you throw
    >                a guess where the problem might be lying (I hate
    >                inconsistencies)?
    >
    >
    >            Have you checked the Apache logs for anything? Also check
    >            the server messages.
    >
    >
    >
    >         Hi Romeo
    >
    >          Yes I did, and nothing shows up in either access_log or
    >         error_log :-(
    >         I just had a node that stopped and asked me for IP
    >         configuration (twice), and only on the second time (checked on
    >         the server using tcpdump) did it actually try to contact the
    >         server to retrieve its network configuration; it then continued
    >         and successfully retrieved 'stage2.img' from the web server :-(
    >
    >     Paolo, what about the DHCP or bootp servers? Check the logs and
    >     flush the ARP cache on the server(s).
    >
    >
    >
    > Hi Romeo
    >
    >   The more systems I boot, the more I'm starting to feel that it's
    > a hardware-related problem ... I just booted a system in which the
    > ELOM says that NIC0 has one MAC address, but when I booted the system
    > I saw a different MAC address altogether on the network ...
    >   I'm checking at the lowest level: on the wire (using tcpdump), so
    > if nothing shows in the capture I'm sure I won't find anything in
    > the logs :-(
    >
    >
    >
    >
    > --
    > TIA
    > Paolo
    >
    >
    >



Hi Marco

Thanks for the email. I had been debugging this problem for a few days, and a few installs before I posted the first email in this thread I started sniffing the network interface on the server (dhcp, tftp and http are all on the same computer). I noticed that no communication reaches the server between the PXE load and the retrieval error (I think I wrote about it in my original post). Some people suggested that Linux might be getting confused about the interfaces (the Sun X2200 M2 has 4 NICs), which I find hard to believe (the Linux kernel is old enough and probably got rid of this kind of bug a long time ago). In some of the failures the kernel loaded, retrieved the kickstart configuration file and then failed to retrieve 'stage2.img' (again nothing appeared on the wire). I have a sneaky feeling that the kickstart process assumes a lot of basic facts and doesn't do any/enough sanity checking.

Right now I need to get this cluster up and running (I'm already 2 weeks behind schedule). After it's up I will try to debug the process. The situation got me so aggravated that I was contemplating resurrecting my old private distro (not going to do that), which does things in a much simpler way.


Paolo
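
The log check suggested earlier in the thread (looking in the HTTP access log for 404s and similar errors) can be sketched roughly as below. This is only a minimal sketch: it assumes the web server writes the standard Apache common/combined log format, and the function names, the sample log lines and the /source/images/stage2.img path are made up for illustration. A client that never shows up in the result never reached the web server at all, which would be consistent with a capture that shows nothing on the wire.

```python
import re
from collections import defaultdict

# Apache common/combined log format (assumed):
#   host ident user [time] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def requests_for(log_lines, path_fragment="stage2.img"):
    """Map each client address to the status codes of its requests for path_fragment."""
    seen = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and path_fragment in match.group("path"):
            seen[match.group("client")].append(int(match.group("status")))
    return dict(seen)


def failures(statuses_by_client):
    """Keep only the clients that received at least one 4xx/5xx response."""
    return {
        client: [code for code in codes if code >= 400]
        for client, codes in statuses_by_client.items()
        if any(code >= 400 for code in codes)
    }


if __name__ == "__main__":
    # Made-up sample lines; in practice read them from the server's access_log.
    sample = [
        '192.168.11.21 - - [03/Sep/2008:17:40:01 +0300] '
        '"GET /source/images/stage2.img HTTP/1.0" 200 1024',
        '192.168.11.22 - - [03/Sep/2008:17:41:10 +0300] '
        '"GET /source/images/stage2.img HTTP/1.0" 404 312',
    ]
    by_client = requests_for(sample)
    print(by_client)
    print(failures(by_client))
```

Comparing the clients listed here against the nodes that were booted separates "the request failed" from "the request was never sent".
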
Unfortunately CentOS/RHEL really do have a problem with the order in which modules are loaded, especially with two identical NICs: the interface names get assigned in a random order. I personally mitigate the problem this way: in /etc/modprobe.conf I put the modprobe entry for the first NIC at the very top of the file and the entry for the second NIC at the very bottom, and after a reboot the NIC->eth? mapping always stays in place.
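
To see what order the ethN entries in a modprobe.conf-style file currently have, a small sketch like the following may help. It assumes the usual `alias ethN driver` line syntax; the function name and the sample contents (including the driver names) are hypothetical and not taken from the systems in this thread.

```python
def interface_aliases(modprobe_conf_text):
    """Return (interface, driver) pairs for 'alias ethN driver' lines, in file order."""
    aliases = []
    for raw_line in modprobe_conf_text.splitlines():
        # Drop anything after a '#' comment marker, then split on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith("eth"):
            aliases.append((fields[1], fields[2]))
    return aliases


if __name__ == "__main__":
    # Hypothetical contents: first NIC's entry at the top, second at the bottom.
    sample = """\
alias eth0 tg3
alias scsi_hostadapter sata_nv
# other module options would sit here
alias eth1 tg3
"""
    for name, driver in interface_aliases(sample):
        print(name, driver)
```

Running it against the real file before and after editing shows whether the entries ended up in the intended positions.
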
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
