> -----Original Message-----
> From: Paul Gortmaker [mailto:paul.gortma...@gmail.com]
> Sent: Wednesday, November 14, 2012 4:23 PM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] Successful rescue of older E1000 with corrupted EEPROM
>
> This is a description of rescuing older Intel e1000 hardware that
> had a corrupted EEPROM. Maybe someone else can use the info from this
> success to create their own rescue.
>
> I stumbled across a homeless Dell Precision 650, and since it looked
> like an interesting (old) target to use for boot testing stuff on, I
> gave it a temporary(!) home. After putting a common Linux distro on it,
> I got this:
>
> --------------------------------------------------
> [ 2.997690] e1000: /*********************/
> [ 2.997697] e1000: Current EEPROM Checksum : 0x1e5e
> [ 2.997699] e1000: Calculated : 0x2b5f
> [ 2.997702] e1000: Offset Values
> [ 2.997704] e1000: ======== ======
> [ 2.997708] 00000000: ff ff 56 16 16 fc 10 0b ff ff ff ff ff ff ff ff
> [ 2.997711] 00000010: 01 00 03 00 0b 46 2c 01 28 10 0f 10 86 80 68 b0
> [ 2.997714] 00000020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 2.997717] 00000030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> [ 2.997719] 00000040: 0c c3 61 78 08 1c 02 21 c8 0c ff ff ff ff ff ff
> [ 2.997722] 00000050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 01
> [ 2.997725] 00000060: 64 01 02 40 05 12 ff ff ff ff ff ff ff ff ff ff
> [ 2.997728] 00000070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 5e 1e
> [ 2.997730] e1000: Include this output when contacting your support provider.
> [ 2.997732] e1000: This is not a software error! Something bad happened to
> [ 2.997735] e1000: your hardware or EEPROM image. Ignoring this problem could
> [ 2.997737] e1000: result in further problems, possibly loss of data,
> [ 2.997739] e1000: corruption or system hangs!
> [ 2.997741] e1000: The MAC Address will be reset to 00:00:00:00:00:00,
> [ 2.997743] e1000: which is invalid and requires you to set the proper MAC
> [ 2.997745] e1000: address manually before continuing to enable this network
> [ 2.997748] e1000: device. Please inspect the EEPROM dump and report the
> [ 2.997750] e1000: issue to your hardware vendor or Intel Customer Support.
> [ 2.997752] e1000: /*********************/
> [ 2.997759] e1000 0000:03:0e.0: (unregistered net_device): Invalid MAC Address
> --------------------------------------------------
>
> Great. Driver fail, with a handful of binary gobbledygook. A bit of
> digging and it turns out we can get the same data from ethtool on
> demand:
>
> root@crapbox:~# ethtool -e eth2 | head -n 10
> Offset Values
> ------ ------
> 0x0000: ff ff 56 16 16 fc 10 0b ff ff ff ff ff ff ff ff
> 0x0010: 01 00 03 00 0b 46 2c 01 28 10 0f 10 86 80 68 b0
> 0x0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0040: 0c c3 61 78 08 1c 02 21 c8 0c ff ff ff ff ff ff
> 0x0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 01
> 0x0060: 64 01 02 40 05 12 ff ff ff ff ff ff ff ff ff ff
> 0x0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 5e 1e
> root@crapbox:~#
>
> Note the 1e5e in the last 2 bytes of the data dump (stored low byte
> first, as 5e 1e). Same as that reported by the driver above for the
> current checksum.
>
> Would be nice to know what the bits-n-bytes are though. Turns out that
> it is actually documented:
>
> http://www.intel.com/design/network/applnots/ap470.htm
>
> The above takes you to "82546GB/EB and 82545GM/EM Gigabit Ethernet
> Controller EEPROM Map and Programming Information Application Note
> (AP-470)".
>
> With that, I find out that the 1st chunk of EEPROM is for the MAC (no
> real surprise there). And that the last two bytes hold the checksum
> word, chosen so that the whole 0x40 words (0x80 bytes) sum to 0xBABA.
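For anyone who would rather script this check than add the words by hand, here is a minimal Python sketch of the rule just described: read the first 0x80 bytes as 0x40 little-endian 16-bit words, add them up, and keep only the low 16 bits. The function names are invented for this example, and the parser assumes the `ethtool -e` line layout shown above; it is not taken from the driver source.

```python
import re

EEPROM_SUM = 0xBABA   # value all words must sum to (mod 0x10000)
NUM_WORDS = 0x40      # 0x40 words == 0x80 bytes

# The corrupted Precision 650 dump, copied from the ethtool output above.
DUMP = """\
0x0000: ff ff 56 16 16 fc 10 0b ff ff ff ff ff ff ff ff
0x0010: 01 00 03 00 0b 46 2c 01 28 10 0f 10 86 80 68 b0
0x0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0040: 0c c3 61 78 08 1c 02 21 c8 0c ff ff ff ff ff ff
0x0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 01
0x0060: 64 01 02 40 05 12 ff ff ff ff ff ff ff ff ff ff
0x0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 5e 1e
"""

def parse_dump(text):
    """Collect the data bytes from lines shaped like '0xNNNN: xx xx ...'."""
    data = bytearray()
    for line in text.splitlines():
        m = re.match(r"\s*0x[0-9a-fA-F]+:\s*(.*)", line)
        if m:
            data.extend(int(b, 16) for b in m.group(1).split())
    return bytes(data)

def to_words(data):
    """Pair bytes up as little-endian 16-bit words (low byte first)."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, 2 * NUM_WORDS, 2)]

def word_sum(data):
    """Sum of all 0x40 words, ignoring carry out of 16 bits."""
    return sum(to_words(data)) & 0xFFFF

def expected_checksum(data):
    """Checksum word that would make the image sum to 0xBABA."""
    return (EEPROM_SUM - sum(to_words(data)[:-1])) & 0xFFFF

image = parse_dump(DUMP)
stored = to_words(image)[-1]
# Matches the driver's "Current EEPROM Checksum" and "Calculated" lines.
print("stored 0x%04x, calculated 0x%04x" % (stored, expected_checksum(image)))
print("image valid:", word_sum(image) == EEPROM_SUM)
```

Run against the corrupted dump, this reports a stored checksum of 0x1e5e and a calculated one of 0x2b5f, the same pair the driver printed.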
>
> So I test this on another old Dell I have nearby:
>
> ------------------------------------------------------------------
> root@gx270:~# ethtool -e eth1 | head -n 10
> Offset Values
> ------ ------
> 0x0000: 00 0f 1f d7 8a f5 10 0b 98 99 ff ff ff ff ff ff
> 0x0010: 05 00 01 a0 0b 66 51 01 28 10 0e 10 86 80 20 b0
> 0x0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0040: 04 e3 61 78 07 1b 03 21 c8 0c ff ff ff ff ff ff
> 0x0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 01
> 0x0060: ec 01 02 40 05 12 ff ff ff ff ff ff ff ff ff ff
> 0x0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 2a e9
> root@gx270:~# cat bc-script
> obase=16
> ibase=16
> 0F00+D71F+F58A+0B10+9998+FFFF+FFFF+FFFF+0005+A001+660B+0151+1028+100E+8086+B020+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+E304+7861+1B07+2103+0CC8+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+0100+01EC+4002+1205+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+E92A
> root@gx270:~# bc < bc-script
> 30BABA
> root@gx270:~# ifconfig eth1|grep eth1
> eth1 Link encap:Ethernet HWaddr 00:0f:1f:d7:8a:f5
> root@gx270:~#
> ------------------------------------------------------------------
>
> Sure enough, it works. The checksum (ignoring carry, i.e. keeping only
> the low 16 bits of 0x30BABA) is 0xBABA, just like the in-kernel driver
> code checks for. And we see the MAC in the 1st 6 bytes of the dump.
>
> The Dell GX270 has an 82540EM, whereas the Precision 650 has an
> 82545EM, so we expect the EEPROM to be different.
>
> On the other hand, they are quite similar. After the MAC address, we
> see 10 0b in both. Really it is only the leading 0xff 0xff in the
> Precision 650 that looks rather suspicious as "erased". Taking that
> one step further, let's assume that the damage is limited to two bytes.
> So we are looking for a Dell MAC that starts with XX:XX:56 maybe.
>
> Knowing that the 1st three bytes of a MAC are vendor specific, I look
> for a list for Dell.
> Here is one such site:
>
> http://www.coffer.com/mac_find/?string=Dell
>
> There are about two dozen, but only one matches the xx:xx:56, that
> being "000D56 -- Dell PCBA Test". Let's plug that into our possibly
> corrupted EEPROM, with just those two values changed, and see what the
> checksum comes out to be:
>
> ------------------------------------------
> root@crapbox:~# cat bc-script
> obase=16
> ibase=16
> 0D00+1656+FC16+0B10+FFFF+FFFF+FFFF+FFFF+0001+0003+460B+012C+1028+100F+8086+B068+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+C30C+7861+1C08+2102+0CC8+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+0100+0164+4002+1205+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+FFFF+1E5E
> root@crapbox:~# bc < bc-script
> 2EBABA
> root@crapbox:~#
> ------------------------------------------
>
> Woot! We've confirmed that replacing the two leading 0xff values with
> 0x00, 0x0d (the start of the "Dell PCBA Test" prefix 00:0D:56, which is
> the word 0x0D00 in the sum above) will make the EEPROM image pass the
> checksum test by returning 0xBABA. The corruption is limited to the
> 1st two bytes. So now we just need to write those back to the EEPROM.
>
> Turns out that ethtool can do this too:
>
> # ethtool -E eth2 magic 0x100f8086 offset 0x0 value 0x00
> # ethtool -E eth2 magic 0x100f8086 offset 0x01 value 0x0D
>
> Be sure to select the correct device, if you have multiple cards like I
> do! The magic value is there for that reason.
> The "magic" is just the PCI device ID (0x100f, upper 16 bits) and
> vendor ID (0x8086, lower 16 bits), as shown by lspci -nvv.
>
> Dumping the contents shows the writes "stuck" and now the driver loads
> without any complaints and I have a working gigE interface!
>
> -----------------
> root@crapbox:~# ethtool -e eth2 | head -n 10
> Offset Values
> ------ ------
> 0x0000: 00 0d 56 16 16 fc 10 0b ff ff ff ff ff ff ff ff
> 0x0010: 01 00 03 00 0b 46 2c 01 28 10 0f 10 86 80 68 b0
> 0x0020: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0030: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> 0x0040: 0c c3 61 78 08 1c 02 21 c8 0c ff ff ff ff ff ff
> 0x0050: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 01
> 0x0060: 64 01 02 40 05 12 ff ff ff ff ff ff ff ff ff ff
> 0x0070: ff ff ff ff ff ff ff ff ff ff ff ff ff ff 5e 1e
> root@crapbox:~#
> ---------------
>
> Obviously, if your EEPROM corruption is more extensive, you may need to
> find a similar system that you can "steal" the EEPROM data from.
> Knowing the above, you could tweak the MAC (to make it unique) and
> tweak the checksum to preserve the magic 0xBABA. (or just re-use the
> MAC if the computers are worlds apart!)
>
> See also this page, which contained useful info:
>
> http://blog.vodkamelone.de/archives/146-Unbricking-an-Intel-Pro1000-e1000-network-interface.html
>
> Good luck in your own rescue attempts!
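The two checksum tricks in the message above (recovering the erased word, and fixing up the checksum after changing the MAC) are the same arithmetic: if exactly one 16-bit word is unknown, the 0xBABA rule pins down its value. Below is a rough Python sketch of that idea. `solve_word` is a name made up for this example, and the changed MAC byte is purely hypothetical. Note that this only shows which value satisfies the checksum; it cannot prove the rest of the image is intact, which is why the OUI lookup in the message is a useful independent cross-check.

```python
EEPROM_SUM = 0xBABA   # all 0x40 words must sum to this (mod 0x10000)
CHECKSUM_WORD = 0x3F  # index of the last word, which holds the checksum

# Corrupted Precision 650 image, taken from the dump earlier in the message.
CORRUPTED = bytes.fromhex(
    "ff ff 56 16 16 fc 10 0b" + " ff" * 8 +
    " 01 00 03 00 0b 46 2c 01 28 10 0f 10 86 80 68 b0" +
    " ff" * 32 +
    " 0c c3 61 78 08 1c 02 21 c8 0c" + " ff" * 6 +
    " ff" * 14 + " 00 01" +
    " 64 01 02 40 05 12" + " ff" * 10 +
    " ff" * 14 + " 5e 1e"
)

def to_words(data):
    """Pair bytes up as little-endian 16-bit words (low byte first)."""
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def solve_word(words, index):
    """Value words[index] must hold for the whole image to sum to 0xBABA."""
    others = sum(w for i, w in enumerate(words) if i != index)
    return (EEPROM_SUM - others) & 0xFFFF

words = to_words(CORRUPTED)

# 1) Assume (as the message does) that only word 0 was erased.
word0 = solve_word(words, 0)
print("word 0 should be 0x%04x -> bytes %02x %02x"
      % (word0, word0 & 0xFF, word0 >> 8))

# 2) Hypothetical MAC tweak: bump the last MAC byte (offset 5) from fc to fd,
#    then recompute the checksum word so the image still sums to 0xBABA.
patched = list(words)
patched[0] = word0
patched[2] = (patched[2] & 0x00FF) | (0xFD << 8)
patched[CHECKSUM_WORD] = solve_word(patched, CHECKSUM_WORD)
print("new checksum word: 0x%04x" % patched[CHECKSUM_WORD])
```

For this dump the first step yields 0x0d00, i.e. bytes 00 0d, which agrees with the "Dell PCBA Test" prefix found by the OUI lookup.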
Hi Paul,

Thanks for all the work here. However, please be careful in doing
this. If you write the wrong EEPROM image to your corrupted EEPROM you
could brick your NIC/LOM worse than it is already. You could get it
into such a state that it won't even show up on the PCI bus.

I totally understand where you are going with this, and while it may
help some, it could also hurt others who use an incompatible EEPROM
image as the "good" source for the EEPROM. In addition, since you are
using LOM devices in this example, the maker of the platform that has
the LOM on it can have specialized parameters in the EEPROM which might
not be set unless the exact EEPROM image is burned to the device. This
may enable or disable features the manufacturer wants to have enabled
or disabled for various reasons. So if the exact EEPROM image is not
used to correct the corrupted EEPROM, there can be problems with doing
this.

This is just a word of warning. We cannot be responsible for problems
incurred from following these instructions. We do appreciate the work
you have done here. It may help some people.

Cheers,
John Ronciak
Intel Corp.