On 29/09/16(Thu) 16:20, Joseph Zatarski wrote: > Yesterday, I was attempting to install 5.8 on a VAX architecture (a DEC > VXT2000 with 18MB RAM and SPX graphics to be specific) by net booting using > the VAX boot.mop file. I was having issues during the process, and I > eventually used a network sniffer attempting to troubleshoot issues. I > noticed that loading over NFS seemed to stop shortly after my VXT received > an ARP request from my laptop, and also that the ARP reply seemed to be of > the wrong length. This issue was also a highly intermittent issue, as well > as variable in the ways it seemed to fail. Sometimes a full reboot would be > required before an NFS download worked again, other times I could retry > without rebooting, and rarely I would be able to boot without issues. > > I know VAX is no longer supported, but I discovered issues in > arch-independent sections of the source, so please keep reading. > > I did find, I think, the source of the issue with ARP reply packet length. > In libsa, the function arp_reply is what handles arp replies. Some comments > in this section read: > > /* > * Convert an ARP request into a reply and send it. > * Notes: Re-uses buffer. Pad to length = 46. > */ > > The length I see in the network sniffer is actually 64 bytes, which is 18 > bytes longer than the 46 mentioned here. A normal ARP reply or request > (including ethernet frame) is actually 46 bytes, so this appears to be > correct in that the length should ultimately be 46. This function accepts a > pointer to an ethernet packet of type void, and casts it to a structure > devised to hold an ARP request or reply, called ether_arp: > > struct ether_arp *arp = pkt; > > This structure should be the size of the ethernet frame's 'payload' for an > ARP packet, which comes out to be 28 bytes. This is 18 less than 46. The > additional 18 is for the layer 2 ethernet overhead. > > Near the end of the function, we find a call to sendether which 'pads' with > 18 bytes: > > (void) sendether(d, pkt, sizeof(*arp) + 18, > arp->arp_tha, ETHERTYPE_ARP); > > However, this is where we go wrong. In ether.c, we find the function > sendether. Sendether has the following notes: > > /* Caller must leave room for ethernet header in front!! */ > > And later: > > eh = (struct ether_header *)pkt - 1; > len += sizeof(*eh); > > which is actually what does the padding here, meaning the '+ 18' in the > call to sendether is redundant and wrong. > > Looking at this reveals another potential issue. arp_reply passes *pkt to > sendether, and sendether will overwrite some number of bytes before the > buffer, which could be out the bounds of that buffer. Of course, it is also > possible that the buffer that *pkt points inside actually has enough space > within it before *pkt that this is a nonissue, but I need to investigate > this further (starting with finding the caller of arp_reply) to determine > whether it's an issue or not. Although weak evidence, data corruption > caused by ARP reply would seem to explain the issues I had with netbooting > my VXT2000. > > To conclude, I believe that line 291 in /sys/lib/libsa/arp.c should be > changed from: > > (void) sendether(d, pkt, sizeof(*arp) + 18, > > to > > (void) sendether(d, pkt, sizeof(*arp), > > in order to solve the packet length issue. Aside from this, I will continue > looking into potential issues with exceeding buffer bounds when sendether > is called from arp_reply.
Thanks for the analyse, did you try what you suggested? On which archs? Do you have a diff to share?
