Hi Homin,

Could this be due to a rogue ARP process? We have seen failure modes where the ARP table fills itself with FF:FF:FF:FF entries and the core starts broadcasting UDP traffic. We hard-code the ARP table to stop this.

I also recall reading something similar to do with anti-flood control on some switches. It might be worth double-checking whether there's an unusual 'feature' that turns on after some metric is reached.

Good luck debugging!

Regards,
Danny

On 13 March 2018 at 4:01:13 pm, Homin Jiang ([email protected]) wrote:

Hi Dave:

Thanks for the prompt response and suggestion. The X engine is running at the same clock as the F engine, 2.24GHz/8 = 280MHz. Perhaps I should increase the clock in the X engine? Yes, there is a Tx overflow flag in the model; it will be the first thing for me to check.

best
homin

On Tue, Mar 13, 2018 at 12:42 PM, David MacMahon <[email protected]> wrote:

> Hi, Homin,
>
> The first thing to do is figure out where packet loss is actually
> happening. The fact that you have to reset the 10G yellow blocks to get
> things going again suggests that the X engines are not keeping up with
> the data rate (since the F engines will happily churn out 8.96 Gbps data
> regardless of the receivers' states and the X engines will happily churn
> out data regardless of the PC's state, it seems that the only way for
> the 10 GbE blocks to get confused is if the X engines are not keeping up
> with the incoming data rate). I assume the F engine ROACH2s are being
> clocked via their ADCs. How are the X engine ROACH2s being clocked?
>
> Assuming the F-to-X packets are going through a switch, you could query
> the switch to see what it thinks the incoming and outgoing data rates
> are on the various ports involved.
>
> Does your design have any way of capturing the overflow flags of the
> 10 GbE cores?
>
> Dave
>
> On Mar 12, 2018, at 19:39, Homin Jiang <[email protected]> wrote:
>
> Dear Casperite:
>
> We have deployed a 7 (actually 8) antenna packetized correlator on
> Mauna Loa, Hawaii. Running at a 2.24GHz clock, that means 8.96 Gbits per
> second for each 10G ethernet. The packet size is 2K. There are 8 sets of
> ROACH2 as F engines, the other 8 sets of ROACH2 as X engines. Data
> packets from F to X look fine; the lost packets are in the integration
> data sent from the X engines to the computer. The 10G yellow blocks in
> the X engines handle the incoming data packets from the F engines at a
> data rate of 8.96 Gbps, and output the integration data to the PC. The
> outgoing data rate depends on the integration time, which is usually
> longer than 0.5 second. The symptom is that specific X engines start
> losing packets after 10 to 20 minutes, or after a couple of hours. Once
> it happens, we reset all the 10G yellow blocks in F and X, and then the
> system revives.
>
> I have no idea about the 10G ethernet yellow block. Any comments or
> suggestions are highly welcome.
>
> best
> homin jiang
>
> --
> You received this message because you are subscribed to the Google
> Groups "[email protected]" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To post to this group, send email to [email protected].
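
For reference, the figures quoted in this thread can be turned into a rough link budget for one F-to-X 10 GbE link. This is only a back-of-envelope sketch, not anything from the actual design: it assumes "2K" means 2048 bytes of UDP payload, that 8.96 Gbps is the payload rate, and that packets use standard Ethernet/IPv4/UDP framing with no VLAN tag. The function name `link_budget` and the constants are made up for illustration.

```python
# Rough link budget for one F-to-X 10 GbE link, using numbers from the thread.

ADC_CLOCK_HZ = 2.24e9        # from the thread
PAYLOAD_RATE_BPS = 8.96e9    # from the thread; assumed here to be payload only
PAYLOAD_BYTES = 2048         # assumption: "2K" means 2048 bytes of UDP payload
LINE_RATE_BPS = 10e9         # nominal 10 GbE data rate

# Assumption: standard framing, no VLAN tag. Bytes on the wire per packet
# beyond the payload: preamble + SFD (8), Ethernet header (14), IPv4 header
# (20), UDP header (8), frame check sequence (4), inter-frame gap (12).
OVERHEAD_BYTES = 8 + 14 + 20 + 8 + 4 + 12


def link_budget(payload_rate_bps, payload_bytes,
                overhead_bytes=OVERHEAD_BYTES, line_rate_bps=LINE_RATE_BPS):
    """Return packet rate, on-the-wire bit rate, and line utilization."""
    packets_per_s = payload_rate_bps / (payload_bytes * 8)
    wire_rate_bps = packets_per_s * (payload_bytes + overhead_bytes) * 8
    return {
        "packets_per_s": packets_per_s,
        "wire_rate_bps": wire_rate_bps,
        "utilization": wire_rate_bps / line_rate_bps,
    }


budget = link_budget(PAYLOAD_RATE_BPS, PAYLOAD_BYTES)
print("fabric clock (ADC clock / 8): %.0f MHz" % (ADC_CLOCK_HZ / 8 / 1e6))
print("packets per second:           %.0f" % budget["packets_per_s"])
print("on-the-wire rate:             %.3f Gbps" % (budget["wire_rate_bps"] / 1e9))
print("line utilization:             %.1f %%" % (budget["utilization"] * 100))
```

Under these assumptions the incoming link runs above 90 % of line rate, so there is little headroom on the receive side. Any extra per-packet header bytes in the real design would push the figure higher.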

