Re: [casper] packets lost of a packetized correlator

Homin Jiang Tue, 13 Mar 2018 20:43:53 -0700

Dear Danny and John:

Thanks for your suggestion. I checked the ARP table (below); the unused
entries are all "FF FF ...". Are you suggesting that I assign a different
address to every entry in the ARP table?


best
homin





ARP Table:
IP:  10.  0.  0.  0: MAC: FF FF FF FF FF FF
IP:  10.  0.  0.  1: MAC: FF FF FF FF FF FF
...

IP:  10.  0.  0. 19: MAC: FF FF FF FF FF FF
IP:  10.  0.  0. 20: MAC: 00 60 DD 44 9D 38
IP:  10.  0.  0. 21: MAC: FF FF FF FF FF FF
...
IP:  10.  0.  0.126: MAC: FF FF FF FF FF FF
IP:  10.  0.  0.127: MAC: FF FF FF FF FF FF
IP:  10.  0.  0.128: MAC: 02 02 0A 00 00 80
IP:  10.  0.  0.129: MAC: 02 02 0A 00 00 81
IP:  10.  0.  0.130: MAC: 02 02 0A 00 00 82
IP:  10.  0.  0.131: MAC: 02 02 0A 00 00 83
IP:  10.  0.  0.132: MAC: 02 02 0A 00 00 84
IP:  10.  0.  0.133: MAC: 02 02 0A 00 00 85
IP:  10.  0.  0.134: MAC: 02 02 0A 00 00 86
IP:  10.  0.  0.135: MAC: 02 02 0A 00 00 87
IP:  10.  0.  0.136: MAC: 02 02 0A 00 00 88
IP:  10.  0.  0.137: MAC: 02 02 0A 00 00 89
IP:  10.  0.  0.138: MAC: 02 02 0A 00 00 8A
IP:  10.  0.  0.139: MAC: 02 02 0A 00 00 8B
IP:  10.  0.  0.140: MAC: 02 02 0A 00 00 8C
IP:  10.  0.  0.141: MAC: FF FF FF FF FF FF
IP:  10.  0.  0.142: MAC: 02 02 0A 00 00 8E
IP:  10.  0.  0.143: MAC: 02 02 0A 00 00 8F
IP:  10.  0.  0.144: MAC: 02 02 0A 00 00 90
IP:  10.  0.  0.145: MAC: 02 02 0A 00 00 91
IP:  10.  0.  0.146: MAC: 02 02 0A 00 00 92
IP:  10.  0.  0.147: MAC: 02 02 0A 00 00 93
IP:  10.  0.  0.148: MAC: 02 02 0A 00 00 94
IP:  10.  0.  0.149: MAC: 02 02 0A 00 00 95
IP:  10.  0.  0.150: MAC: 02 02 0A 00 00 96
IP:  10.  0.  0.151: MAC: 02 02 0A 00 00 97
IP:  10.  0.  0.152: MAC: 02 02 0A 00 00 98
IP:  10.  0.  0.153: MAC: 02 02 0A 00 00 99
IP:  10.  0.  0.154: MAC: 02 02 0A 00 00 9A
IP:  10.  0.  0.155: MAC: 02 02 0A 00 00 9B
IP:  10.  0.  0.156: MAC: 02 02 0A 00 00 9C
IP:  10.  0.  0.157: MAC: 02 02 0A 00 00 9D
IP:  10.  0.  0.158: MAC: 02 02 0A 00 00 9E
IP:  10.  0.  0.159: MAC: 02 02 0A 00 00 9F
IP:  10.  0.  0.160: MAC: FF FF FF FF FF FF
...
IP:  10.  0.  0.255: MAC: FF FF FF FF FF FF
------------------------
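A quick way to scan a dump like the one above is sketched below. It assumes the exact text layout shown (one `IP: ... MAC: ...` line per entry), and the function names are hypothetical. It flags two things visible in this table: a broadcast (unassigned) entry sitting between two assigned neighbours, as 10.0.0.141 does between .140 and .142, and assigned entries whose MAC does not follow the `02 02` plus four-IP-bytes pattern of the .128 to .159 block, as 10.0.0.20 does. Whether either is actually a problem depends on which devices really sit at those addresses (10.0.0.20 may simply be the PC's NIC).

```python
import re

# Matches lines such as "IP:  10.  0.  0.141: MAC: 02 02 0A 00 00 8D" (layout assumed from the dump)
LINE_RE = re.compile(
    r"IP:\s*(\d+)\.\s*(\d+)\.\s*(\d+)\.\s*(\d+):\s*MAC:((?:\s+[0-9A-Fa-f]{2}){6})"
)
BROADCAST = bytes([0xFF] * 6)


def parse_arp_dump(text):
    """Map each IPv4 address (as a 4-tuple of ints) to its 6-byte MAC; '...' lines are skipped."""
    table = {}
    for m in LINE_RE.finditer(text):
        ip = tuple(int(m.group(i)) for i in range(1, 5))
        mac = bytes(int(b, 16) for b in m.group(5).split())
        table[ip] = mac
    return table


def isolated_holes(table):
    """Broadcast entries whose neighbours (last octet -1 and +1) are both assigned."""
    holes = []
    for ip, mac in sorted(table.items()):
        if mac != BROADCAST:
            continue
        below = ip[:3] + (ip[3] - 1,)
        above = ip[:3] + (ip[3] + 1,)
        if table.get(below, BROADCAST) != BROADCAST and table.get(above, BROADCAST) != BROADCAST:
            holes.append(ip)
    return holes


def off_pattern(table):
    """Assigned entries whose MAC is not 02:02 followed by the four IP bytes."""
    return [ip for ip, mac in sorted(table.items())
            if mac != BROADCAST and mac != bytes([0x02, 0x02, *ip])]
```

Run on the lines shown above, `isolated_holes` reports only 10.0.0.141 and `off_pattern` reports only 10.0.0.20.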


On Wed, Mar 14, 2018 at 7:21 AM, John Ford <[email protected]> wrote:

> Hi Homin.  I think Danny's suggestion is a good one.  We have had similar
> problems with the system working for a while, then packets getting lost.
> Making sure that the entries in the ARP table are correct (and the yellow
> block MAC addresses are correct) may solve it.  Looking at the switch
> traffic with the switch's built-in monitoring might tell you whether this
> is the problem.
>
> John
>
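John's two checks can be combined into one comparison, sketched here with hypothetical names: for every board, the ARP entry it holds for each core's IP should equal the MAC that core was actually programmed with. How the programmed MACs and the ARP tables are read off the ROACH2s is toolflow-specific and left out; the sketch only assumes you can get both as `{ip: mac}` dictionaries with MACs written as hex strings.

```python
def mac_to_int(mac):
    """Normalise '02:02:0A:00:00:8C', '02 02 0a 00 00 8c' or '02-02-0A-00-00-8C' to an int."""
    return int(mac.replace(":", "").replace("-", "").replace(" ", ""), 16)


def arp_mismatches(core_macs, arp_tables):
    """Compare every board's ARP table against the MACs the 10GbE cores are configured with.

    core_macs:  {ip: mac} as programmed into each 10GbE yellow block (hypothetical input)
    arp_tables: {board_name: {ip: mac}} as read back from each board (hypothetical input)
    Returns a list of (board_name, ip, expected_mac, found_mac_or_None).
    """
    problems = []
    for board, table in sorted(arp_tables.items()):
        for ip, mac in sorted(core_macs.items()):
            found = table.get(ip)
            if found is None or mac_to_int(found) != mac_to_int(mac):
                problems.append((board, ip, mac, found))
    return problems
```

An empty result means every board agrees with every core; any entry names the board holding a stale or missing mapping.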
> On Mon, Mar 12, 2018 at 10:54 PM, David MacMahon <[email protected]>
> wrote:
>
>> I think the tx overflow will be OK since the FPGA won't try to send more
>> than 10 Gbps.  I think the "rx overrun" flag would be more interesting.
>> But probably best to check both of course! :)
>>
>> Is the X engine clock an exact copy of the F engine clock (i.e. a common
>> clock that goes through a massive splitter) or just a clock of the same
>> frequency locked to the same reference (but not the exact same clock)?
>> Things get more complicated once you run F and X at different rates, so I
>> wouldn't recommend that path if you can avoid it.
>>
>> HTH,
>> Dave
>>
>>
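To see which X engine trips first, and whether it is `rx_overrun` or `tx_overflow`, one option is to poll both flags on every board and stop at the first increment. The sketch below assumes each flag has been wired to a 32-bit counter in a software register; the flag names and the `read_counter(board, flag)` callable are hypothetical stand-ins (for instance a thin wrapper around the board object's `read_uint` in casperfpga, if that is how the registers are exposed).

```python
import time

# Hypothetical flag names: assumes the design counts the 10GbE core's rx_overrun
# and tx_overflow outputs in 32-bit software-register counters.
FLAGS = ("rx_overrun", "tx_overflow")


def snapshot(boards, read_counter):
    """Read every (board, flag) counter once; read_counter(board, flag) returns an int."""
    return {(board, flag): read_counter(board, flag) for board in boards for flag in FLAGS}


def increments(before, after, bits=32):
    """List (board, flag, delta) for counters that changed, allowing for one wrap."""
    moved = []
    for key in sorted(after):
        delta = (after[key] - before[key]) % (1 << bits)
        if delta:
            moved.append((key[0], key[1], delta))
    return moved


def watch(boards, read_counter, interval_s=1.0, max_polls=None, sleep=time.sleep):
    """Poll until some counter moves and return that first batch; [] if max_polls runs out."""
    before = snapshot(boards, read_counter)
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval_s)
        after = snapshot(boards, read_counter)
        moved = increments(before, after)
        if moved:
            return moved
        before = after
        polls += 1
    return []
```

The `sleep` parameter is injectable so the loop can be exercised without hardware.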
>> On Mar 12, 2018, at 22:01, Homin Jiang <[email protected]> wrote:
>>
>> Hi Dave:
>>
>> Thanks for the prompt response and suggestions.
>> The X engine is running on the same clock as the F engine, 2.24 GHz/8 =
>> 280 MHz. Perhaps I should increase the X engine clock?
>> Yes, there is a Tx overflow flag in the model; it will be the first thing
>> I check.
>>
>> best
>> homin
>>
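The clock figures above can be checked with a few lines of arithmetic: 2.24 GHz / 8 = 280 MHz, and 8.96 Gbps at 280 MHz is 32 bits per FPGA clock cycle. The data-bus width into the 10GbE core is taken as 64 bits here, which is an assumption; with that width, valid data would be needed on half of the cycles.

```python
def per_cycle_budget(adc_clock_hz, decimation, data_rate_bps, word_bits=64):
    """Return (fpga_clock_hz, bits_per_fpga_cycle, fraction_of_cycles_carrying_data).

    word_bits is the assumed width of the data bus into the 10GbE core.
    """
    fpga_clock_hz = adc_clock_hz / decimation
    bits_per_cycle = data_rate_bps / fpga_clock_hz
    return fpga_clock_hz, bits_per_cycle, bits_per_cycle / word_bits


# Numbers from the thread: 2.24 GHz ADC clock, /8, 8.96 Gbps per link.
print(per_cycle_budget(2.24e9, 8, 8.96e9))  # (280000000.0, 32.0, 0.5)
```

Because the 8.96 Gbps is set by the F engines, raising only the X-engine clock lowers the fraction of busy cycles but does not change the amount of data arriving at each X engine.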
>>
>>
>> On Tue, Mar 13, 2018 at 12:42 PM, David MacMahon <[email protected]>
>> wrote:
>>
>>> Hi, Homin,
>>>
>>> The first thing to do is figure out where packet loss is actually
>>> happening.  The fact that you have to reset the 10G yellow blocks to get
>>> things going again suggests that the X engines are not keeping up with the
>>> data rate (since the F engines will happily churn out 8.96 Gbps of data
>>> regardless of the receivers' states and the X engines will happily churn
>>> out data regardless of the PC's state, it seems that the only way for the
>>> 10 GbE blocks to get confused is if the X engines are not keeping up with the
>>> incoming data rate).  I assume the F engine ROACH2s are being clocked via
>>> their ADCs.  How are the X engine ROACH2s being clocked?
>>>
>>> Assuming the F-to-X packets are going through a switch, you could query
>>> the switch to see what it thinks the incoming and outgoing data rates are
>>> on the various ports involved.
>>>
>>> Does your design have any way of capturing the overflow flags of the 10
>>> GbE cores?
>>>
>>> Dave
>>>
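For the switch-side check suggested above, average port rates can be derived from two readings of each port's octet counters taken a known interval apart. This sketch assumes 64-bit counters such as SNMP IF-MIB `ifHCInOctets`/`ifHCOutOctets` (how you read them depends on the switch), and that the F-engine ports and X-engine ports carry only correlator traffic, so any difference between total ingress from the F ports and total egress to the X ports is traffic the switch did not deliver.

```python
def port_rate_gbps(octets_t0, octets_t1, dt_s, counter_bits=64):
    """Average rate in Gbps between two readings of an octet counter.

    The modulo handles a single counter wrap between the two readings.
    """
    delta = (octets_t1 - octets_t0) % (1 << counter_bits)
    return delta * 8 / dt_s / 1e9


def shortfall(ingress_gbps, egress_gbps):
    """Fraction of the traffic entering on one set of ports that does not leave on another."""
    total_in = sum(ingress_gbps)
    if total_in == 0:
        return 0.0
    return (total_in - sum(egress_gbps)) / total_in


# Example: 1.12e9 octets in one second corresponds to 8.96 Gbps.
print(port_rate_gbps(0, 1_120_000_000, 1.0))
```

Computing `shortfall` over the eight F-engine ingress rates and the eight X-engine egress rates, sampled repeatedly, should show whether loss appears inside the switch before the X engines report anything.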
>>> On Mar 12, 2018, at 19:39, Homin Jiang <[email protected]>
>>> wrote:
>>>
>>> Dear Casperite:
>>>
>>> We have deployed a 7-antenna (actually 8) packetized correlator on
>>> Mauna Loa, Hawaii. It runs at a 2.24 GHz clock, which means 8.96 Gbps
>>> on each 10 GbE link. The packet size is 2K. There are 8 ROACH2s acting
>>> as F engines and another 8 ROACH2s acting as X engines. Data packets from
>>> F to X look fine; the lost packets are in the integration data sent from
>>> the X engines to the computer. The 10G yellow blocks in the X engines
>>> receive the incoming data packets from the F engines at 8.96 Gbps and
>>> send the integration data to the PC; the outgoing data rate depends on
>>> the integration time, which is usually longer than 0.5 second. The
>>> symptom is that packet loss occurs on specific X engines after 10 to 20
>>> minutes, or after a couple of hours. Once it happens, we reset all the
>>> 10G yellow blocks in the F and X engines, and the system recovers.
>>>
>>> I don't know much about the 10G Ethernet yellow block. Any comments or
>>> suggestions are highly welcome.
>>>
>>> best
>>> homin jiang
>>>
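Two back-of-the-envelope rates help frame this report: the line rate needed on each F-to-X link once Ethernet/IPv4/UDP overhead is added, and the size of the X-to-PC visibility stream. The sketch assumes "2K" means a 2048-byte UDP payload and ignores VLAN tags and any application header; the channel count, number of polarization products and bytes per visibility in the second call are hypothetical placeholders, not the values of this correlator.

```python
# Per-frame overhead on the wire for UDP/IPv4 over Ethernet, in bytes:
# preamble+SFD 8, Ethernet header 14, IPv4 header 20, UDP header 8, FCS 4, inter-frame gap 12.
WIRE_OVERHEAD_BYTES = 8 + 14 + 20 + 8 + 4 + 12  # 66


def wire_rate_gbps(payload_rate_gbps, payload_bytes):
    """Line rate needed to carry a UDP payload rate in packets of payload_bytes each."""
    return payload_rate_gbps * (payload_bytes + WIRE_OVERHEAD_BYTES) / payload_bytes


def visibility_rate_bps(n_ant, n_chan, n_pol_products, bytes_per_vis, t_int_s):
    """Correlator output rate: N(N+1)/2 baselines (autos included) per channel and pol product."""
    n_baselines = n_ant * (n_ant + 1) // 2
    return n_baselines * n_chan * n_pol_products * bytes_per_vis * 8 / t_int_s


print(wire_rate_gbps(8.96, 2048))  # about 9.25 Gbps
# Hypothetical: 8 antennas, 4096 channels, 1 pol product, 8-byte complex visibility, 0.5 s.
print(visibility_rate_bps(8, 4096, 1, 8, 0.5))  # 18874368.0 bits/s, summed over all X engines
```

Under these assumptions each F-to-X link uses roughly 9.25 of its 10 Gbps, while the placeholder output stream is about 19 Mbps in total, a tiny fraction of the link capacity.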
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "[email protected]" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>>
>>
>
