Hi Richard,

I've just had a very brief look at the design / software, so take this
email with a pinch of salt, but on the off-chance you haven't checked
this....

It looks like the PAPER F-engine setup, on running the out-of-the-box
start script for the software / firmware, is:

1. Disable all ethernet interfaces
2. Arm sync generator, wait 1 second for PPS
3. Reset ethernet interfaces
4. Enable interfaces.
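
In code, that sequence might look roughly like the sketch below. It is
Python rather than the Ruby the real scripts use, and the FakeFengine
class and the register names (eth_N_enable, eth_N_reset, sync_arm) are
made up for illustration; I haven't checked them against the actual
design.

```python
import time


class FakeFengine:
    """Hypothetical stand-in for a ROACH control link; it only logs writes."""

    def __init__(self):
        self.log = []

    def write_reg(self, name, value):
        self.log.append((name, value))


def start_fengine(fpga, n_ports=8, pps_wait_s=1.0, sleep=time.sleep):
    """Replay the four start-up steps against `fpga` (register names assumed)."""
    # 1. Disable all ethernet interfaces so no data reaches the 10GbE cores.
    for port in range(n_ports):
        fpga.write_reg(f"eth_{port}_enable", 0)
    # 2. Arm the sync generator, then wait long enough for one PPS edge.
    fpga.write_reg("sync_arm", 1)
    sleep(pps_wait_s)
    # 3. Reset the ethernet interfaces by pulsing each reset line.
    for port in range(n_ports):
        fpga.write_reg(f"eth_{port}_reset", 1)
        fpga.write_reg(f"eth_{port}_reset", 0)
    # 4. Enable the interfaces only once everything above has happened.
    for port in range(n_ports):
        fpga.write_reg(f"eth_{port}_enable", 1)
```

The point of the ordering is that the enable lines stay low from step 1
until step 4, so the sync in step 2 should have passed through the
design before any packet data can reach the cores.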

These four steps seem like they should be safe, yet the behaviour
you're describing sounds like the design is midway through sending a
packet, then gets a sync, gives up without sending an end-of-frame and
starts sending a new packet, at which point the old packet + the new
packet = overflow.
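
To put rough numbers on that, here is a small sketch. It assumes the
1025-word (8200-byte) packets Richard reported, the 8K+512-byte limit
Peter measured on tutorial 2, and that the core simply accumulates
words until it sees an end-of-frame. That last assumption is mine, and
the function names are hypothetical.

```python
WORD_BYTES = 8                                  # one 64-bit word
PACKET_WORDS = 1025                             # 8200-byte packet (from the thread)
BUFFER_WORDS = (8 * 1024 + 512) // WORD_BYTES   # 8K+512 bytes = 1088 words


def frame_words_after_resync(words_sent_before_sync):
    """Words queued without an end-of-frame if a sync restarts the
    packetiser partway through a packet (assumed behaviour)."""
    return words_sent_before_sync + PACKET_WORDS


def overflows(words_sent_before_sync):
    """True if the merged old + new packet exceeds the measured limit."""
    return frame_words_after_resync(words_sent_before_sync) > BUFFER_WORDS


# Smallest number of already-sent words for which a resync overflows.
first_bad = next(k for k in range(PACKET_WORDS) if overflows(k))
print(first_bad)
```

Under those assumptions there are only 63 words of headroom, so a sync
landing almost anywhere inside a packet would be enough to overflow.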

Knowing that the design works for PAPER, I wonder whether, after
arming the sync generator, syncs are flowing through the design before
the ethernet interface is enabled. Do you have a PPS-like input? The
fengine initialisation script seems to wait for a second after arming,
but if your sync input is something significantly slower, you could
have problems.
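
As a quick sanity check of the timing, assuming the sync generator
fires on the first sync edge after arming (so in the worst case a full
sync period later), something like this tells you whether the fixed
wait is long enough. The function name and the 1-second default are
illustrative only.

```python
def sync_can_land_after_enable(sync_period_s, wait_after_arm_s=1.0):
    """Return True if, in the worst case, the first sync after arming
    arrives after the script has stopped waiting, i.e. at a time when
    the ethernet interfaces may already be enabled."""
    worst_case_sync_delay_s = sync_period_s
    return worst_case_sync_delay_s > wait_after_arm_s


print(sync_can_land_after_enable(1.0))    # PPS-like input
print(sync_can_land_after_enable(10.0))   # hypothetical slower sync
```

This ignores however long the reset step takes between the wait and the
enable, so treat it as a lower bound on how careful you need to be.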

I'm sceptical about this theory (I think the symptoms would be lots of
OK packets when you brought up the interface, and then it dying when
the sync arrives, rather than a single good packet like you're
seeing), but if the firmware + software really is the same as that
working with PAPER, and the wiki hasn't just got out of sync with the
PAPER devs, perhaps the problem is in your hardware setup....

Cheers,
Jack

On 27 October 2014 16:38, Richard Black <aeldstes...@gmail.com> wrote:
> By "enable" port, I assume you mean the "valid" port. I've been looking at
> the PAPER model carefully for some time now, and that is how it operates. It
> has a gated valid signal with a software register on each 10-GbE core.
>
> Once again, this is not our model. This is one made available on the CASPER
> wiki and run without modifications.
>
> Richard Black
>
> On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley <jman...@ska.ac.za> wrote:
>>
>> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
>> thing with this core is to ensure that your design keeps the enable port
>> held low until the core's been configured. The core becomes unusable once
>> the TX FIFO overflows. This has been a long-standing bug (my emails trace
>> back to 2009) but it's so easy to work around that I don't think anyone's
>> bothered looking into fixing it.
>>
>> Jason Manley
>> CBF Manager
>> SKA-SA
>>
>> Cell: +27 82 662 7726
>> Work: +27 21 506 7300
>>
>> On 27 Oct 2014, at 18:25, Richard Black <aeldstes...@gmail.com> wrote:
>>
>> > Jason,
>> >
>> > Thanks for your comments. While I agree that changing the ADC frequency
>> > mid-operation is non-kosher and could result in uncertain behavior, the
>> > issue at hand for us is to figure out what is going on with the PAPER model
>> > that has been published on the CASPER wiki. This naturally won't be (and
>> > shouldn't be) the end-all solution to this problem.
>> >
>> > This is a reportedly fully-functional model that shouldn't require any
>> > major changes in order to operate. However, this has clearly not been the
>> > case in at least two independent situations (us and Peter). This begs the
>> > question: what's so different about our use of PAPER?
>> >
>> > We, at BYU, have made painstakingly sure that our IP addressing schemes,
>> > switch ports, and scripts are all configured correctly (thanks to David
>> > MacMahon for that, btw), but we still have hit the proverbial brick wall of
>> > 10-GbE overflow.  When I last corresponded with David, he explained that he
>> > remembers having a similar issue before, but can't recall exactly what the
>> > problem was.
>> >
>> > In any case, the fact that turning down the ADC clock prior to
>> > start-up prevents the 10-GbE core from overflowing is a major lead for us
>> > at BYU (we've been spinning our wheels on this issue for several months now).
>> > By no means are we proposing mid-run ADC clock modifications, but this
>> > appears to be a very subtle (and quite sinister, in my opinion) bug.
>> >
>> > Any thoughts as to what might be going on?
>> >
>> > Richard Black
>> >
>> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley <jman...@ska.ac.za> wrote:
>> > Just a note that I don't recommend you adjust FPGA clock frequencies
>> > while it's operating. In theory, you should do a global reset in case the
>> > PLL/DLLs lose lock during clock transitions, in which case the logic could
>> > be in an uncertain state. But the Sysgen flow just does a single POR.
>> >
>> > A better solution might be to keep the 10GbE cores turned off (enable
>> > line pulled low) on initialisation, until things are configured (tgtap
>> > started etc), and only then enable the transmission using a SW register.
>> >
>> > Jason Manley
>> > CBF Manager
>> > SKA-SA
>> >
>> > Cell: +27 82 662 7726
>> > Work: +27 21 506 7300
>> >
>> > On 25 Oct 2014, at 10:34, peter <peterniu...@163.com> wrote:
>> >
>> > > Hi Richard, Joe, & all,
>> > > Thanks for your help. We can finally receive packets now!
>> > > As you pointed out, after enabling the ADC card and loading the bof file
>> > > (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
>> > > fengine init script (./paper_feng_init.rb roach1:0) at about 75 MHz. That
>> > > allows packets to flow, and then we can turn the frequency up. However,
>> > > the highest final ADC clock frequency I reached in my experiment was
>> > > 120 MHz. Our target ADC frequency is 250 MHz. Maybe I need to load the bof
>> > > file at a higher ADC frequency first to reach a steady 250 MHz ADC clock.
>> > > Why does it need to be initialised at a lower frequency and then turned
>> > > up? That doesn't make sense. Is the hardware going wrong? As the yellow
>> > > block adc16*250-8 is designed for 250 MHz, it should be OK at 200 MHz or
>> > > 250 MHz. What final frequency did you reach in your experiment?
>> > > Any reply will be helpful!
>> > > Best regards,
>> > > peter
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2014-10-25 00:36:52, "Richard Black" <aeldstes...@gmail.com> wrote:
>> > > Peter,
>> > >
>> > > That's correct. We downloaded the FPGA firmware and programmed the
>> > > ROACH with the precompiled bitstream. When we didn't get any data beyond
>> > > that single packet, we stuck some overflow status registers in the model
>> > > and found that we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
>> > >
>> > > We have actually found a way to get packets to flow, but it isn't a
>> > > good fix. When we turn the ADC clock frequency down to about 75 MHz, the
>> > > packets begin to flow. There is an opinion in our group that the 10-GbE
>> > > buffer overflow is a transient behavior, and, hence, if we slowly turn up
>> > > the clock frequency after the ROACH has started up, packets may continue
>> > > to flow in steady-state operation. We haven't tested this yet, though.
>> > >
>> > > Richard Black
>> > >
>> > > On Thu, Oct 23, 2014 at 8:39 PM, peter <peterniu...@163.com> wrote:
>> > > Hi Richard & all,
>> > > As you said, the size of the isolated packet changes every time:
>> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
>> > > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
>> > > Did you download the PAPER gateware directly from the CASPER wiki
>> > > (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest )? How does
>> > > the PAPER bof file run on your system? Have you seen the overflow before?
>> > > I downloaded and installed the PAPER model as the website says, but the
>> > > overflow shows up when I run paper_feng_netstat.rb.
>> > > Thanks for your information.
>> > > peter
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2014-10-24 09:59:12, "Richard Black" <aeldstes...@gmail.com> wrote:
>> > > Peter,
>> > >
>> > > I don't mean to hijack your thread, but we've been having a very
>> > > similar (and time-absorbing) issue with the PAPER f-engine FPGA firmware
>> > > here at BYU. Out of curiosity, does this single packet that you're
>> > > receiving in tcpdump change in size every time you reprogram the ROACH?
>> > > We've seen this happen, and we're pretty sure that this isolated packet is
>> > > the 10-GbE buffer flushing when the 10-GbE core is initialized (i.e. the
>> > > enable signal isn't sync'd with the start of new packet).
>> > >
>> > > Regardless of whether we have the same issue, I'm very interested to
>> > > see this problem's resolution.
>> > >
>> > > Good luck,
>> > >
>> > > Richard Black
>> > >
>> > > On Thu, Oct 23, 2014 at 7:50 PM, peter <peterniu...@163.com> wrote:
>> > > Hi Joe & all,
>> > > I found something this morning: one packet is sent out from the ROACH
>> > > when I run the PAPER model. I captured it with tcpdump on the HPC:
>> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
>> > > 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
>> > >
>> > > The length is not the expected 8200+8 bytes, and it is far from the full
>> > > TX buffer size of 8K+512 bytes. All the other packets are blocked by the
>> > > overflow.
>> > > I tried changing the tutorial 2 packet size to 8200 bytes and to 8K+512
>> > > bytes, and both transfer fine. I also confirmed that the boundary is
>> > > indeed 8K+512 bytes, because when I change the size to 8K+513 bytes no
>> > > data is sent. So the packet received this morning, with length 6456, is
>> > > well under the limit. But what caused the other packets to overflow?
>> > > Any suggestions would be helpful!
>> > > peter
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2014-10-24 00:37:14, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> > > wrote:
>> > > Peter,
>> > >
>> > > By cadence of the broadcast, I mean how often are the 8200 byte
>> > > packets sent.  Basically, I would like to determine how close your
>> > > system is to the maximum data rate of the 10Gbe.
>> > >
>> > > Also, it would be instructive to know the following:
>> > >
>> > > 1) What transmission protocol are you using? (The One_GBe module uses
>> > > UDP; are you using that or TCP?)
>> > >
>> > > 2) What NICs are you using on the receive side?
>> > >
>> > > At this time, I am working on the theory that the issue is related to
>> > > the network itself not being able to sustain the data volume you are
>> > > generating and would like to get a better idea of how much data is
>> > > generated and how often it is sent.
>> > >
>> > > Thanks,
>> > >
>> > > -Joe Kujawski
>> > >
>> > >
>> > >
>> > > On Thu, Oct 23, 2014 at 12:01 PM, peter <peterniu...@163.com> wrote:
>> > > Hi Joe,
>> > > 1. Yes, actually we have 3 ROACH2 boards with 8 NICs.
>> > > 2. Each ROACH has 4 of its 8 NICs connected directly to a PC; the other 4
>> > > connect to a 10Gb switch. I connected the SFP cable (which should go to
>> > > the switch) directly to the PC to see whether any data comes out, but no
>> > > data comes out because of the overflow.
>> > > 3. Could you give an example of the broadcast cadence? I am not
>> > > familiar with this.
>> > > It does indeed require more data, but each packet is limited to 8200
>> > > bytes.
>> > > Thanks for your reply!
>> > > peter
>> > > --
>> > > Sent from NetEase Mail for Android
>> > >
>> > >
>> > >
>> > > On 2014-10-23 23:16 , Kujawski, Joseph Wrote:
>> > >
>> > > Peter,
>> > >
>> > > I am downloading it now.  Can you answer these questions:
>> > >
>> > > 1) Do you have a standard PAPER architecture with two ROACH boards
>> > > each containing 8 10GBe ports?
>> > >
>> > > 2) Please describe your internet architecture i.e. how are each of the
>> > > ports connected.
>> > >
>> > > 3) What is the cadence of each broadcast?
>> > >
>> > > My current suspicion is that you are generating more data than you can
>> > > push through your interface(s).  It may be that the higher data volume in
>> > > your implementation requires more of a network infrastructure than was
>> > > required by the original system.
>> > >
>> > > -Joe Kujawski
>> > >
>> > > On Thu, Oct 23, 2014 at 11:01 AM, peter <peterniu...@163.com> wrote:
>> > > This is a little big. roach2_tl8511port is the one that can send data
>> > > normally. The environment should be OK now; last time the crc32x64_con
>> > > may have been missing.
>> > > Good night!
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2014-10-23 22:52:54, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> > > wrote:
>> > > Peter,
>> > >
>> > > 1) For reference, here is a list of the errors:
>> > >
>> > > --------------------------------- Version Log ----------------------------------
>> > > Version                                 Path
>> > > System Generator 14.6                   C:/Xilinx/14.6/ISE_DS/ISE/sysgen
>> > > Matlab 8.0.0.783 (R2012b)               C:/MATLAB/R2012b
>> > > ISE                                     C:/Xilinx/14.6/ISE_DS/ISE
>> > >
>> > > --------------------------------------------------------------------------------
>> > > Summary of Errors:
>> > > Error 0001: Could not find the configuration m-function "crc32x64_con...
>> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose1/crc/crc32x64'
>> > > Error 0002: Could not find the configuration m-function "crc32x64_con...
>> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose2/crc/crc32x64'
>> > > Error 0003: Could not find the configuration m-function "crc32x64_con...
>> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose3/crc/crc32x64'
>> > > Error 0004: Could not find the configuration m-function "crc32x64_con...
>> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose4/crc/crc32x64'
>> > >
>> > > --------------------------------------------------------------------------------
>> > >
>> > > 2) Your email did not have an attachment.  I have more comments, but
>> > > wanted to let you know about the attachment before you went to bed.
>> > >
>> > > -Joe Kujawski
>> > >
>> > >
>> > >
>> > >
>> > > On Thu, Oct 23, 2014 at 10:33 AM, peter <peterniu...@163.com> wrote:
>> > >
>> > > Hi Joe,
>> > > Thanks for your kind help!
>> > > What error shows up when you compile my model? Is some file missing? I
>> > > will pack up all my files and send them to you as an attachment. And what
>> > > about the PAPER one? Did it report an overflow message? You need to
>> > > install and use Ruby to control it.
>> > > Leaving the PAPER model aside, let's talk about the 10Gb block on ROACH
>> > > v2. Your model is good for seeing Data_valid, EOF, etc., but I don't know
>> > > how to add it to the PAPER model, because I realise that PAPER generates
>> > > its data valid and EOF from a counter, so I don't know where to put your
>> > > model. For example, if I put the data_valid or EOF control process you
>> > > designed on the 10Gbe port in the PAPER model, I think that is equivalent
>> > > to using a 10Gbe block instead of the One_GBe block in yours. *_*!!
>> > > I changed the number 50 to 1025 in tutorial 2 to make the packet size
>> > > 8200 bytes, and it seems to transfer well without error. That is at a
>> > > period of 1.3*1025, meaning 1 packet is sent every 1.3*1025 clocks. I
>> > > found this boundary value of 1.3*1025 through a lot of testing. When I
>> > > set it lower than 1.3*1025 clocks, the first few packets are sent out,
>> > > but then the overflow comes. I think it is the transfer rate that
>> > > determines the overflow.
>> > > Thanks for your suggestions and advice!
>> > > peter
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > At 2014-10-23 00:14:29, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> > > wrote:
>> > >
>> > > Peter,
>> > >
>> > > I find that I can not compile and simulate your design, however,
>> > > looking at the code structure, I can't tell if tx_val and tx_EOF are
>> > > high at the same time:
>> > >
>> > >
>> > >
>> > > Also, I modified the design to send out a packet of size 8200 once per
>> > > second (model attached) and added a register that latches the GBE tx_aful
>> > > and tx_overrun lines so they can be read through the KATCP interface.
>> > > Modify the model to remove the oscilloscope and Xilinx out gateways
>> > > before compiling it for your platform.  Note that this model does not
>> > > check for overflow, though the latch will let you know if you have had one.
>> > >
>> > > Let me know how this works for you.
>> > >
>> > > -Joe Kujawski
>> > > --
>> > > **************************************
>> > > * Joe Kujawski
>> > > * Siena College
>> > > * Dept. of Physics and Astronomy, RB 113
>> > > * 515 Loudon Road
>> > > * Loudonville, NY 12211-1462
>> > > *
>> > > * Email: jkujaw...@siena.edu
>> > > * Phone: 518-867-7509  <-- NEW NUMBER
>> > > * Fax: 518-783-2986
>> > > **************************************
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Cloud attachment sent from NetEase 163 Mail
>> > >
>> > > paperfengine.zip (126.71M, expires 7 November 2014 22:58)
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>>
>
