Hi Richard,

That's my theory, though I doubt it's right. As you say, an easy test is
just to wait a couple more seconds after issuing a sync and see if that
helps. But if your PPS is a real PPS (rather than just a square wave with
some vague 1 s period), I can't see what difference this would make.

If that doesn't help, my inclination would be to start prodding the 10gbe
control signals from software, to make sure the reset and software enables
are working and to see whether a tge reset without a new sync behaves
differently. But I can't imagine how that would be broken unless the code
on GitHub is out of date (which I doubt).

Jack

On 27 October 2014 17:28, Richard Black <aeldstes...@gmail.com> wrote:
> Jack,
>
> I appreciate your help. I tend to agree that the issue is likely a hardware
> configuration problem, but we have been trying to match it as closely as
> possible.
>
> We do feed a 1-PPS signal into the board, but I'm hazy on the details of the
> other pulse parameters. I'll look into that as well.
>
> So, if I understand you correctly, you believe that the sync pulse is
> reaching the ethernet interfaces after the cores are enabled? If that is the
> case, couldn't we delay enabling the 10-GbE cores for another second to fix
> it? This might be a quick way to test that theory, but please correct me if
> I've misunderstood.
>
> Richard Black
>
> On Mon, Oct 27, 2014 at 11:05 AM, Jack Hickish <jackhick...@gmail.com> wrote:
>>
>> Hi Richard,
>>
>> I've just had a very brief look at the design / software, so take this
>> email with a pinch of salt, but on the off-chance you haven't checked
>> this....
>>
>> It looks like the PAPER F-engine setup, when you run the start script
>> with the out-of-the-box software / firmware, is:
>>
>> 1. Disable all ethernet interfaces
>> 2. Arm sync generator, wait 1 second for PPS
>> 3. Reset ethernet interfaces
>> 4. Enable interfaces.
>>
>> These four steps seem like they should be safe, yet the behaviour
>> you're describing sounds like the design is midway through sending a
>> packet, then gets a sync, gives up without sending an end-of-frame and
>> starts sending a new packet, at which point old packet + new packet =
>> overflow.
>>
>> Knowing that the design works for PAPER, I wonder whether, after the
>> sync generator is armed, syncs are flowing through the design before
>> the ethernet interface is enabled. Do you have a PPS-like input? The
>> F-engine initialisation script seems to wait for one second after
>> arming, but if your sync input is significantly slower than that, you
>> could have problems.
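A minimal sketch of the four-step startup order Jack lists, written in Python against a stand-in board object so the ordering and the "wait longer than one sync period" rule are explicit. This is not the PAPER Ruby start script: the `FakeBoard` class and the register names (`eth0_enable`, `sync_arm`, `eth0_reset`, ...) are hypothetical, chosen only for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FakeBoard:
    """Hypothetical stand-in for a board connection; it only records writes.
    The register names used below are illustrative, not PAPER's real ones."""
    writes: List[Tuple[str, int]] = field(default_factory=list)

    def write_int(self, name: str, value: int) -> None:
        self.writes.append((name, value))


def start_fengine(board: FakeBoard, n_ports: int = 8,
                  sync_period_s: float = 1.0, margin_s: float = 0.5,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    # 1. Disable every 10GbE port so nothing reaches the TX FIFOs.
    for port in range(n_ports):
        board.write_int(f"eth{port}_enable", 0)
    # 2. Arm the sync generator and wait longer than one sync period, so the
    #    sync has already passed through before any port is enabled.
    #    With a sync input slower than 1 PPS, sync_period_s must grow to match.
    board.write_int("sync_arm", 1)
    sleep(sync_period_s + margin_s)
    # 3. Pulse each core's reset to discard anything left in its FIFO.
    for port in range(n_ports):
        board.write_int(f"eth{port}_reset", 1)
        board.write_int(f"eth{port}_reset", 0)
    # 4. Only now enable the ports.
    for port in range(n_ports):
        board.write_int(f"eth{port}_enable", 1)
```

Passing a no-op `sleep` (for example `sleep=lambda s: None`) lets the ordering be checked without hardware and without waiting.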
>>
>> I'm sceptical about this theory (I think the symptoms would be lots of
>> OK packets when you brought up the interface, and then it dying when
>> the sync arrives, rather than a single good packet like you're
>> seeing), but if the firmware + software really is the same as that
>> working with paper, and the wiki hasn't just got out of sync with the
>> paper devs, perhaps the problem is in your hardware setup....
>>
>> Cheers,
>> Jack
>>
>> On 27 October 2014 16:38, Richard Black <aeldstes...@gmail.com> wrote:
>> > By "enable" port, I assume you mean the "valid" port. I've been looking
>> > at
>> > the PAPER model carefully for some time now, and that is how it
>> > operates. It
>> > has a gated valid signal with a software register on each 10-GbE core.
>> >
>> > Once again, this is not our model. This is one made available on the
>> > CASPER
>> > wiki and run without modifications.
>> >
>> > Richard Black
>> >
>> > On Mon, Oct 27, 2014 at 10:34 AM, Jason Manley <jman...@ska.ac.za> wrote:
>> >>
>> >> I suspect the 10GbE core's input FIFO is overflowing on startup. One key
>> >> thing with this core is to ensure that your design keeps the enable port
>> >> held low until the core has been configured. The core becomes unusable
>> >> once the TX FIFO overflows. This has been a long-standing bug (my emails
>> >> trace back to 2009), but it's so easy to work around that I don't think
>> >> anyone has bothered looking into fixing it.
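To illustrate the failure mode Jason describes, here is a toy cycle-level model in Python. It assumes, as stated above, that an overflowed FIFO stays unusable until reset, and additionally that the core drains nothing before it is configured. The class and function names and the 1024-word capacity are made up for illustration; they are not the real core's parameters.

```python
class TxFifoModel:
    """Toy TX FIFO: once it overflows it stays unusable until reset()."""

    def __init__(self, capacity_words: int) -> None:
        self.capacity = capacity_words
        self.level = 0
        self.overflowed = False

    def push(self, valid: bool) -> None:
        if not valid or self.overflowed:
            return  # nothing offered, or the core is already wedged
        if self.level >= self.capacity:
            self.overflowed = True  # sticky: later pushes are ignored
        else:
            self.level += 1

    def reset(self) -> None:
        self.level = 0
        self.overflowed = False


def startup(cycles_before_config: int, gate_until_configured: bool,
            capacity_words: int = 1024) -> bool:
    """Return True if the FIFO overflows before the core is configured."""
    fifo = TxFifoModel(capacity_words)
    sw_enable = not gate_until_configured  # SW register held low if gating
    for _ in range(cycles_before_config):
        data_valid = True  # upstream logic streams data from power-up
        fifo.push(data_valid and sw_enable)  # unconfigured core never drains
    return fifo.overflowed
```

With these assumptions, `startup(2000, gate_until_configured=False)` overflows and stays broken, while `startup(2000, gate_until_configured=True)` does not, because the gated valid never reaches the FIFO.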
>> >>
>> >> Jason Manley
>> >> CBF Manager
>> >> SKA-SA
>> >>
>> >> Cell: +27 82 662 7726
>> >> Work: +27 21 506 7300
>> >>
>> >> On 27 Oct 2014, at 18:25, Richard Black <aeldstes...@gmail.com> wrote:
>> >>
>> >> > Jason,
>> >> >
>> >> > Thanks for your comments. While I agree that changing the ADC frequency
>> >> > mid-operation is non-kosher and could result in uncertain behavior, the
>> >> > issue at hand for us is to figure out what is going on with the PAPER
>> >> > model that has been published on the CASPER wiki. Changing the clock
>> >> > naturally won't be (and shouldn't be) the final solution to this problem.
>> >> >
>> >> > This is reportedly a fully functional model that shouldn't require any
>> >> > major changes in order to operate. However, this has clearly not been
>> >> > the case in at least two independent situations (ours and Peter's).
>> >> > This raises the question: what's so different about our use of PAPER?
>> >> >
>> >> > We at BYU have painstakingly made sure that our IP addressing schemes,
>> >> > switch ports, and scripts are all configured correctly (thanks to David
>> >> > MacMahon for that, btw), but we have still hit the proverbial brick wall
>> >> > of 10-GbE overflow. When I last corresponded with David, he said that he
>> >> > remembered having a similar issue before but couldn't recall exactly
>> >> > what the problem was.
>> >> >
>> >> > In any case, the fact that turning down the ADC clock prior to start-up
>> >> > prevents the 10-GbE core from overflowing is a major lead for us at BYU
>> >> > (we've been spinning our wheels on this issue for several months now).
>> >> > By no means are we proposing mid-run ADC clock changes, but this appears
>> >> > to be a very subtle (and, in my opinion, quite sinister) bug.
>> >> >
>> >> > Any thoughts as to what might be going on?
>> >> >
>> >> > Richard Black
>> >> >
>> >> > On Mon, Oct 27, 2014 at 2:41 AM, Jason Manley <jman...@ska.ac.za> wrote:
>> >> > Just a note that I don't recommend adjusting FPGA clock frequencies
>> >> > while the FPGA is operating. In theory, you should do a global reset in
>> >> > case the PLLs/DLLs lose lock during clock transitions, in which case the
>> >> > logic could be in an uncertain state. But the Sysgen flow just does a
>> >> > single POR (power-on reset).
>> >> >
>> >> > A better solution might be to keep the 10GbE cores turned off (enable
>> >> > line pulled low) on initialisation until things are configured (tgtap
>> >> > started, etc.), and only then enable transmission using a SW register.
>> >> >
>> >> > Jason Manley
>> >> >
>> >> > On 25 Oct 2014, at 10:34, peter <peterniu...@163.com> wrote:
>> >> >
>> >> > > Hi Richard, Joe, & all,
>> >> > > Thanks for your help; it can finally receive packets now!
>> >> > > As you pointed out, after enabling the ADC card and running the bof
>> >> > > file (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need
>> >> > > to run the F-engine init script (./paper_feng_init.rb roach1:0) at
>> >> > > about 75 MHz. That allows the packets to flow, and then we can turn
>> >> > > the frequency up. However, the final ADC clock frequency only goes up
>> >> > > to 120 MHz in my experiment, while our target ADC frequency is
>> >> > > 250 MHz. Maybe I need to run the bof file at a higher ADC frequency
>> >> > > first to get a steady final ADC clock frequency of 250 MHz.
>> >> > > Why does it need to be initialised at a lower frequency and then
>> >> > > turned up? That doesn't make sense. Is the hardware going wrong? As
>> >> > > the yellow block adc16*250-8 is designed for 250 MHz, it should be OK
>> >> > > at 200 MHz or 250 MHz. What final frequency did you reach in your
>> >> > > experiment?
>> >> > > Any reply will be helpful!
>> >> > > Best regards!
>> >> > > peter
>> >> > >
>> >> > > At 2014-10-25 00:36:52, "Richard Black" <aeldstes...@gmail.com>
>> >> > > wrote:
>> >> > > Peter,
>> >> > >
>> >> > > That's correct. We downloaded the FPGA firmware and programmed the
>> >> > > ROACH with the precompiled bitstream. When we didn't get any data
>> >> > > beyond that single packet, we added some overflow status registers to
>> >> > > the model and found that we were overflowing at 1025 64-bit words
>> >> > > (i.e. 8200 bytes).
>> >> > >
>> >> > > We have actually found a way to get packets to flow, but it isn't a
>> >> > > good fix. When we turn the ADC clock frequency down to about 75 MHz,
>> >> > > the packets begin to flow. Some in our group think that the 10-GbE
>> >> > > buffer overflow is a transient behavior and, hence, that if we slowly
>> >> > > turn up the clock frequency after the ROACH has started up, packets
>> >> > > may continue to flow in steady-state operation. We haven't tested
>> >> > > this yet, though.
>> >> > >
>> >> > > Richard Black
>> >> > >
>> >> > > On Thu, Oct 23, 2014 at 8:39 PM, peter <peterniu...@163.com> wrote:
>> >> > > Hi Richard, & all,
>> >> > > As you said, the size of the isolated packet changes every time. ) :
>> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
>> >> > > 10:10:55.622053 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 4616
>> >> > > Did you download the PAPER gateware from the CASPER wiki
>> >> > > (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest) directly?
>> >> > > How does the PAPER bof file run on your system? Have you seen the
>> >> > > overflow before? I downloaded and installed the PAPER model as the
>> >> > > website says, but the overflow shows up when I run
>> >> > > paper_feng_netstat.rb.
>> >> > > Thanks for your information.
>> >> > > peter
>> >> > >
>> >> > > At 2014-10-24 09:59:12, "Richard Black" <aeldstes...@gmail.com>
>> >> > > wrote:
>> >> > > Peter,
>> >> > >
>> >> > > I don't mean to hijack your thread, but we've been having a very
>> >> > > similar (and time-consuming) issue with the PAPER F-engine FPGA
>> >> > > firmware here at BYU. Out of curiosity, does the single packet that
>> >> > > you're receiving in tcpdump change in size every time you reprogram
>> >> > > the ROACH? We've seen this happen, and we're pretty sure that this
>> >> > > isolated packet is the 10-GbE buffer flushing when the 10-GbE core
>> >> > > is initialized (i.e. the enable signal isn't synchronized with the
>> >> > > start of a new packet).
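If the isolated packet really does come from the enable being applied mid-frame, one common remedy is to let the software enable take effect only on a frame boundary. The Python model below sketches that idea under that assumption; it is a behavioural illustration, not the logic in the PAPER design, and the function name is hypothetical.

```python
from typing import List, Tuple


def frame_aligned_valid(words: List[Tuple[bool, bool]],
                        enable_req: List[bool]) -> List[bool]:
    """words[i] = (valid, eof) on cycle i; enable_req[i] = SW enable on cycle i.

    Returns the valid signal passed to the 10GbE core. The gate only changes
    state on the first word of a frame, so the core never sees a partial frame.
    """
    gate = False
    at_frame_start = True  # the next valid word begins a new frame
    out = []
    for (valid, eof), req in zip(words, enable_req):
        if valid and at_frame_start:
            gate = req  # sample the SW enable only at a frame boundary
        out.append(valid and gate)
        if valid:
            at_frame_start = eof  # after an EOF, the next word starts a frame
    return out
```

With 4-word frames and the enable raised on cycle 2 (mid-frame), the first word passed to the core is on cycle 4, the start of the next frame; a plain `valid and enable` gate would instead pass the last two words of the first frame as a truncated packet. Dropping the enable mid-frame likewise waits for the current frame to finish.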
>> >> > >
>> >> > > Regardless of whether we have the same issue, I'm very interested to
>> >> > > see this problem's resolution.
>> >> > >
>> >> > > Good luck,
>> >> > >
>> >> > > Richard Black
>> >> > >
>> >> > > On Thu, Oct 23, 2014 at 7:50 PM, peter <peterniu...@163.com> wrote:
>> >> > > Hi Joe, & all,
>> >> > > I found something this morning: one packet is sent out from the
>> >> > > ROACH when I run the PAPER model. Here is what I captured with
>> >> > > tcpdump on the HPC:
>> >> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> >> > > listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
>> >> > > 09:04:02.757813 IP 10.10.2.1.8511 > 10.10.2.9.8511: UDP, length 6456
>> >> > >
>> >> > > The length is not the expected 8200+8, and is far from the full TX
>> >> > > buffer size of 8K+512. The other packets are all stopped by the
>> >> > > overflow.
>> >> > > I have tried changing the tutorial 2 packet size to 8200 bytes and to
>> >> > > 8K+512 bytes, and both transfer fine. I also made sure the boundary
>> >> > > size is indeed 8K+512, because when I change the size to 8K+513
>> >> > > bytes, no data is sent. So the packet received this morning, with
>> >> > > length 6456, is well under the limit. But what caused the other
>> >> > > packets to overflow?
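The sizes in this thread can be checked in 64-bit words, the width of the 10GbE core's data port (consistent with Richard's "1025 64-bit words (i.e. 8200 bytes)"). A short Python sketch, taking Peter's measured 8K+512-byte boundary as given; the labels are only descriptive.

```python
WORD_BYTES = 8                    # 64-bit data port
TX_LIMIT_BYTES = 8 * 1024 + 512   # boundary Peter measured with tutorial 2


def words_needed(n_bytes: int) -> int:
    """Number of 64-bit words needed to carry n_bytes (ceiling division)."""
    return -(-n_bytes // WORD_BYTES)


sizes = {
    "PAPER payload": 8200,
    "first captured packet": 6456,
    "second captured packet": 4616,
    "8K+513 test": 8 * 1024 + 513,
}
for label, n in sizes.items():
    print(f"{label:24s} {n:5d} bytes = {words_needed(n):4d} words, "
          f"fits: {n <= TX_LIMIT_BYTES}")
```

The limit works out to 8704 bytes (1088 words). Both captured packets are whole numbers of words (807 and 577) and well under it, which is consistent with Peter's point that packet size alone does not explain the overflow.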
>> >> > > Any suggestions would be helpful!
>> >> > > peter
>> >> > >
>> >> > > At 2014-10-24 00:37:14, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> >> > > wrote:
>> >> > > Peter,
>> >> > >
>> >> > > By cadence of the broadcast, I mean how often the 8200-byte packets
>> >> > > are sent. Basically, I would like to determine how close your system
>> >> > > is to the maximum data rate of the 10GbE link.
>> >> > >
>> >> > > Also, it would be instructive to know the following:
>> >> > >
>> >> > > 1) What transmission protocol are you using? (The One_GBe module uses
>> >> > > UDP; are you using that or TCP?)
>> >> > >
>> >> > > 2) What NICs are you using on the receive side?
>> >> > >
>> >> > > At this time, I am working on the theory that the issue is related
>> >> > > to the network itself not being able to sustain the data volume you
>> >> > > are generating, and I would like to get a better idea of how much
>> >> > > data is generated and how often it is sent.
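Joe's question about cadence amounts to comparing the average wire rate with the 10 Gb/s line rate. A minimal Python sketch, assuming UDP over IPv4 over Ethernet with no VLAN tag and no IP options; the per-packet overhead counts the preamble/SFD, MAC header, FCS, inter-frame gap, and the IPv4 and UDP headers.

```python
LINE_RATE_BPS = 10e9  # 10GbE line rate
# Per-packet wire overhead (bytes): preamble+SFD 8, MAC header 14, FCS 4,
# inter-frame gap 12, IPv4 header 20 (no options), UDP header 8.
OVERHEAD_BYTES = 8 + 14 + 4 + 12 + 20 + 8


def link_utilization(payload_bytes: int, packets_per_second: float) -> float:
    """Fraction of the 10 Gb/s line rate used by a steady packet stream."""
    wire_bits = (payload_bytes + OVERHEAD_BYTES) * 8
    return wire_bits * packets_per_second / LINE_RATE_BPS


def max_packet_rate(payload_bytes: int) -> int:
    """Largest whole number of packets per second that fits on the link."""
    return int(LINE_RATE_BPS // ((payload_bytes + OVERHEAD_BYTES) * 8))
```

For 8200-byte payloads, `max_packet_rate(8200)` is 151221, so a single port cannot sustain a faster stream than that regardless of FIFO behaviour, while one packet per second (as in Joe's modified test design) uses a negligible fraction of the link.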
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > -Joe Kujawski
>> >> > >
>> >> > > On Thu, Oct 23, 2014 at 12:01 PM, peter <peterniu...@163.com> wrote:
>> >> > > Hi Joe,
>> >> > > 1. Yes, actually we have 3 ROACH2 boards, each with 8 NICs.
>> >> > > 2. Each ROACH has 4 of its 8 NICs connected directly to a PC; the
>> >> > > other 4 connect to a 10 Gb switch. I connected the SFP cable (the one
>> >> > > that should go to the switch) directly to the PC to see whether data
>> >> > > comes out, but no data comes out because of the overflow.
>> >> > > 3. Could you give an example of the broadcast cadence? I am not
>> >> > > familiar with this. The design does require more data, but each
>> >> > > packet is limited to 8200 bytes.
>> >> > > Thanks for your reply!
>> >> > > peter
>> >> > > --
>> >> > > Sent from NetEase Mail for Android
>> >> > >
>> >> > >
>> >> > >
>> >> > > On 2014-10-23 23:16, Kujawski, Joseph wrote:
>> >> > >
>> >> > > Peter,
>> >> > >
>> >> > > I am downloading it now. Can you answer these questions:
>> >> > >
>> >> > > 1) Do you have a standard PAPER architecture with two ROACH boards,
>> >> > > each containing 8 10GbE ports?
>> >> > >
>> >> > > 2) Please describe your network architecture, i.e. how each of the
>> >> > > ports is connected.
>> >> > >
>> >> > > 3) What is the cadence of each broadcast?
>> >> > >
>> >> > > My current suspicion is that you are generating more data than you
>> >> > > can push through your interface(s). It may be that the higher data
>> >> > > volume in your implementation requires more network infrastructure
>> >> > > than the original system did.
>> >> > >
>> >> > > -Joe Kujawski
>> >> > >
>> >> > > On Thu, Oct 23, 2014 at 11:01 AM, peter <peterniu...@163.com> wrote:
>> >> > > This is a little big; roach2_tl8511port is the one that can send
>> >> > > data normally. The environment should be OK now; last time,
>> >> > > crc32x64_con may have been missing.
>> >> > > Good night!
>> >> > >
>> >> > > At 2014-10-23 22:52:54, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> >> > > wrote:
>> >> > > Peter,
>> >> > >
>> >> > > 1) For reference, here is a list of the errors:
>> >> > >
>> >> > > --------------------------------- Version Log ----------------------------------
>> >> > > Version                                 Path
>> >> > > System Generator 14.6                   C:/Xilinx/14.6/ISE_DS/ISE/sysgen
>> >> > > Matlab 8.0.0.783 (R2012b)               C:/MATLAB/R2012b
>> >> > > ISE                                     C:/Xilinx/14.6/ISE_DS/ISE
>> >> > >
>> >> > > --------------------------------------------------------------------------------
>> >> > > Summary of Errors:
>> >> > > Error 0001: Could not find the configuration m-function "crc32x64_con...
>> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose1/crc/crc32x64'
>> >> > > Error 0002: Could not find the configuration m-function "crc32x64_con...
>> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose2/crc/crc32x64'
>> >> > > Error 0003: Could not find the configuration m-function "crc32x64_con...
>> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose3/crc/crc32x64'
>> >> > > Error 0004: Could not find the configuration m-function "crc32x64_con...
>> >> > >      Block: 'roach2_fengine_tl8511port/transpose/Transpose4/crc/crc32x64'
>> >> > >
>> >> > > --------------------------------------------------------------------------------
>> >> > >
>> >> > > 2) Your email did not have an attachment. I have more comments, but
>> >> > > wanted to let you know about the attachment before you went to bed.
>> >> > >
>> >> > > -Joe Kujawski
>> >> > >
>> >> > > On Thu, Oct 23, 2014 at 10:33 AM, peter <peterniu...@163.com> wrote:
>> >> > >
>> >> > > Hi Joe,
>> >> > > Thanks for your kind help!
>> >> > > What error shows up when you compile my model? Is some file missing?
>> >> > > I will package all my files and send them to you as an attachment.
>> >> > > And what about the PAPER one? Did it report an overflow message? You
>> >> > > need to install Ruby and use it to control the design.
>> >> > > Leaving the PAPER model aside, let's talk about the 10Gb block on
>> >> > > ROACH v2. Your model is good for looking at data_valid, EOF, etc.,
>> >> > > but I don't know how to add your model to PAPER, since I realize
>> >> > > PAPER generates its data valid and EOF from a counter. So I don't
>> >> > > know where to put the model. For example, if I put the data_valid
>> >> > > or EOF control logic you designed onto the 10GbE port in the PAPER
>> >> > > model, then I think that is equivalent to adding a 10GbE block in
>> >> > > place of the One_GBe block in yours. *_*!!
>> >> > > I changed the number 50 to 1025 in tutorial 2 to make the packet
>> >> > > size 8200 bytes, and it seems to transfer well without error with a
>> >> > > packet period of 1.3*1025 clocks, i.e. one packet is sent every
>> >> > > 1.3*1025 clocks. I found this boundary period of 1.3*1025 by testing
>> >> > > many times: when I make the period shorter than 1.3*1025 clocks, the
>> >> > > first few packets are sent out, but then the overflow comes. I think
>> >> > > it is the transmission rate that determines the overflow.
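Peter's observation can be framed as a rate budget: the design offers 1025 words of 8 bytes every k × 1025 fabric clocks, and the link must carry each packet plus its wire overhead within that time. The Python sketch below computes the smallest sustainable k for a given fabric clock. It assumes one 64-bit word enters the core per valid clock, a 10 Gb/s line rate, and 66 bytes of UDP/IPv4/Ethernet overhead per packet (preamble/SFD, MAC header, FCS, inter-frame gap, IPv4 and UDP headers). The fabric clock of Peter's tutorial 2 build is not stated in the thread, so any value passed in is an assumption.

```python
LINE_RATE_BPS = 10e9
WORD_BYTES = 8
OVERHEAD_BYTES = 66  # preamble/SFD, MAC header, FCS, IFG, IPv4, UDP


def min_period_factor(fabric_clock_hz: float, payload_words: int = 1025) -> float:
    """Smallest k such that sending payload_words every k * payload_words
    fabric clocks does not exceed the 10 Gb/s line rate on average."""
    wire_bits = (payload_words * WORD_BYTES + OVERHEAD_BYTES) * 8
    packet_time_s = wire_bits / LINE_RATE_BPS        # time on the wire
    clocks_needed = packet_time_s * fabric_clock_hz  # same time in fabric clocks
    return clocks_needed / payload_words
```

For an illustrative 200 MHz fabric clock this gives k of about 1.29. Below the returned factor the average input rate exceeds the line rate, so a finite FIFO can only absorb the first few packets before it overflows, which is the pattern Peter describes.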
>> >> > > Thanks for your suggestions and advice!
>> >> > > peter
>> >> > >
>> >> > > At 2014-10-23 00:14:29, "Kujawski, Joseph" <jkujaw...@siena.edu>
>> >> > > wrote:
>> >> > >
>> >> > > Peter,
>> >> > >
>> >> > > I find that I cannot compile and simulate your design; however,
>> >> > > looking at the code structure, I can't tell whether tx_val and tx_EOF
>> >> > > are high at the same time:
>> >> > >
>> >> > > [image not preserved]
>> >> > >
>> >> > > Also, I modified the design to send out a packet of size 8200 once
>> >> > > per second (model attached) and added a register that latches the
>> >> > > GBE tx_aful and tx_overrun lines so they can be read through the
>> >> > > KATCP interface. Modify the model to remove the oscilloscope and
>> >> > > Xilinx out gateways before compiling it for your platform. Note that
>> >> > > this model does not check for overflow, though the latch will let
>> >> > > you know if you have had one.
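A register that latches tx_aful and tx_overrun, as Joe describes, behaves like a sticky status word: each flag is OR-ed in on any clock where it is high and stays set until software clears it. A small Python model of that behaviour; the bit assignment and method names are hypothetical, not taken from Joe's model.

```python
class StickyStatus:
    """Model of a status register that latches transient 10GbE flags.

    Hypothetical bit layout: bit 0 = tx_aful (almost full), bit 1 = tx_overrun.
    """
    AFULL = 0x1
    OVERRUN = 0x2

    def __init__(self) -> None:
        self.value = 0

    def clock(self, tx_aful: bool, tx_overrun: bool) -> None:
        # OR in whatever is high this cycle; bits never clear on their own.
        if tx_aful:
            self.value |= self.AFULL
        if tx_overrun:
            self.value |= self.OVERRUN

    def clear(self) -> None:
        # Corresponds to software writing a clear bit.
        self.value = 0
```

Reading the latched value after a run shows whether either flag was ever asserted, even for a single clock, which a plain read of the live signals would likely miss.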
>> >> > >
>> >> > > Let me know how this works for you.
>> >> > >
>> >> > > -Joe Kujawski
>> >> > > --
>> >> > > **************************************
>> >> > > * Joe Kujawski
>> >> > > * Siena College
>> >> > > * Dept. of Physics and Astronomy, RB 113
>> >> > > * 515 Loudon Road
>> >> > > * Loudonville, NY 12211-1462
>> >> > > *
>> >> > > * Email: jkujaw...@siena.edu
>> >> > > * Phone: 518-867-7509  <-- NEW NUMBER
>> >> > > * Fax: 518-783-2986
>> >> > > **************************************
>> >> > >
>> >> > > Cloud attachment sent from NetEase 163 Mail:
>> >> > >
>> >> > > paperfengine.zip (126.71 MB, link expires 2014-11-07 22:58)
>> >> >
>> >> >
>> >>
>> >
>
>
