Hi Sharat,

Did you do any of the other tests Dave suggested? Compiling/running at a
lower clock speed? Checking for ROACH2 revision compatibility? Checking
whether the failures after swapping ZDOKs / ADC cards were always the same?
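
To make that last check concrete, here is a minimal sketch of the comparison, assuming you record the set of (core, bit) pairs that never go glitch free for each ZDOK slot before and after exchanging the two ADC cards. The function name and the example values are hypothetical, and the "adc card" branch is my own extension of Dave's reasoning rather than something he stated.

```python
def classify_swap(before, after):
    """Guess which side a fault lives on from a card-swap experiment.

    Each argument maps ZDOK slot (0 or 1) to the set of (core, bit) pairs
    that never became glitch free. `after` is measured with the two ADC
    cards exchanged between the slots.
    """
    stays_with_slot = before[0] == after[0] and before[1] == after[1]
    follows_card = before[0] == after[1] and before[1] == after[0]
    if stays_with_slot and not follows_card:
        return "roach2 side"   # same bits fail in the same slot
    if follows_card and not stays_with_slot:
        return "adc card"      # failures moved with the card (assumption)
    return "inconclusive"      # identical or unrelated failure patterns


# Illustrative values only, not measurements.
before = {0: {(3, 1)}, 1: set()}
after_same_slot = {0: {(3, 1)}, 1: set()}
after_moved = {0: set(), 1: {(3, 1)}}
print(classify_swap(before, after_same_slot))
print(classify_swap(before, after_moved))
```

Since the glitch counts vary from one programming to the next, comparing which bits fail is likely to be more robust than comparing the totals.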

I would suggest the debug messages like --

2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG - ##### GLITCHES FOR CORE 0 BY IODELAY #####
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  0:   0   0   0   0  41   0   0   0TOTAL 41
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  1:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  2:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  3:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  4:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  5:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  6:   0 108   4   0   0   0   0 100TOTAL 212
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  7:  54 593 584 304   0 174  12 368TOTAL 2089
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  8: 599   6   0 577 227 383 580 371TOTAL 2743
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  9:  27   0   0   0 371   2 222 291TOTAL 913
2015-09-07 13:00:15,566 -          adc5g.tools - DEBUG - 10:   0   0   0   0   2   0   0   0TOTAL 2
2015-09-07 13:00:15,566 -          adc5g.tools - DEBUG - 11:   0   0   0   0   0   0   0   0TOTAL 0

-- are your best source of information, since they isolate errors to
individual bits of individual cores.
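
As a rough illustration of how these messages can be put to use, the sketch below parses the excerpt quoted above and lists, for each column, the IODELAY settings at which no glitches were seen. It assumes each row is one IODELAY setting and the eight columns are the eight data bits of the core; it makes no assumption about which column is which bit number. The helper names `parse_glitch_log` and `glitch_free_taps` are made up for this example and are not part of adc5g.

```python
import re

# Log excerpt from this mail with the mail client's line wrapping undone.
LOG = """\
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG - ##### GLITCHES FOR CORE 0 BY IODELAY #####
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  0:   0   0   0   0  41   0   0   0TOTAL 41
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  1:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  2:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,564 -          adc5g.tools - DEBUG -  3:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  4:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  5:   0   0   0   0   0   0   0   0TOTAL 0
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  6:   0 108   4   0   0   0   0 100TOTAL 212
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  7:  54 593 584 304   0 174  12 368TOTAL 2089
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  8: 599   6   0 577 227 383 580 371TOTAL 2743
2015-09-07 13:00:15,565 -          adc5g.tools - DEBUG -  9:  27   0   0   0 371   2 222 291TOTAL 913
2015-09-07 13:00:15,566 -          adc5g.tools - DEBUG - 10:   0   0   0   0   2   0   0   0TOTAL 2
2015-09-07 13:00:15,566 -          adc5g.tools - DEBUG - 11:   0   0   0   0   0   0   0   0TOTAL 0
"""

# Matches either a per-core header, or one row: setting, 8 counts, row total.
TOKEN = re.compile(
    r"GLITCHES FOR CORE (\d+) BY IODELAY"
    r"|DEBUG - (\d+):((?: \d+){8}) ?TOTAL (\d+)"
)


def parse_glitch_log(text):
    """Return {core: {iodelay setting: [count for each of the 8 columns]}}."""
    flat = " ".join(text.split())  # undo line wrapping and column padding
    table, core = {}, None
    for m in TOKEN.finditer(flat):
        if m.group(1) is not None:      # header line: start a new core
            core = int(m.group(1))
            table[core] = {}
        elif core is not None:          # data row for the current core
            counts = [int(x) for x in m.group(3).split()]
            if sum(counts) != int(m.group(4)):
                raise ValueError("row total mismatch at setting " + m.group(2))
            table[core][int(m.group(2))] = counts
    return table


def glitch_free_taps(rows):
    """Return {column: sorted settings where that column saw zero glitches}."""
    return {
        col: sorted(tap for tap, counts in rows.items() if counts[col] == 0)
        for col in range(8)
    }


table = parse_glitch_log(LOG)
for core, rows in table.items():
    for col, taps in glitch_free_taps(rows).items():
        status = "clean at {}".format(taps) if taps else "never glitch free"
        print("core {} column {}: {}".format(core, col, status))
```

Because the parser collapses whitespace first, it should also cope with log text that has been wrapped by a mail client. A column reported as "never glitch free" across the full delay range is the kind of bit the calibration cannot fix by adjusting delays alone.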

For what it's worth, I'd also recommend compiling against
github.com/casper-astro/mlib_devel.git, which is the main CASPER repository
(and it should support the adc5g). Other repositories have custom features
which may or may not be suitable for general consumption.

Cheers,
Jack

On Tue, 8 Sep 2015 at 22:17 sharat varma <va...@hku.hk> wrote:

> Hi,
>
> Thank you all very much for your patience and inputs.
>
> I ran Jack's scripts to find where the problem lies, but could not figure
> out the reason for the glitches, as they vary very widely when I reprogram
> the same bof file.
> The plots that I captured using the scripts for calibrating the IO delays
> are 1.png, 2.png and 3.png. The plots from calibrating the MMCM are
> mmcm1.png, mmcm2.png and mmcm3.png.
> Some points that I observed during troubleshooting were:
>
> The glitch count on ZDOK0 using the adc5g_test_rev2.bof file is around 7500
> and on ZDOK1 is around 250, even though they keep varying each time I
> program the ROACH. I swapped the ADCs and this seems to be consistent. I
> also used the same ADCs and plugged them into a different ROACH2 system.
> This was consistent.
>
> I checked the clock and it seems to be fine. Since I was able to capture
> data from the ADC on 4 different systems through the 10GbE interface, I
> presume the clock should be fine.
>
> I have used the already compiled files from the SMA test_adc repository. I
> have also generated the bof files using the model files. I have tried
> different frequencies for my design, yet the results are the same.
> I still could not figure out where the problem is.
>
> Regards,
> Sharat
>
> On 7 September 2015 at 23:47, David MacMahon <dav...@astro.berkeley.edu>
> wrote:
>
>> Hi, Sharat,
>>
>> On Sep 6, 2015, at 11:02 PM, Jack Hickish wrote:
>>
>> > As the code suggests, the error comes because bit 1 of core 3 appears
>> > to never be glitch free, no matter what the delay setting. It's not
>> > obvious to me what could cause this.
>>
>> Just to expand on what Jack said, here are a few possible ideas (some of
>> which are sheer speculation):
>>
>> 1. Verify the pinout of the ADC data pins in system_pad.txt.  Revision 1
>> of the ROACH2 connected one of the ZDOK differential pairs to FPGA pins
>> that were in a different bank than the others.  This sub-optimal situation
>> was "discovered" after a small number of the "rev 1" boards had been made.
>> The design was quickly re-done for "rev 2".  Virtually all ROACH2s now are
>> "rev 2", but in the interest of advancing ROACH2 development (and their own
>> development) SMA took delivery of the "rev 1" boards.  It could be possible
>> that you are using an mlib_devel that is targeting a "rev 1" board, but
>> using the resultant BOF file on a "rev 2" board.
>>
>> 2. Try to compile your design for a slower clock setting.  For example,
>> maybe 2 or 3 Gsps instead of 5 Gsps.  While this may not meet your
>> application's needs, it could be a useful diagnostic.  Different sample
>> clock rates would result in different MMCM parameters.  The MMCM is quite
>> complex and sometimes the clock rates used result in internal MMCM
>> parameters that are suboptimal.  Although not directly relevant to the
>> ADC5G, the following ADC16 write-up expands on this idea:
>>
>>
>> https://casper.berkeley.edu/wiki/ADC16x250-8#ADC16_Sample_Rate_vs_Virtex-6_MMCM_Limitations
>>
>> Caveat: I am not very familiar with the ADC5G's MMCM configuration, so
>> this may not be an issue at all.
>>
>> 3. Swap the ADCs between the two ZDOK connectors and see whether the same
>> bit(s) from the same core(s) fail(s) in the same way(s).  If so, then the
>> problem is probably on the ROACH2 side of things.
>>
>> 4. Find another ROACH2 with ADC5G cards on it and try your BOF file
>> there.  If it fails the same way then it is most likely a gateware problem
>> and not a hardware problem.
>>
>> HTH,
>> Dave
>>
>>
>
