Hi Andres
Another way to save logic is to hard-code the shifting schedule for the
FFT. (Go to the Implementation tab and choose 'Hardcode shift schedule'.)
This removes quite a lot of logic. You must then decide what shift
schedule to use to prevent overflows; a '1' for every FFT stage in the
'Shift Schedule' mask parameter is a good starting point.
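As a rough sketch of that starting point (the helper name is hypothetical, and it assumes a radix-2 FFT with log2(N) stages; the mask parameter itself is set in the Simulink block dialog, not from Python), a 4096-point FFT would need twelve entries:

```python
def all_ones_shift_schedule(fft_points):
    """Return a shift schedule with a 1 (shift) for every FFT stage.

    Hypothetical helper for illustration only. It assumes a radix-2 FFT,
    so the number of stages is log2(fft_points).
    """
    if fft_points < 2 or fft_points & (fft_points - 1):
        raise ValueError("fft_points must be a power of two >= 2")
    n_stages = fft_points.bit_length() - 1  # exact log2 for a power of two
    return [1] * n_stages


schedule = all_ones_shift_schedule(4096)
# Format as a MATLAB-style row vector for pasting into the mask parameter.
print("[" + " ".join(str(s) for s in schedule) + "]")
```

Shifting at every stage is the most conservative choice against overflow, at the cost of some precision for weak signals, so it is only a first guess to refine against real data.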
Regards
Andrew
Hey Andres,
Just a quick follow up, since I've been playing some FFT timing issue
games myself --
I don't know which mlib_devel fork you are using, but in some you'll
find that rounding is implemented with the casper convert block, with
pipelined fabric adder cores. My experience is that whilst these
pipelined cores might help meet timing, the compiler does not optimize
them effectively. (E.g. there's one convert which essentially involves
an adder where one input is zero, but it's still compiled to a logic
chain if the adder is implemented in a core). I found that with a
couple of 4096pt FFTs, I saved ~6000 slices (on ROACH 2) by switching
back to the Xilinx cast blocks.
I've made a few mods to the FFT and convert blocks to allow:
1) Your own choice of cast block implementations, using either the
casper block in behavioural, Fabric core or DSP core modes, or the
Xilinx cast
2) Different choices of implementations in the twiddle blocks and
butterfly blocks (whose casts are different sizes).
3) Different choices of adder/convert latencies in the twiddle blocks
and butterflies. This allows you to tweak the latencies depending
which adders you want to implement in DSP.
I've also added a "UCF" yellow block, which allows you to add custom
UCF constraints (e.g. those generated by PlanAhead) into a design, by
supplying an additional UCF file path within Simulink.
On the off chance any of this is useful to you or anyone else, it's in
https://github.com/jack-h/mlib_devel
Cheers,
Jack
On 27 November 2013 17:26, Andres Alvear wrote:
Hello guys,
I'm sorry that my explanation of the project was confusing. Right now I
have 2 spectrometers implemented on the ROACH 1 (Virtex-5
xc5vsx95tff1136-1 FPGA) with 2 ADC083000 boards. To answer your question: I
have a lot of failed reports, but they all show the same pattern, namely
that the design uses logic extensively. Below are the results of the
compilation with the ADC @ 500 MHz and the FPGA clock @ 125 MHz:
Device Utilization Summary:
Number of BUFGs                           8 out of     32   25%
Number of DCM_ADVs                        4 out of     12   33%
Number of DSP48Es                       384 out of    640   60%
Number of ILOGICs                       111 out of    800   13%
Number of External IOBs                 190 out of    640   29%
  Number of LOCed IOBs                  190 out of    190  100%
Number of OLOGICs                        19 out of    800    2%
Number of RAMB18X2s                     103 out of    244   42%
Number of RAMB36_EXPs                    80 out of    244   32%
Number of Slices                      14584 out of  14720   99%
Number of Slice Registers             53504 out of  58880   90%
  Number used as Flip Flops           53504
  Number used as Latches                  0
  Number used as LatchThrus               0
Number of Slice LUTS                  44326 out of  58880   75%
Number of Slice LUT-Flip Flop pairs   56278 out of  58880   95%
Timing summary:
---------------
Timing errors: 0 Score: 0 (Setup/Max: 0, Hold: 0)
Constraints cover 801808 paths, 129 nets, and 158113 connections
Design statistics:
Minimum period: 8.332ns (Maximum frequency: 120.019MHz)
Maximum net delay: 2.652ns
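For reference, the period and frequency figures in this report are related by f[MHz] = 1000 / T[ns]. A minimal Python sketch of that conversion follows (the function names are made up for illustration), applied to the minimum period above and to the 125 MHz FPGA clock of this compile:

```python
def ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def mhz_to_ns(freq_mhz):
    """Convert a clock frequency in MHz to a period in nanoseconds."""
    return 1000.0 / freq_mhz


# Minimum period reported by the timing summary above.
print("max frequency: %.3f MHz" % ns_to_mhz(8.332))
# Period that a 125 MHz FPGA clock corresponds to.
print("125 MHz period: %.3f ns" % mhz_to_ns(125.0))
```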
I have found only one way to get a successful compilation with two
spectrometers, each using a 4096-point pipelined FFT: leave the option
“DSP48E adders in butterfly” unchecked and use low latencies, such as “add
latency=1”, “mult latency=2”, “BRAMs=2”, “convert latency=1”, “input
latency=0” and “latency between biplexes and fft_direct=0”. The FIR filters
keep their complete configuration.
One of the specific objectives of my project is to increase the bandwidth
of each spectrometer from 500 MHz to 1500 MHz without losing spectral
resolution, that is, to increase the number of channels in proportion to
the bandwidth, so I also need to go from 2048 channels to 4096 channels.
For this reason I went ahead with PlanAhead, starting from the results
shown above. I floorplanned my design as Ryan showed in his report, using
PlanAhead to generate a UCF file, which I then placed in data/system.ucf,
and re-ran the tools for EDK/ISE/Bitgen. This stage worked for me.
My constraints seemed OK because I didn't get timing errors.
I think I have a conceptual misunderstanding, because I don't know why the
compilation generates timing groups with this timing constraint. This is
where I need your help: I don't know whether I should delete them, or
modify them by adding more timing groups, tightening their period, or
something similar. Instead of configuring a clock pin to connect directly
to an internal clock tree, that pin can be used to drive a special
hard-wired block called a clock manager, which generates a number of
daughter clocks. So it seems that we need to propagate the ADCs' external
clock signals through the whole FPGA, using the DCMs to generate daughter
clocks that drive the internal clock trees or output pins.
How can I do that? I'm attaching my constraint file so you can check it.
Dan, I do indeed want to get Simulink designs running above 300 MHz on a
ROACH 1, so I think it is almost mandatory to manually constrain the
placement of primitives on the FPGA fabric. I really thank Ryan for sending
his memo; I have been using it for a couple of weeks to understand this
kind of work. My goal is to optimize for speed and close timing on an FPGA
design with the ADCs working at 3 GSPS and the FPGA clock @ 375 MHz, so I
hope to work up to 375 MHz step by step using PlanAhead.
Best regards.
Cheers!
Andres
2013/11/26 Jack Hickish
Hi Andres,
That system.twr you sent me doesn't appear to have any timing errors
(though it looks like it's compiled for an ADC clock of 500 MHz and
FPGA clock of 125 MHz). Do you have the report of a failed compile?
That will give some indication of what parts of the design are
causing problems.
Your ucf file looks ok -- I don't know how well the pblock constraints
you have will actually work, but it seems like you've got the right
idea. The ADC period constraint you have is for a 400 MHz FPGA clock
-- i.e. a 1600 MHz ADC clock. Is this really what you wanted? Note
that the constraint is for the clock received by the adc, which is 1/4
the rate of the ADC sampling clock.
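To make the arithmetic concrete, here is a small Python sketch (names are illustrative, and the only relationship assumed is the divide-by-4 described above) that turns an ADC sampling clock into the FPGA clock frequency and the period a constraint would need to state:

```python
ADC_TO_FPGA_DIVIDER = 4  # FPGA clock is 1/4 of the ADC sampling clock


def fpga_clock_plan(adc_clock_mhz):
    """Return (fpga_clock_mhz, period_ns) for a given ADC sampling clock."""
    fpga_mhz = adc_clock_mhz / float(ADC_TO_FPGA_DIVIDER)
    period_ns = 1000.0 / fpga_mhz
    return fpga_mhz, period_ns


# 500 MHz is the clock the report was compiled for; 1600 MHz is what the
# current period constraint implies.
for adc_mhz in (500.0, 1600.0):
    fpga_mhz, period_ns = fpga_clock_plan(adc_mhz)
    print("ADC %6.1f MHz -> FPGA %5.1f MHz, period %.3f ns"
          % (adc_mhz, fpga_mhz, period_ns))
```

Under the same assumption, a 375 MHz FPGA clock would correspond to a 1500 MHz ADC clock and a period of about 2.667 ns.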
Cheers
Jack
On 26 November 2013 19:10, Andres Alvear wrote:
Hello Jack, I'm Andres Alvear, a student of Electrical Engineering from
Chile.
Thanks, Jack, for your quick answer. I have attached the file!
What I meant is that I unchecked all the options except EDK/ISE/Bitgen and
then re-ran, which produced a boffile (as in the picture). I'm sorry, I
have found the problem: it was in the synthesizer that synchronizes the
ADCs with their clock rate. I resolved it by configuring the synthesizer
again.
Still, I'd be glad if you could check my system.twr. Do you know how I can
constrain the system to run at 400 MHz? What do you think about my global
timing constraints? Specifically, what are your thoughts on the timing
groups that were generated by the casper_xps toolflow compilation? Are
they all right?
I'm not really sure whether my constraints are working.
Cheers!
Andres
2013/11/26 Jack Hickish
Hey Andres,
Just because I'm a little confused -- you say you re-ran EDK and got a
boffile, but timing constraints weren't met -- have you disabled the
check for timing closure the toolflow does? Usually you wouldn't get a
successful compile from a design that didn't meet timing.
If you attach your timing report (implementation/system.twr) I'm happy
to have a look. 54MHz is very slow, which makes me think there's
something not quite right going on....
Cheers,
Jack
On 26 November 2013 17:04, Andres Alvear wrote:
Thanks Ryan,
I have just generated my first .bof after successfully re-running the
tools for EDK/ISE/Bitgen, but I have not seen any speed improvement so
far, as you can see from my results in the attached picture.
After the compilation my design ran at a clock rate of 54 MHz, giving
216 MHz of bandwidth on each spectrometer. The constraints generated in
the floorplanning process were put into the "system.ucf" file located in
the following folders:
/opt/workspace/spectrometer_dctrl_op/XPS_ROACH_base/data
/opt/workspace/spectrometer_dctrl_op/XPS_ROACH_base/implementation
In the "data" folder I removed system.ucf and system.ucf.bac and put my
version of "system.ucf" (with the floorplan) in their place. Then, in the
"implementation" folder, I replaced the "system.ucf" file with my version.
Finally, I opened the Simulink design and re-ran it with just
EDK/ISE/Bitgen. The compilation succeeded and produced a new .bof file,
which is working on the ROACH 1. However, my timing constraints were not
met.
I'm attaching my constraints in case you have some idea of the possible
problems. Given the attached constraint file, what values would you put in
the constraints so that the system runs at 400 MHz? What do you think
about my global timing constraints? Specifically, what are your thoughts
on the timing groups that were generated by the casper_xps toolflow
compilation? Are they all right?
Cheers
Andres Alvear
2013/11/22 Ryan Monroe
Hey Andres, my strategy has generally been to use PlanAhead to generate a
UCF file, which I then place in data/system.ucf, then re-run the tools for
EDK/ISE/Bitgen.
Works consistently for me.
On Nov 22, 2013 11:31 AM, "Andres Alvear" wrote:
Hi everyone,
I'm working on speed optimization with PlanAhead. I have a Simulink design
of a 2048-channel spectrometer with 2 ADC083000 ADCs at 1 GSPS in
interleaved mode. I want to improve timing so that I can increase the
bandwidth from the current 500 MHz to 1 GHz and, of course, increase the
number of channels to at least 4096, but this is impossible with the
conventional tool flow.
First, I told the system I wanted it to run at 250 MHz, but the clock rate
I actually achieve is about 120 MHz, which is too low! However, the system
is stable up to 125 MHz, so I can set the ADC clock rate to 500 MHz to get
1 GSPS, giving 500 MHz of bandwidth for each ADC.
So I have been working in PlanAhead on floorplanning the hardware
implemented in the Virtex-5 SX95T FPGA, and after floorplanning I edited
my constraint file as Ryan Monroe describes in his last memo.
I got a 23% speed improvement, from 120 MHz to 148 MHz, but I need to meet
timing at 200 MHz at least. However, I have problems generating functional
BORPH executables, and I'm hoping someone can help me figure out why,
since I'm targeting high speeds. This is the error from BORPH when I try
to run the file from an ssh session:
root@roach:/boffiles# ./system_2.bof
-bash: ./system_2.bof: Input/output error
Then, in an IPython (Python 2.7) terminal, I checked that I was connected
to the ROACH:
In [9]: fpga.is_connected()
Out[9]: True
Let's set the bitstream running using the progdev() command:
In [10]: fpga.progdev('system_2.bof')   <----------- generated from mkbof
Out[10]: 'ok'
But the LEDs on the ROACH do not blink. I added two of them to monitor my
design, led0_sync and led1_new_acc, and neither blinks at all.
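One further check that might help narrow this down, sketched here under assumptions: after progdev(), ask the board for its estimated fabric clock and compare it with the rate the design was compiled for. The helper below is hypothetical; it only assumes that the `fpga` object behaves like corr.katcp_wrapper.FpgaClient and provides is_connected(), progdev() and est_brd_clk().

```python
import time


def program_and_check(fpga, boffile, expected_mhz,
                      tolerance_mhz=5.0, settle_s=1.0):
    """Program a bitstream, then compare the estimated FPGA clock to a target.

    Hypothetical helper: `fpga` is assumed to behave like
    corr.katcp_wrapper.FpgaClient (is_connected, progdev, est_brd_clk).
    Returns (measured_mhz, within_tolerance).
    """
    if not fpga.is_connected():
        raise RuntimeError("not connected to the ROACH")
    fpga.progdev(boffile)
    time.sleep(settle_s)  # give the design a moment to start running
    measured_mhz = fpga.est_brd_clk()
    within_tolerance = abs(measured_mhz - expected_mhz) <= tolerance_mhz
    return measured_mhz, within_tolerance


# Example on real hardware (the hostname 'roach' is a placeholder):
#   import corr
#   fpga = corr.katcp_wrapper.FpgaClient('roach')
#   print(program_and_check(fpga, 'system_2.bof', expected_mhz=125.0))
```

A measured clock far from the expected value would point towards the clock source rather than the placement constraints.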
Do you think I am on the right track? Does anyone know something about
these problems?
Cheers!
Andres Alvear