On Sat, 2013-11-30 at 16:12 +0100, Adrien Prost-Boucle wrote:
> On Sat, 2013-11-30 at 12:48 +0000, Brian Drummond wrote:
> > The other thing you could try for the fsm is a 64-bit build of ghdl, on
> > a machine with at least 8 GB of physical RAM. 

> Argh
> 
> However, I still feel that much memory is way beyond what should be
> needed just to compile this fsm.vhd...

Yes.

One part of the problem is your choice of data types.

To get the correct simulation behaviour for std_logic_vector, ghdl must
treat each bit as a separate entity. Each bit has its own gcc data
structure of many tens of bytes (a major gcc effort went into shaving a
few bytes off this structure by reusing 4 bytes for different purposes,
which was one of the things that broke ghdl for several years).

Then the optimisation passes may keep several modified copies of this
structure...

Unless you need signal resolution from multiple drivers, you don't need
std_logic_vector.
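
For illustration, compare these (hypothetical) declarations:

  -- 16 separate sub-elements, each with its own per-bit signal machinery
  signal v_slv : std_logic_vector(15 downto 0);

  -- a single scalar object
  signal v_nat : natural range 0 to 2**16 - 1;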

I experimentally edited "tb.vhd" to use natural instead of
std_logic_vector. I had to:

change two type declarations:
type invec_type is array (0 to in_vectors_nb-1) of 
    natural range 0 to 2**(in_vector_bytes * 8) - 1;

add type conversions to two assignments:
stdin_vector <= std_logic_vector(to_unsigned(in_vectors(in_vector_idx),
     stdin_vector'length)) when in_vector_idx < in_vectors_nb 
     else (others => '0');

find/replace the literal format from X"55" to 16#55#

and unlike the original, it ran successfully with the redundant clauses
(now "others => 0") left in place. Memory usage still reached 3.5 GB,
but the slv version wouldn't run at all until I reduced the aggregate
size from 370k to 150k.
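
Pulling those edits together, here is a minimal self-contained sketch of
the pattern (the constants and signal names are my own stand-ins, not
the real tb.vhd):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- needed for to_unsigned

entity tb_sketch is
end entity;

architecture sim of tb_sketch is
  constant in_vectors_nb   : natural := 4;   -- stand-in value
  constant in_vector_bytes : natural := 1;   -- stand-in value

  -- each element is a plain natural, not a std_logic_vector
  type invec_type is array (0 to in_vectors_nb-1) of
      natural range 0 to 2**(in_vector_bytes * 8) - 1;

  -- literals written as 16#55# rather than X"55"
  constant in_vectors : invec_type := (16#55#, 16#AA#, 16#0F#, others => 0);

  signal in_vector_idx : natural := 0;
  signal stdin_vector  : std_logic_vector(in_vector_bytes*8 - 1 downto 0);
begin
  -- convert back to std_logic_vector only at the slv boundary
  stdin_vector <= std_logic_vector(to_unsigned(in_vectors(in_vector_idx),
       stdin_vector'length)) when in_vector_idx < in_vectors_nb
       else (others => '0');
end architecture;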

> I'd really love it if the idea of reducing the gcc back-end
> optimisation level could lead to good results, if not outright solve
> the problem.

I had no luck with -O0. But one loose end I meant to chase earlier:
time ghdl -a -Os tb.vhd
(optimise for code size) stayed at a flat 900 MB, but segfaulted in the
gcc tree optimisation passes after 5 minutes. There are flags to turn
off individual optimiser passes, but I haven't tried these.
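
If anyone experiments further, the per-pass flags are standard gcc ones,
so something like this might be worth a try (untested; I'm assuming
ghdl's gcc backend forwards them unchanged):

time ghdl -a -O2 -fno-tree-pre -fno-gcse tb.vhd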

> 
> > On the subject of high level synthesis: have you seen these projects?
> > 
> > http://www.nkavvadias.com/hercules/
> > 
> > What's interesting about this one, to me, is that it involves GIMPLE as
> > an intermediate language, with the C front end based on gcc.
> > 
> > Which opens up the hypothetical possibility of adding
> > --enable-languages=ada to the configure stage, and offering high level
> > synth from Ada (perhaps Fortran would appeal in some circles)
> 
> Actually, my tool inherits the code parser of another HLS tool, UGH.
> This parser IS gcc's parser, modified to an unknown extent, from an
> old gcc version.
> What the HLS part of my tool does is take the parsed GIMPLE and
> convert it to another graph more appropriate for HLS.
> 
> > If you've never used Ada, you may be wondering: why? I could suggest
> > many reasons, but here's one useful for HLS: fixed-point types fully
> > supported by the language, and you get to choose the width...
> > 
> > Or the York Hardware Ada Compiler: for example
> > ftp://ftp.cs.york.ac.uk/papers/rtspapers/R%3AWard%3A2001.ps
> > or in more detail
> > http://www.cs.york.ac.uk/ftpdir/reports/2005/YCST/09/YCST-2005-09.pdf
> > A practical detail that undermines this paper a little is that the
> > language subset he uses for his "sequential Ada" example (p.176 of the
> > latter paper) is ... synthesisable VHDL. 
> > 
> > Seriously. 
> > 
> > Substitute " to " for " .. ", prepend "variable " to each variable
> > declaration, wrap the example in a process, and XST swallows it
> > whole.
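> > 
> > To make the transformation concrete, the result looks something like
> > this (the loop body is my own invention, assuming "a" is an array of
> > unsigned and "sum" an unsigned signal; it is not taken from the paper):
> > 
> >   process(a)
> >     variable acc : unsigned(15 downto 0);  -- "variable " prepended
> >   begin
> >     acc := (others => '0');
> >     for i in 0 to 7 loop                   -- Ada's "0 .. 7" becomes "0 to 7"
> >       acc := acc + a(i);
> >     end loop;
> >     sum <= acc;
> >   end process;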
> > 
> > And XST spits out a lump of hardware, using about 3x as many CLBs as
> > his resource estimates (bigger if you factor in that I targeted a
> > newer FPGA) to implement the task in a single (very slow!) cycle.
> > 
> > Sound familiar?
> 
> Yes it does.
> There are many HLS tools in the wild... some that do little more than
> transcription to VHDL, others that do much more elaborate things.
> 
> However, generating appropriately pipelined circuits is the Holy
> Grail of HLS. I think some tools have achieved that, like GAUT, and
> maybe SPARK and LegUp.
> My work with AUGH is not (yet) at that level; however, the resource
> usage estimated by AUGH is guaranteed after place and route
> (calibration for the back-end tools).
> 
> > For me, the important step in the York Hardware Ada Compiler is ... 
> > it reveals techniques for extracting sequentiality from an inherently
> > parallel problem! 
> > 
> > In other words, automatic resource sharing, to reduce the hardware size.
> > (Ironically, the exact opposite of the GPU programmers' Grand Challenge
> > turns out to be important!)
> > 
> > At which point it *might* interest you. It *may* have cracked a
> > different but important part of the puzzle.
> > 
> > My opinion is that he takes it too far, extracting all the
> > sequentiality he can find, hundreds of cycles, as if he were
> > compiling for a single-stream CPU. And the result is, to me,
> > disappointing; the hardware isn't orders of magnitude smaller.
> 
> At least when targeting FPGAs, it is known that the working frequency
> of the resulting circuit will be 10x lower than what a microprocessor
> (or GPU) achieves. So the only way to outperform these is to extract
> every bit of parallelism achievable, use custom operators built
> specifically for the application, take care not to exceed the clock
> period, and pipeline everything as much as possible. All this while
> ensuring the HW resource limits of the targeted FPGA are respected.
> 
> However, if the user specifically wants to obtain a combinational
> circuit, then there is only one way.
> 
> > I will also read your papers with interest
> 
> Will be sent separately
> 
> Best regards,
> Adrien



_______________________________________________
Ghdl-discuss mailing list
[email protected]
https://mail.gna.org/listinfo/ghdl-discuss
