On Thu, Oct 31, 2013 at 4:06 PM, David Malcolm <[email protected]> wrote:
> It's possible to run GCC's sources through Doxygen by setting
> INPUT_FILTER = contrib/filter_gcc_for_doxygen
> within contrib/gcc.doxy and invoking doxygen on the latter file.
>
> The script filters out various preprocessor constructs from GCC sources
> before Doxygen tries to parse them.
>
> However, as written, it works one line at a time, and thus can't cope
> with GTY markers that span multiple lines. I added such a marker
> in r204170 (symtab_node_base), and I'd like to add more such markers
> (see http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02703.html).
>
> So the attached patch rewrites the filtering script to cope with multiline
> GTY markers.
>
> I'm not familiar with Perl, so I rewrote the script in Python.
>
> I expanded the regexes for readability, and added a unit-testing suite.
> I also tweaked how comments get laid out with @verbatim
> to avoid inconsistent indentation between the first line and subsequent
> lines within a comment.
>
> Tested with Python 2.7; you can see a sample of the output (from my
> gimple-as-C++-inheritance working copy) at:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/structgimple__statement__base.html
>
> In particular, with this patch Doxygen is able to cope with the symtab
> GTY marker, and thus parse the symtab class hierarchy, giving the output
> viewable here:
> http://dmalcolm.fedorapeople.org/gcc/2013-10-31/doxygen/html/classsymtab__node__base.html
>
> OK for trunk?
>
> contrib/
> * filter_gcc_for_doxygen: Use filter_params.py rather than
> filter_params.pl.
> * filter_params.pl: Delete in favor of...
> * filter_params.py: New, porting the perl script to python in
> order to cope with GTY markers that span multiple lines,
> tweaking the layout of comments, and adding a test suite.
> ---
> contrib/filter_gcc_for_doxygen | 2 +-
> contrib/filter_params.pl | 14 ---
> contrib/filter_params.py | 191 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 192 insertions(+), 15 deletions(-)
> delete mode 100755 contrib/filter_params.pl
> create mode 100644 contrib/filter_params.py
>
> diff --git a/contrib/filter_gcc_for_doxygen b/contrib/filter_gcc_for_doxygen
> index 3787eeb..ca1db31 100755
> --- a/contrib/filter_gcc_for_doxygen
> +++ b/contrib/filter_gcc_for_doxygen
> @@ -8,5 +8,5 @@
> # process is put on stdout.
>
> dir=`dirname $0`
> -perl $dir/filter_params.pl < $1 | perl $dir/filter_knr2ansi.pl
> +python $dir/filter_params.py $1 | perl $dir/filter_knr2ansi.pl
> exit 0
> diff --git a/contrib/filter_params.pl b/contrib/filter_params.pl
> deleted file mode 100755
> index 22dae6c..0000000
> --- a/contrib/filter_params.pl
> +++ /dev/null
> @@ -1,14 +0,0 @@
> -#!/usr/bin/perl
> -
> -# Filters out some of the #defines used throughout the GCC sources:
> -# - GTY(()) marks declarations for gengtype.c
> -# - PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> -
> -while (<>) {
> - s/^\/\* /\/\*\* \@verbatim /;
> - s/\*\// \@endverbatim \*\//;
> - s/GTY[ \t]*\(\(.*\)\)//g;
> - s/[ \t]ATTRIBUTE_UNUSED//g;
> - s/PARAMS[ \t]*\(\((.*?)\)\)/\($1\)/sg;
> - print;
> -}
> diff --git a/contrib/filter_params.py b/contrib/filter_params.py
> new file mode 100644
> index 0000000..d5092f6
> --- /dev/null
> +++ b/contrib/filter_params.py
> @@ -0,0 +1,191 @@
> +#!/usr/bin/python
> +"""
> +Filters out some of the #defines used throughout the GCC sources:
> +- GTY(()) marks declarations for gengtype.c
> +- PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> +
> +When passed one or more filenames, acts on those files and prints the
> +results to stdout.
> +
> +When run without a filename, runs a unit-testing suite.
> +"""
> +import re
> +import sys
> +import unittest
> +
> +# Optional whitespace
> +opt_ws = '\s*'
> +
> +def filter_src(text):
> + """
> + str -> str. We operate on the whole of the source file at once
> + (rather than individual lines) so that we can have multiline
> + regexes.
> + """
> +
> + # First, convert tabs to spaces so that we can line things up
> + # sanely.
> + text = text.expandtabs(8)
Does it really matter? I thought doxygen reformatted output anyway.
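For what it's worth, here is a quick check of what the call actually does
(this snippet is mine, not from the patch; the input is made up):

```python
# Quick illustration of text.expandtabs(8) on a tab-continued comment.
# Doxygen preserves @verbatim text literally, so column positions inside
# such a block would otherwise depend on the viewer's tab stops.
src = "    /* First line.\n\tSecond line, tab-indented.  */\n"
out = src.expandtabs(8)
print(out)
```

After expansion there are no tabs left, so the alignment seen in the
HTML matches the alignment in the source.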
> +
> + # Convert C comments from GNU coding convention of:
> + # /* FIRST_LINE
> + # NEXT_LINE
> + # FINAL_LINE. */
> + # to:
> + # /** @verbatim
> + # FIRST_LINE
> + # NEXT_LINE
> + # FINAL_LINE. @endverbatim */
> + # so that doxygen will parse them, and so the first line
> + # lines up correctly with subsequent lines.
> + # Single-line comments turn from:
> + # /* TEXT. */
> + # into:
> + # /** @verbatim
> + # TEXT. @endverbatim */
Oh, for @verbatim. But why @verbatim? One trick we could do here is
recognize ALL-CAPS parameters and turn them into \p PARAM. Later on,
we just emit \param. Though I guess it would not be easy to pick up
the description. Anyway, we can think about this later.
At the same time, I would like to encourage the use of doxygen markers
in the code. If we start adding all these @verbatim markers, those
markers will be canceled out. In fact, I would like to set some
expectations for doxygen markup in the coding guidelines themselves.
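To make the ALL-CAPS idea concrete, I mean something along these lines
(a rough sketch of mine, not part of the patch; `mark_params` is a
made-up name, and a real version would need to skip words like NULL or
FIXME):

```python
import re

# Rough sketch: rewrite ALL-CAPS identifiers in a comment as doxygen
# \p parameter references.  Not part of the patch under review.
def mark_params(comment):
    # Require three or more capitals/underscores to cut down on
    # false positives from ordinary capitalized words.
    return re.sub(r'\b([A-Z][A-Z_]{2,})\b', r'\\p \1', comment)

print(mark_params("Returns the string representing CLASS."))
```

Picking up the surrounding sentence as the \param description is the
hard part, as I said, so this is only the easy half.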
> +
> + text = re.sub(r'^([ \t]*)/\*+ ',
> + (r'\1/** @verbatim\n'
> + r'\1 '),
> + text,
> + flags=re.MULTILINE)
> + text = re.sub(r'\*+/',
> + r'@endverbatim */',
> + text)
> +
> + # Remove GTY markings (potentially multiline ones):
> + text = re.sub('GTY' + opt_ws + r'\(\(.*?\)\)\s+',
> + '',
> + text,
> + flags=(re.MULTILINE|re.DOTALL))
> +
> + # Strip out 'ATTRIBUTE_UNUSED'
> + text = re.sub('\sATTRIBUTE_UNUSED',
> + '',
> + text)
> +
> + # PARAMS(()) is used for K&R compatibility. See ansidecl.h.
> + text = re.sub('PARAMS' + opt_ws + r'\(\((.*?)\)\)',
> + r'(\1)',
> + text)
> +
> + return text
> +
> +class FilteringTests(unittest.TestCase):
> + '''
> + Unit tests for filter_src.
> + '''
> + def assert_filters_to(self, src_input, expected_result):
> + # assertMultiLineEqual was added to unittest in 2.7/3.1
> + if hasattr(self, 'assertMultiLineEqual'):
> + assertion = self.assertMultiLineEqual
> + else:
> + assertion = self.assertEqual
> + assertion(expected_result, filter_src(src_input))
> +
> + def test_comment_example(self):
> + self.assert_filters_to(
> + (' /* FIRST_LINE\n'
> + ' NEXT_LINE\n'
> + ' FINAL_LINE. */\n'),
> + (' /** @verbatim\n'
> + ' FIRST_LINE\n'
> + ' NEXT_LINE\n'
> + ' FINAL_LINE. @endverbatim */\n'))
> +
> + def test_oneliner_comment(self):
> + self.assert_filters_to(
> + '/* Returns the string representing CLASS. */\n',
> + ('/** @verbatim\n'
> + ' Returns the string representing CLASS. @endverbatim */\n'))
> +
> + def test_multiline_comment_without_initial_indent(self):
> + self.assert_filters_to(
> + ('/* The thread-local storage model associated with a given VAR_DECL\n'
> + " or SYMBOL_REF. This isn't used much, but both trees and RTL refer\n"
> + " to it, so it's here. */\n"),
> + ('/** @verbatim\n'
> + ' The thread-local storage model associated with a given VAR_DECL\n'
> + " or SYMBOL_REF. This isn't used much, but both trees and RTL refer\n"
> + " to it, so it's here. @endverbatim */\n"))
> +
> + def test_multiline_comment_with_initial_indent(self):
> + self.assert_filters_to(
> + (' /* Set once the definition was analyzed. The list of references and\n'
> + ' other properties are built during analysis. */\n'),
> + (' /** @verbatim\n'
> + ' Set once the definition was analyzed. The list of references and\n'
> + ' other properties are built during analysis. @endverbatim */\n'))
> +
> + def test_multiline_comment_with_tabs(self):
> + # Ensure that multiline comments containing tabs look sane, in
> + # this case, one with initial indentation.
> + self.assert_filters_to(
> + (' /* All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> + ' in order of increasing offset, and then increasing TLS model.\n'
> + ' The following conditions will hold for each element X in this vector:\n'
> + '\n'
> + '\t SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> + '\t SYMBOL_REF_ANCHOR_P (X)\n'
> + '\t SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> + '\t SYMBOL_REF_BLOCK_OFFSET (X) >= 0. */\n'
> + ' vec<rtx, va_gc> *anchors;\n'),
> + (' /** @verbatim\n'
> + ' All the anchor SYMBOL_REFs used to address these objects, sorted\n'
> + ' in order of increasing offset, and then increasing TLS model.\n'
> + ' The following conditions will hold for each element X in this vector:\n'
> + '\n'
> + ' SYMBOL_REF_HAS_BLOCK_INFO_P (X)\n'
> + ' SYMBOL_REF_ANCHOR_P (X)\n'
> + ' SYMBOL_REF_BLOCK (X) == [address of this structure]\n'
> + ' SYMBOL_REF_BLOCK_OFFSET (X) >= 0. @endverbatim */\n'
> + ' vec<rtx, va_gc> *anchors;\n'))
> +
> + def test_simple_GTY(self):
> + # Ensure that a simple GTY is filtered out.
> + self.assert_filters_to(
> + ('typedef struct GTY(()) alias_pair {\n'
> + ' tree decl;\n'
> + ' tree target;\n'
> + '} alias_pair;\n'),
> + ('typedef struct alias_pair {\n'
> + ' tree decl;\n'
> + ' tree target;\n'
> + '} alias_pair;\n'))
> +
> + def test_multiline_GTY(self):
> + # Ensure that a multiline GTY is filtered out.
> + self.assert_filters_to(
> + ('class GTY((desc ("%h.type"), tag ("SYMTAB_SYMBOL"),\n'
> + '\t chain_next ("%h.next"), chain_prev ("%h.previous")))\n'
> + ' symtab_node_base\n'
> + '{\n'),
> + ('class symtab_node_base\n'
> + '{\n'))
> +
> + def test_ATTRIBUTE_UNUSED(self):
> + # Ensure that ATTRIBUTE_UNUSED is filtered out.
> + self.assert_filters_to(
> + ('static void\n'
> + 'record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED)\n'
> + '{\n'),
> + ('static void\n'
> + 'record_set (rtx dest, const_rtx set, void *data)\n'
> + '{\n'))
> +
> + def test_PARAMS(self):
> + self.assert_filters_to(
> + 'char *strcpy PARAMS ((char *dest, char *source));\n',
> + 'char *strcpy (char *dest, char *source);\n')
> +
> +def act_on_files(argv):
> + for filename in argv[1:]:
> + with open(filename) as f:
> + text = f.read()
> + print(filter_src(text))
> +
> +if __name__ == '__main__':
> + if len(sys.argv) > 1:
> + act_on_files(sys.argv)
> + else:
> + unittest.main()
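As a quick sanity check on my side, the two main substitutions do
compose as expected; this is a standalone rerun of the GTY and PARAMS
regexes from the patch above, on made-up input:

```python
import re

# Standalone copies of two substitutions from the patch, applied in
# sequence to a made-up input to see them compose.
def strip_gty(text):
    # Remove GTY(()) markers, including multiline ones.
    return re.sub(r'GTY\s*\(\(.*?\)\)\s+', '', text,
                  flags=re.MULTILINE | re.DOTALL)

def strip_params(text):
    # Unwrap the K&R-compatibility PARAMS(()) macro.
    return re.sub(r'PARAMS\s*\(\((.*?)\)\)', r'(\1)', text)

src = ('struct GTY((user)) foo;\n'
       'char *strcpy PARAMS ((char *dest, char *source));\n')
print(strip_params(strip_gty(src)))
```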
TESTS! Thanks. This is so very refreshing :)
Other than the @verbatim markers, the rest looks fine.
Diego.