Josh Triplett wrote:
> On Wed, 2007-06-27 at 14:51 +0100, Rob Taylor wrote:
>> Here's something I've hacked up for my work on gobject-introspection
>> [1]. It basically dumps the parse tree for a given file as simplistic
>> xml, suitable for further transformation by something else (in my case,
>> some python).
>>
>> I'd expect this to also be useful for code navigation in editors and c
>> refactoring tools, but I've really only focused on my needs for c api
>> description.
>>
>> There are 3 patches here. The first introduces a field in the symbol
>> struct for the end position of the symbol. I've added this in my case
>> for documentation generation, but again I think it'd be useful in other
>> cases. The next introduces a sparse_keep_tokens, which parses a file,
>> but doesn't free the tokens after parsing. The final one adds c2xml and
>> the DTD for the xml format. It builds conditionally on whether libxml2
>> is available.
>>
>> All feedback appreciated!
> 
> Wow.  Very nice.  I can already think of several other uses for this.

Glad you like it :) OOI, what other uses are you thinking of?

> A few suggestions:
> 
>       * Please sign off your patches.  See
>         
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;hb=HEAD;f=Documentation/SubmittingPatches
>  , section "Sign your work", for details on the Developer's Certificate of 
> Origin and the Signed-off-by convention.  I really need to include some 
> documentation in the Sparse source tree, though.

Ah, I did wonder what the 'signed-off-by' signified.

>       * Rather than specifying start="line:col" end="line:col", how
>         about splitting those up into start-line, start-col, end-line,
>         and end-col?  That would avoid the need to do string parsing
>         after reading the XML.

Yes. I originally had a more human-readable form, and this is a hangover
from that approach.

>       * Positions have file information associated with them.  A symbol
>         might potentially start in one file and end in another, if
>         people play crazy games with #include.  start-file and end-file?

Yes, optional end-file would be sensible. Hopefully it wouldn't occur
very often ;)
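
As a rough sketch of the attribute layout I have in mind (the struct and
function names below are made up for illustration and are not sparse's
real struct position; file names are assumed not to need XML escaping):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a source position; not sparse's struct position. */
struct pos_sketch {
	const char *file;
	int line;
	int col;
};

/*
 * Format the position attributes of one symbol into buf.
 * end-file is only written when the symbol ends in a different file
 * from the one it starts in.
 * Returns the total number of characters formatted, as snprintf does.
 */
static int format_pos_attrs(char *buf, size_t len,
			    const struct pos_sketch *start,
			    const struct pos_sketch *end)
{
	int n;

	assert(buf && start && end);
	n = snprintf(buf, len,
		     "file=\"%s\" start-line=\"%d\" start-col=\"%d\" "
		     "end-line=\"%d\" end-col=\"%d\"",
		     start->file, start->line, start->col,
		     end->line, end->col);
	/* Only append when the first part fitted and the files differ. */
	if (n >= 0 && (size_t)n < len && strcmp(start->file, end->file) != 0)
		n += snprintf(buf + n, len - n, " end-file=\"%s\"", end->file);
	return n;
}
```

A consumer can then read each attribute as a plain integer, and only needs
to look at end-file in the rare cross-file case.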

>       * Typo in examine_namespace: "Unregonized namespace".
yes.

>       * get_type_name seems generally useful, and several other parts of
>         Sparse (such as in evaluate.c and show-parse.c) could become
>         simpler by using it.  How about putting it in symbol.c and
>         exposing it via symbol.h?  Can you do that in a separate patch,
>         please?

Sure.
>       * Also, should get_type_name perhaps look up the string in an
>         array rather than using switch?  (I don't know which makes more
>         sense.)

Yeah, an array lookup would be better.
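
Roughly what I mean by the array version, as a self-contained sketch. The
enum here is a shortened, hypothetical one (sparse's real enum type has
more members), and type_name_sketch is just an illustrative name:

```c
#include <stddef.h>

/* Hypothetical, shortened enum; sparse's real enum type has more members. */
enum type_sketch {
	SK_UNINITIALIZED,
	SK_PTR,
	SK_FN,
	SK_ARRAY,
	SK_BAD
};

/* Designated initializers keep each string next to its enum value. */
static const char *const type_names[] = {
	[SK_UNINITIALIZED] = "uninitialized",
	[SK_PTR] = "pointer",
	[SK_FN] = "function",
	[SK_ARRAY] = "array",
	[SK_BAD] = "bad",
};

/* Returns the name for type, or NULL if type is outside the table. */
static const char *type_name_sketch(unsigned int type)
{
	if (type >= sizeof(type_names) / sizeof(type_names[0]))
		return NULL;
	return type_names[type];
}
```

Checking against the size of the table rather than against the last enum
member means the bounds check stays correct if entries are added later.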

>       * I don't know how much work this would require, but it doesn't
>         seem like c2xml gets much value out of using libxml, so would it
>         make things very painful to just print XML directly?  It would
>         certainly make things like BAD_CAST and having to snprintf to
>         local buffers go away.  If you count on libxml for some form of
>         escaping or similar, please ignore this; however, as far as I
>         can tell, all of the strings that c2xml works with (such as
>         identifiers) can't have unusual characters in them.

Well, I'm using the tree builder. It would be non-trivial to rewrite
without it - see in examine_symbol where I add new nodes to the root
node and recurse from there.

>       * Please don't include vim modelines in source files.  (Same goes
>         for emacs and similar.)

Sure

>       * Please explicitly limit the possible values of the type
>         attribute to those that Sparse produces, rather than allowing
>         any arbitrary CDATA.  The same goes for a few other 

Ah, yes, good idea.

<snip>

>       * In examine_modifiers, please use C99-style designated assignment
>         for the modifiers array, for clarity and robustness.

Hmm, not sure how best to do this. Redefine MOD_* in terms of shifts of
some linearly assigned constants?
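
To make that idea concrete, here is a minimal sketch of what I mean. All
the names (MODBIT_*, MODSK_*, mod_names_for) are hypothetical, and only a
handful of modifiers are shown; the real MOD_* values in sparse are
currently plain hex masks:

```c
#include <stddef.h>

/* Hypothetical linearly assigned bit positions, one per modifier. */
enum mod_bit_sketch {
	MODBIT_AUTO,
	MODBIT_REGISTER,
	MODBIT_STATIC,
	MODBIT_EXTERN,
	MODBIT_CONST,
	MODBIT_MAX
};

/* The masks are then derived from the positions by shifting. */
#define MODSK(bit)	(1UL << (bit))
#define MODSK_STATIC	MODSK(MODBIT_STATIC)
#define MODSK_CONST	MODSK(MODBIT_CONST)

/* The name table can now use designated initializers keyed by position. */
static const char *const mod_names[MODBIT_MAX] = {
	[MODBIT_AUTO] = "auto",
	[MODBIT_REGISTER] = "register",
	[MODBIT_STATIC] = "static",
	[MODBIT_EXTERN] = "extern",
	[MODBIT_CONST] = "const",
};

/* Store the names of the set bits in out (at most max); returns the count. */
static size_t mod_names_for(unsigned long modifiers,
			    const char **out, size_t max)
{
	size_t n = 0;
	int bit;

	for (bit = 0; bit < MODBIT_MAX && n < max; bit++)
		if ((modifiers & MODSK(bit)) && mod_names[bit])
			out[n++] = mod_names[bit];
	return n;
}
```

With this layout a reordered or newly added modifier can't silently shift
the strings, which is the robustness the designated form would buy.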

>       * I suspect several of the modifiers in examine_modifiers don't
>         need to generate output; I think you want to ignore everything
>         in MOD_IGNORE.

Do we really want to not emit any from MOD_STORAGE? I guess if we have
scoping info at a later date, we can certainly drop MOD_TOPLEVEL, but
that seems useful ATM. MOD_ADDRESSABLE seems useful. MOD_ASSIGNED,
MOD_USERTYPE, MOD_FORCE, MOD_ACCESSED and MOD_EXPLICITLY_SIGNED don't
seem very useful though.
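
Whatever set we settle on, the filtering itself is just a mask. A small
sketch, with made-up flag values (these are not sparse's real MOD_*
constants) and a hypothetical skip mask standing in for whichever bits we
decide not to emit:

```c
/* Hypothetical flag values; these are not sparse's real MOD_* constants. */
#define SK_MOD_STATIC		0x01UL
#define SK_MOD_TOPLEVEL		0x02UL
#define SK_MOD_ADDRESSABLE	0x04UL
#define SK_MOD_ASSIGNED		0x08UL
#define SK_MOD_ACCESSED		0x10UL

/* Bookkeeping bits that are not worth emitting in the XML. */
#define SK_MOD_SKIP	(SK_MOD_ASSIGNED | SK_MOD_ACCESSED)

/* Returns only those modifier bits that should appear in the output. */
static unsigned long emitted_modifiers(unsigned long modifiers)
{
	return modifiers & ~SK_MOD_SKIP;
}
```

The loop in examine_modifiers would then iterate over the masked value
instead of the raw modifiers field.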

I think MOD_TYPEDEF would be useful, but I never actually see it. Do you
know what's going on here?


Attached you should find the updated patchset with all the changes
discussed apart from the modifiers stuff discussed above.

<snip>

> 
> Note that you don't need to address all of these before resending.  In
> particular, I'd love to merge the first patch, and I just need a signoff
> for it.
> 
> Thanks again for this work; it looks great, and highly useful.

Thanks to you too!

Rob Taylor

From d794c936d62279f37e2e894af3d2297286384dce Mon Sep 17 00:00:00 2001
From: Rob Taylor <[EMAIL PROTECTED]>
Date: Fri, 29 Jun 2007 17:25:51 +0100
Subject: [PATCH 1/4] add end position to symbols

This adds a field in the symbol struct for the position of the end of the
symbol and code to parse.c to fill this in for the various symbol types when
parsing.

Signed-off-by: Rob Taylor <[EMAIL PROTECTED]>
---
 parse.c  |   21 ++++++++++++++++++++-
 symbol.c |    1 +
 symbol.h |    1 +
 3 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/parse.c b/parse.c
index cb9f87a..ae14642 100644
--- a/parse.c
+++ b/parse.c
@@ -505,6 +505,7 @@ static struct token *struct_union_enum_specifier(enum type type,
 
 			// Mark the structure as needing re-examination
 			sym->examined = 0;
+			sym->endpos = token->pos;
 		}
 		return token;
 	}
@@ -519,7 +520,10 @@ static struct token *struct_union_enum_specifier(enum type type,
 	sym = alloc_symbol(token->pos, type);
 	token = parse(token->next, sym);
 	ctype->base_type = sym;
-	return expect(token, '}', "at end of specifier");
+	token =  expect(token, '}', "at end of specifier");
+	sym->endpos = token->pos;
+
+	return token;
 }
 
 static struct token *parse_struct_declaration(struct token *token, struct symbol *sym)
@@ -712,6 +716,9 @@ static struct token *parse_enum_declaration(struct token *token, struct symbol *
 			lower_boundary(&lower, &v);
 		}
 		token = next;
+
+		sym->endpos = token->pos;
+
 		if (!match_op(token, ','))
 			break;
 		token = token->next;
@@ -775,6 +782,7 @@ static struct token *typeof_specifier(struct token *token, struct ctype *ctype)
 		token = parse_expression(token->next, &typeof_sym->initializer);
 
 		ctype->modifiers = 0;
+		typeof_sym->endpos = token->pos;
 		ctype->base_type = typeof_sym;
 	}		
 	return expect(token, ')', "after typeof");
@@ -1193,12 +1201,14 @@ static struct token *direct_declarator(struct token *token, struct symbol *decl,
 			sym = alloc_indirect_symbol(token->pos, ctype, SYM_FN);
 			token = parameter_type_list(next, sym, p);
 			token = expect(token, ')', "in function declarator");
+			sym->endpos = token->pos;
 			continue;
 		}
 		if (token->special == '[') {
 			struct symbol *array = alloc_indirect_symbol(token->pos, ctype, SYM_ARRAY);
 			token = abstract_array_declarator(token->next, array);
 			token = expect(token, ']', "in abstract_array_declarator");
+			array->endpos = token->pos;
 			ctype = &array->ctype;
 			continue;
 		}
@@ -1232,6 +1242,7 @@ static struct token *pointer(struct token *token, struct ctype *ctype)
 
 		token = declaration_specifiers(token->next, ctype, 1);
 		modifiers = ctype->modifiers;
+		ctype->base_type->endpos = token->pos;
 	}
 	return token;
 }
@@ -1286,6 +1297,7 @@ static struct token *handle_bitfield(struct token *token, struct symbol *decl)
 		}
 	}
 	bitfield->bit_size = width;
+	bitfield->endpos = token->pos;
 	return token;
 }
 
@@ -1306,6 +1318,7 @@ static struct token *declaration_list(struct token *token, struct symbol_list **
 		}
 		apply_modifiers(token->pos, &decl->ctype);
 		add_symbol(list, decl);
+		decl->endpos = token->pos;
 		if (!match_op(token, ','))
 			break;
 		token = token->next;
@@ -1340,6 +1353,7 @@ static struct token *parameter_declaration(struct token *token, struct symbol **
 	token = declarator(token, sym, &ident);
 	sym->ident = ident;
 	apply_modifiers(token->pos, &sym->ctype);
+	sym->endpos = token->pos;
 	return token;
 }
 
@@ -1350,6 +1364,7 @@ struct token *typename(struct token *token, struct symbol **p)
 	token = declaration_specifiers(token, &sym->ctype, 0);
 	token = declarator(token, sym, NULL);
 	apply_modifiers(token->pos, &sym->ctype);
+	sym->endpos = token->pos;
 	return token;
 }
 
@@ -1818,6 +1833,7 @@ static struct token *parameter_type_list(struct token *token, struct symbol *fn,
 			warning(token->pos, "void parameter");
 		}
 		add_symbol(list, sym);
+		sym->endpos = token->pos;
 		if (!match_op(token, ','))
 			break;
 		token = token->next;
@@ -2104,6 +2120,8 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis
 	token = declarator(token, decl, &ident);
 	apply_modifiers(token->pos, &decl->ctype);
 
+	decl->endpos = token->pos;
+
 	/* Just a type declaration? */
 	if (!ident)
 		return expect(token, ';', "end of type declaration");
@@ -2164,6 +2182,7 @@ struct token *external_declaration(struct token *token, struct symbol_list **lis
 		token = declaration_specifiers(token, &decl->ctype, 1);
 		token = declarator(token, decl, &ident);
 		apply_modifiers(token->pos, &decl->ctype);
+		decl->endpos = token->pos;
 		if (!ident) {
 			sparse_error(token->pos, "expected identifier name in type definition");
 			return token;
diff --git a/symbol.c b/symbol.c
index 329fed9..7585978 100644
--- a/symbol.c
+++ b/symbol.c
@@ -62,6 +62,7 @@ struct symbol *alloc_symbol(struct position pos, int type)
 	struct symbol *sym = __alloc_symbol(0);
 	sym->type = type;
 	sym->pos = pos;
+	sym->endpos.type = 0;
 	return sym;
 }
 
diff --git a/symbol.h b/symbol.h
index 2bde84d..be5e6b1 100644
--- a/symbol.h
+++ b/symbol.h
@@ -111,6 +111,7 @@ struct symbol {
 	enum namespace namespace:9;
 	unsigned char used:1, attr:2, enum_member:1;
 	struct position pos;		/* Where this symbol was declared */
+	struct position endpos;		/* Where this symbol ends*/
 	struct ident *ident;		/* What identifier this symbol is associated with */
 	struct symbol *next_id;		/* Next semantic symbol that shares this identifier */
 	struct symbol **id_list;	/* Back pointer to symbol list head */
-- 
1.5.2-rc3.GIT

From c0cf0ff431197fe02839ed05cd2e7dd2b6d5cdae Mon Sep 17 00:00:00 2001
From: Rob Taylor <[EMAIL PROTECTED]>
Date: Fri, 29 Jun 2007 17:33:29 +0100
Subject: [PATCH 2/4] add sparse_keep_tokens api to lib.h

Adds sparse_keep_tokens, which is the same as __sparse, but doesn't free the
tokens after parsing. Useful for when you want to inspect macro symbols after
parsing.

Signed-off-by: Rob Taylor <[EMAIL PROTECTED]>
---
 lib.c |   13 ++++++++++++-
 lib.h |    1 +
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/lib.c b/lib.c
index 7fea474..aba547a 100644
--- a/lib.c
+++ b/lib.c
@@ -741,7 +741,7 @@ struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list
 	return list;
 }
 
-struct symbol_list * __sparse(char *filename)
+struct symbol_list * sparse_keep_tokens(char *filename)
 {
 	struct symbol_list *res;
 
@@ -751,6 +751,17 @@ struct symbol_list * __sparse(char *filename)
 	new_file_scope();
 	res = sparse_file(filename);
 
+	/* And return it */
+	return res;
+}
+
+
+struct symbol_list * __sparse(char *filename)
+{
+	struct symbol_list *res;
+
+	res = sparse_keep_tokens(filename);
+
 	/* Drop the tokens for this file after parsing */
 	clear_token_alloc();
 
diff --git a/lib.h b/lib.h
index bc2a8c2..aacafea 100644
--- a/lib.h
+++ b/lib.h
@@ -113,6 +113,7 @@ extern void declare_builtin_functions(void);
 extern void create_builtin_stream(void);
 extern struct symbol_list *sparse_initialize(int argc, char **argv, struct string_list **files);
 extern struct symbol_list *__sparse(char *filename);
+extern struct symbol_list *sparse_keep_tokens(char *filename);
 extern struct symbol_list *sparse(char *filename);
 
 static inline int symbol_list_size(struct symbol_list *list)
-- 
1.5.2-rc3.GIT

From d809173f376d5cb6281832aec57c4f31c0447020 Mon Sep 17 00:00:00 2001
From: Rob Taylor <[EMAIL PROTECTED]>
Date: Mon, 2 Jul 2007 13:26:42 +0100
Subject: [PATCH 3/4] new get_type_name function

Adds function get_type_name to symbol.h to get a string representation of a given type.

Signed-off-by: Rob Taylor <[EMAIL PROTECTED]>
---
 symbol.c |   29 +++++++++++++++++++++++++++++
 symbol.h |    1 +
 2 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/symbol.c b/symbol.c
index 7585978..516c50f 100644
--- a/symbol.c
+++ b/symbol.c
@@ -444,6 +444,35 @@ struct symbol *examine_symbol_type(struct symbol * sym)
 	return sym;
 }
 
+const char* get_type_name(enum type type)
+{
+	const char *type_lookup[] = {
+	[SYM_UNINITIALIZED] = "uninitialized",
+	[SYM_PREPROCESSOR] = "preprocessor",
+	[SYM_BASETYPE] = "basetype",
+	[SYM_NODE] = "node",
+	[SYM_PTR] = "pointer",
+	[SYM_FN] = "function",
+	[SYM_ARRAY] = "array",
+	[SYM_STRUCT] = "struct",
+	[SYM_UNION] = "union",
+	[SYM_ENUM] = "enum",
+	[SYM_TYPEDEF] = "typedef",
+	[SYM_TYPEOF] = "typeof",
+	[SYM_MEMBER] = "member",
+	[SYM_BITFIELD] = "bitfield",
+	[SYM_LABEL] = "label",
+	[SYM_RESTRICT] = "restrict",
+	[SYM_FOULED] = "fouled",
+	[SYM_KEYWORD] = "keyword",
+	[SYM_BAD] = "bad"};
+
+	if (type <= SYM_BAD)
+		return type_lookup[type];
+	else
+		return NULL;
+}
+
 static struct symbol_list *restr, *fouled;
 
 void create_fouled(struct symbol *type)
diff --git a/symbol.h b/symbol.h
index be5e6b1..c651a84 100644
--- a/symbol.h
+++ b/symbol.h
@@ -267,6 +267,7 @@ extern void examine_simple_symbol_type(struct symbol *);
 extern const char *show_typename(struct symbol *sym);
 extern const char *builtin_typename(struct symbol *sym);
 extern const char *builtin_ctypename(struct ctype *ctype);
+extern const char* get_type_name(enum type type);
 
 extern void debug_symbol(struct symbol *);
 extern void merge_type(struct symbol *sym, struct symbol *base_type);
-- 
1.5.2-rc3.GIT

From 51785f1c32ab857432f4fb4a5c99bda4d80bc51f Mon Sep 17 00:00:00 2001
From: Rob Taylor <[EMAIL PROTECTED]>
Date: Mon, 2 Jul 2007 13:27:46 +0100
Subject: [PATCH 4/4] add c2xml program

Adds a new c2xml program which dumps out the parse tree for a given file as well-formed XML. A DTD for the format is included as parse.dtd.

Signed-off-by: Rob Taylor <[EMAIL PROTECTED]>
---
 Makefile  |   15 +++
 c2xml.c   |  324 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 parse.dtd |   48 +++++++++
 3 files changed, 387 insertions(+), 0 deletions(-)
 create mode 100644 c2xml.c
 create mode 100644 parse.dtd

diff --git a/Makefile b/Makefile
index 039fe38..67da31f 100644
--- a/Makefile
+++ b/Makefile
@@ -7,6 +7,8 @@ CFLAGS=-O -g -Wall -Wwrite-strings -fpic
 LDFLAGS=-g
 AR=ar
 
+HAVE_LIBXML=$(shell pkg-config --exists libxml-2.0 && echo 'yes')
+
 #
 # For debugging, uncomment the next one
 #
@@ -21,8 +23,15 @@ PKGCONFIGDIR=$(LIBDIR)/pkgconfig
 
 PROGRAMS=test-lexing test-parsing obfuscate compile graph sparse test-linearize example \
 	 test-unssa test-dissect ctags
+
+
 INST_PROGRAMS=sparse cgcc
 
+ifeq ($(HAVE_LIBXML),yes)
+PROGRAMS+=c2xml
+INST_PROGRAMS+=c2xml
+endif
+
 LIB_H=    token.h parse.h lib.h symbol.h scope.h expression.h target.h \
 	  linearize.h bitmap.h ident-list.h compat.h flow.h allocate.h \
 	  storage.h ptrlist.h dissect.h
@@ -107,6 +116,12 @@ test-dissect: test-dissect.o $(LIBS)
 ctags: ctags.o $(LIBS)
 	$(QUIET_LINK)$(CC) $(LDFLAGS) -o $@ $< $(LIBS)
 
+ifeq ($(HAVE_LIBXML),yes)
+c2xml: c2xml.c $(LIBS) $(LIB_H)
+	$(CC) $(LDFLAGS) `pkg-config --cflags --libs libxml-2.0` -o $@ $< $(LIBS)
+
+endif
+
 $(LIB_FILE): $(LIB_OBJS)
 	$(QUIET_AR)$(AR) rcs $@ $(LIB_OBJS)
 
diff --git a/c2xml.c b/c2xml.c
new file mode 100644
index 0000000..25d1c40
--- /dev/null
+++ b/c2xml.c
@@ -0,0 +1,324 @@
+/*
+ * Sparse c2xml
+ *
+ * Dumps the parse tree as an xml document
+ *
+ * Copyright (C) 2007 Rob Taylor
+ *
+ * Licensed under the Open Software License version 1.1
+ */
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <assert.h>
+#include <libxml/parser.h>
+#include <libxml/tree.h>
+
+#include "parse.h"
+#include "scope.h"
+#include "symbol.h"
+
+xmlDocPtr doc = NULL;       /* document pointer */
+xmlNodePtr root_node = NULL;/* root node pointer */
+xmlDtdPtr dtd = NULL;       /* DTD pointer */
+xmlNsPtr ns = NULL;         /* namespace pointer */
+int idcount = 0;
+
+static struct symbol_list *taglist = NULL;
+
+static void examine_symbol(struct symbol *sym, xmlNodePtr node);
+
+static xmlAttrPtr newNumProp(xmlNodePtr node, const xmlChar * name, int value)
+{
+	char buf[256];
+	snprintf(buf, 256, "%d", value);
+	return xmlNewProp(node, name, buf);
+}
+
+static xmlAttrPtr newIdProp(xmlNodePtr node, const xmlChar * name, unsigned int id)
+{
+	char buf[256];
+	snprintf(buf, 256, "_%d", id);
+	return xmlNewProp(node, name, buf);
+}
+
+static xmlNodePtr new_sym_node(struct symbol *sym, const char *name, xmlNodePtr parent)
+{
+	xmlNodePtr node;
+	const char *ident = show_ident(sym->ident);
+
+	assert(name != NULL);
+	assert(sym != NULL);
+	assert(parent != NULL);
+
+	node = xmlNewChild(parent, NULL, "symbol", NULL);
+
+	xmlNewProp(node, "type",  name);
+
+	newIdProp(node, "id", idcount);
+
+	if (sym->ident && ident)
+		xmlNewProp(node, "ident", ident);
+	xmlNewProp(node, "file", stream_name(sym->pos.stream));
+
+	newNumProp(node, "start-line", sym->pos.line);
+	newNumProp(node, "start-col", sym->pos.pos);
+
+	if (sym->endpos.type) {
+		newNumProp(node, "end-line", sym->endpos.line);
+		newNumProp(node, "end-col", sym->endpos.pos);
+		if (sym->pos.stream != sym->endpos.stream)
+			xmlNewProp(node, "end-file", stream_name(sym->endpos.stream));
+        }
+	sym->aux = node;
+
+	idcount++;
+
+	return node;
+}
+
+static inline void examine_members(struct symbol_list *list, xmlNodePtr node)
+{
+	struct symbol *sym;
+	xmlNodePtr child;
+	char buf[256];
+
+	FOR_EACH_PTR(list, sym) {
+		examine_symbol(sym, node);
+	} END_FOR_EACH_PTR(sym);
+}
+
+static void examine_modifiers(struct symbol *sym, xmlNodePtr node)
+{
+	const char *modifiers[] = {
+			"auto",
+			"register",
+			"static",
+			"extern",
+			"const",
+			"volatile",
+			"signed",
+			"unsigned",
+			"char",
+			"short",
+			"long",
+			"long-long",
+			"typedef",
+			NULL,
+			NULL,
+			NULL,
+			NULL,
+			NULL,
+			"inline",
+			"addressable",
+			"nocast",
+			"noderef",
+			"accessed",
+			"toplevel",
+			"label",
+			"assigned",
+			"type-type",
+			"safe",
+			"user-type",
+			"force",
+			"explicitly-signed",
+			"bitwise"};
+
+	int i;
+
+	if (sym->namespace != NS_SYMBOL)
+		return;
+
+	/*iterate over the 32 bit bitfield*/
+	for (i=0; i < 32; i++) {
+		if ((sym->ctype.modifiers & 1U<<i) && modifiers[i])
+			xmlNewProp(node, modifiers[i], "1");
+	}
+}
+
+static void
+examine_layout(struct symbol *sym, xmlNodePtr node)
+{
+	char buf[256];
+
+	examine_symbol_type(sym);
+
+	newNumProp(node, "bit-size", sym->bit_size);
+	newNumProp(node, "alignment", sym->ctype.alignment);
+	newNumProp(node, "offset", sym->offset);
+	if (is_bitfield_type(sym)) {
+		newNumProp(node, "bit-offset", sym->bit_offset);
+	}
+}
+
+static void examine_symbol(struct symbol *sym, xmlNodePtr node)
+{
+	xmlNodePtr child = NULL;
+	const char *base;
+	int array_size;
+	char buf[256];
+
+	if (!sym)
+		return;
+	if (sym->aux)		/*already visited */
+		return;
+
+	if (sym->ident && sym->ident->reserved)
+		return;
+
+	child = new_sym_node(sym, get_type_name(sym->type), node);
+	examine_modifiers(sym, child);
+	examine_layout(sym, child);
+
+	if (sym->ctype.base_type) {
+		if ((base = builtin_typename(sym->ctype.base_type)) == NULL) {
+			if (!sym->ctype.base_type->aux) {
+				examine_symbol(sym->ctype.base_type, root_node);
+			}
+			xmlNewProp(child, "base-type", 
+				xmlGetProp((xmlNodePtr)sym->ctype.base_type->aux, "id"));
+		} else {
+			xmlNewProp(child, "base-type-builtin", base);
+		}
+	}
+	if (sym->array_size) {
+		/* TODO: modify get_expression_value to give error return */
+		array_size = get_expression_value(sym->array_size);
+		newNumProp(child, "array-size", array_size);
+	}
+
+
+	switch (sym->type) {
+	case SYM_STRUCT:
+	case SYM_UNION:
+		examine_members(sym->symbol_list, child);
+		break;
+	case SYM_FN:
+		examine_members(sym->arguments, child);
+		break;
+	case SYM_UNINITIALIZED:
+		xmlNewProp(child, "base-type-builtin", builtin_typename(sym));
+		break;
+	}
+	return;
+}
+
+static struct position *get_expansion_end (struct token *token)
+{
+	struct token *p1, *p2;
+
+	for (p1=NULL, p2=NULL;
+	     !eof_token(token);
+	     p2 = p1, p1 = token, token = token->next);
+
+	if (p2)
+		return &(p2->pos);
+	else
+		return NULL;
+}
+
+static void examine_macro(struct symbol *sym, xmlNodePtr node)
+{
+	xmlNodePtr child;
+	struct position *pos;
+	char buf[256];
+
+	/* this should probably go in the main codebase*/
+	pos = get_expansion_end(sym->expansion);
+	if (pos)
+		sym->endpos = *pos;
+	else
+		sym->endpos = sym->pos;
+
+	child = new_sym_node(sym, "macro", node);
+}
+
+static void examine_namespace(struct symbol *sym)
+{
+	xmlChar *namespace_type = NULL;
+
+	if (sym->ident && sym->ident->reserved)
+		return;
+
+	switch(sym->namespace) {
+	case NS_MACRO:
+		examine_macro(sym, root_node);
+		break;
+	case NS_TYPEDEF:
+	case NS_STRUCT:
+	case NS_SYMBOL:
+		examine_symbol(sym, root_node);
+		break;
+	case NS_NONE:
+	case NS_LABEL:
+	case NS_ITERATOR:
+	case NS_UNDEF:
+	case NS_PREPROCESSOR:
+	case NS_KEYWORD:
+		break;
+	default:
+		die("Unrecognised namespace type %d",sym->namespace);
+	}
+
+}
+
+static int get_stream_id (const char *name)
+{
+	int i;
+	for (i=0; i<input_stream_nr; i++) {
+		if (strcmp(name, stream_name(i))==0)
+			return i;
+	}
+	return -1;
+}
+
+static inline void examine_symbol_list(const char *file, struct symbol_list *list)
+{
+	struct symbol *sym;
+	int stream_id = get_stream_id (file);
+
+	if (!list)
+		return;
+	FOR_EACH_PTR(list, sym) {
+		if (sym->pos.stream == stream_id)
+			examine_namespace(sym);
+	} END_FOR_EACH_PTR(sym);
+}
+
+int main(int argc, char **argv)
+{
+	struct string_list *filelist = NULL;
+	struct symbol_list *symlist = NULL;
+	char *file;
+
+	doc = xmlNewDoc("1.0");
+	root_node = xmlNewNode(NULL, "parse");
+	xmlDocSetRootElement(doc, root_node);
+
+/* - A DTD is probably unnecessary for something like this
+ 
+	dtd = xmlCreateIntSubset(doc, "parse", "http://www.kernel.org/pub/software/devel/sparse/parse.dtd" NULL, "parse.dtd");
+
+	ns = xmlNewNs (root_node, "http://www.kernel.org/pub/software/devel/sparse/parse.dtd", NULL);
+
+	xmlSetNs(root_node, ns);
+*/
+	symlist = sparse_initialize(argc, argv, &filelist);
+
+	FOR_EACH_PTR_NOTAG(filelist, file) {
+		examine_symbol_list(file, symlist);
+		sparse_keep_tokens(file);
+		examine_symbol_list(file, file_scope->symbols);
+		examine_symbol_list(file, global_scope->symbols);
+	} END_FOR_EACH_PTR_NOTAG(file);
+
+
+	xmlSaveFormatFileEnc("-", doc, "UTF-8", 1);
+	xmlFreeDoc(doc);
+	xmlCleanupParser();
+
+	return 0;
+}
+
diff --git a/parse.dtd b/parse.dtd
new file mode 100644
index 0000000..0cbd1b4
--- /dev/null
+++ b/parse.dtd
@@ -0,0 +1,48 @@
+<!ELEMENT parse (symbol+) >
+
+<!ELEMENT symbol (symbol*) >
+
+<!ATTLIST symbol type (uninitialized|preprocessor|basetype|node|pointer|function|array|struct|union|enum|typedef|typeof|member|bitfield|label|restrict|fouled|keyword|bad) #REQUIRED
+                 id ID #REQUIRED
+		 file CDATA #REQUIRED
+		 start CDATA #REQUIRED
+		 end CDATA #IMPLIED
+
+		 ident CDATA #IMPLIED
+		 base-type IDREF #IMPLIED
+		 base-type-builtin (char|signed char|unsigned char|short|signed short|unsigned short|int|signed int|unsigned int|signed long|long|unsigned long|long long|signed long long|unsigned long long|void|bool|string|float|double|long double|incomplete type|abstract int|abstract fp|label type|bad type) #IMPLIED
+
+		 array-size CDATA #IMPLIED
+
+		 bit-size CDATA #IMPLIED
+		 alignment CDATA #IMPLIED
+		 offset CDATA #IMPLIED
+		 bit-offset CDATA #IMPLIED
+
+		 auto (0|1) #IMPLIED
+		 register (0|1) #IMPLIED
+		 static (0|1) #IMPLIED
+		 extern (0|1) #IMPLIED
+		 const (0|1) #IMPLIED
+		 volatile (0|1) #IMPLIED
+		 signed (0|1) #IMPLIED
+		 unsigned (0|1) #IMPLIED
+		 char (0|1) #IMPLIED
+		 short (0|1) #IMPLIED
+		 long (0|1) #IMPLIED
+		 long-long (0|1) #IMPLIED
+		 typedef (0|1) #IMPLIED
+		 inline (0|1) #IMPLIED
+		 addressable (0|1) #IMPLIED
+		 nocast (0|1) #IMPLIED
+		 noderef (0|1) #IMPLIED
+		 accessed (0|1) #IMPLIED
+		 toplevel (0|1) #IMPLIED
+		 label (0|1) #IMPLIED
+		 assigned (0|1) #IMPLIED
+		 type-type (0|1) #IMPLIED
+		 safe (0|1) #IMPLIED
+		 user-type (0|1) #IMPLIED
+		 force (0|1) #IMPLIED
+		 explicitly-signed (0|1) #IMPLIED
+		 bitwise (0|1) #IMPLIED >
-- 
1.5.2-rc3.GIT
