Extracting DTD information

coding . meister Thu, 20 Apr 2006 17:47:48 -0700

Hello,

I'm trying to use Xerces C++ 2.7 to read information from a DTD. The
code below reads the following entry from a DTD: <!ELEMENT a (x, y, z)>


// Load the DTD
XercesDOMParser parser;
DTDGrammar* grammar = (DTDGrammar*) parser.loadGrammar(dtdPath,
Grammar::DTDGrammarType);

// Get a specific DTD element
DTDElementDecl* elementDecl = (DTDElementDecl*)
grammar->getElemDecl(0, nil, objName, 0); // objName is "a"

// Get information about the children of this element
ContentSpecNode* specNode = elementDecl->getContentSpec();
DFAContentModel contentModel(true, specNode);

// Question 1: Why does this return nil?
ContentLeafNameTypeVector* content =
contentModel.getContentLeafNameTypeVector();

// Question 2: Why does this line return (child count + 1), in this case 4?
int leafCount = content->getLeafCount();

Question 1: The only way I can see to access the list of children,
with their names and types (exactly one, zero or one, zero or more,
one or more, etc.), is to use getContentLeafNameTypeVector.
Unfortunately, this returns nil. Looking at DFAContentModel::buildDFA,
the check on line 471 appears to be the reason nil is returned:

if ( (fLeafListType[outIndex] & 0x0f) != ContentSpecNode::Leaf )

I'm assuming that ContentSpecNode::Leaf means "exactly once". So the
net effect of this line is that if every child is a leaf node, then
the ContentLeafNameTypeVector isn't populated. Why?
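
To make my reading of that check concrete, here is a small standalone
sketch. It is not Xerces code: the enum values and the helper name
wouldAllocateVector are made up for illustration, and it only assumes
that the vector is created when at least one leaf passes the test on
line 471.

```cpp
#include <cstddef>
#include <vector>

// Stand-in node types. The numeric values are assumptions for this
// sketch, not copied from the Xerces headers.
enum NodeType { Leaf = 0, Any = 6 };

// Hypothetical helper mirroring my reading of the check: the vector is
// only created once some leaf has a type other than Leaf in its low
// four bits.
bool wouldAllocateVector(const std::vector<int>& leafTypes) {
    for (std::size_t i = 0; i < leafTypes.size(); ++i) {
        if ((leafTypes[i] & 0x0f) != Leaf)
            return true;
    }
    return false;
}
```

Under that assumption, a model such as (x, y, z), where every entry is
a plain leaf, never passes the test, which would match the nil I am
seeing.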

Question 2: If I patch the code to ignore that check, the next problem
is that the count returned is (child count + 1), so for the DTD
element listed above, 4 is returned. The comments in the code indicate
that the last node is the end of content (EOC) node that is needed for
the implementation of DFAContentModel to "get rid of any repetition
short cuts". Can I assume that I will always get child count + 1?
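
Here is how I picture the extra leaf, as a standalone sketch rather
than Xerces code. The Node struct and the helper names are
hypothetical, and it assumes the EOC node is simply appended as one
more leaf in sequence after the declared content model.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a content-spec tree; not the Xerces classes.
struct Node {
    enum Kind { Leaf, Sequence } kind;
    std::string name;              // used by Leaf nodes only
    std::shared_ptr<Node> first;   // used by Sequence nodes only
    std::shared_ptr<Node> second;  // used by Sequence nodes only
};

std::shared_ptr<Node> leaf(const std::string& name) {
    return std::make_shared<Node>(Node{Node::Leaf, name, nullptr, nullptr});
}

std::shared_ptr<Node> seq(std::shared_ptr<Node> a, std::shared_ptr<Node> b) {
    return std::make_shared<Node>(Node{Node::Sequence, "", a, b});
}

// Counts the leaves reachable from n.
int countLeaves(const std::shared_ptr<Node>& n) {
    if (!n) return 0;
    if (n->kind == Node::Leaf) return 1;
    return countLeaves(n->first) + countLeaves(n->second);
}

// (x, y, z) written as nested binary sequences.
std::shared_ptr<Node> declaredModel() {
    return seq(seq(leaf("x"), leaf("y")), leaf("z"));
}

// Assumption: the end-of-content marker is one extra leaf in sequence.
std::shared_ptr<Node> withEndOfContent(std::shared_ptr<Node> model) {
    return seq(model, leaf("<EOC>"));
}
```

In this sketch countLeaves gives 3 for the declared model and 4 once
the EOC leaf is appended. If the real implementation works the same
way, the declared children would always be leafCount - 1, but that is
exactly what I would like confirmed.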

Cheers!
