Hi Piotr,

In my previous posting I showed an example of merging DP with XML data. In this 
post, I will describe the XML data structures and how to prepare them for use 
with Word 2003+.

My DP application has two main panels:
    Customers
    Articles
It also has one other dummy panel, with a single record, which I always use for 
housing reports. I call it Configuration for want of a better name.

Next, I mapped out a rough idea of the structure of the XML file I wanted to 
create:
    <dpnews>
        <month>June</month>
        <customer>
            <company>Acme Mouse Trap Company</company>
            <contact>Sylvester Felis</contact>
            <!-- etc etc -->
        </customer>
        <article>
            <heading>DP Does XML</heading>
            <content>Mauris ullamcorper metus eu arcu? Donec pellentesque!</content>
        </article>
        <article>
            <!-- etc etc -->
            .
            .
        </article>
        <!-- etc etc -->
    </dpnews>

Since this is for a personalised newsletter application, instead of this 
structure I could have used something more like this:
    <dpnews>
        <month></month>
        <customer>
            <article></article>
            <article></article>
        </customer>
    </dpnews>

where the articles are wrapped up in the customer element. This would be a 
little easier to work with in Word; however, I expected that more than one 
customer might want the same newsletter, so I left it with the possibility of 
adding multiple customers for the same set of articles. Eventually I could 
have a structure like:

    <dpnews>
        <month></month>
        <customer></customer>
        <customer></customer>
        <article></article>
        <article></article>
        <article></article>
    </dpnews>
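
As an illustration only, here is a minimal Python sketch of assembling that 
flatter structure. DP itself writes the file from a report, so Python, the 
function name build_dpnews and the sample values are assumptions made purely 
for this example.

```python
import xml.etree.ElementTree as ET


def build_dpnews(month, customers, articles):
    """Build a <dpnews> tree: one month, then customers, then articles."""
    root = ET.Element("dpnews")
    ET.SubElement(root, "month").text = month
    for customer in customers:
        node = ET.SubElement(root, "customer")
        ET.SubElement(node, "company").text = customer["company"]
        ET.SubElement(node, "contact").text = customer["contact"]
    for heading, content in articles:
        node = ET.SubElement(root, "article")
        ET.SubElement(node, "heading").text = heading
        ET.SubElement(node, "content").text = content
    return root


# Hypothetical sample data taken from the example structure above.
root = build_dpnews(
    "June",
    [{"company": "Acme Mouse Trap Company", "contact": "Sylvester Felis"}],
    [("DP Does XML", "Mauris ullamcorper metus eu arcu?")],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Because customers and articles are siblings under "dpnews", adding a second 
customer only means adding another entry to the customers list; the articles 
are not repeated.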

Instead of putting "month" in as a separate element, I could have done 
something like:

<dpnews month="June"> 

This would have made "month" an ATTRIBUTE of the "dpnews" ELEMENT; however, it 
would have made life very difficult when creating the template in Word. 
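
To make the element/attribute distinction concrete, here is a small Python 
sketch that reads the month from either form. It is not part of the DP or Word 
workflow; the helper name month_of and the two sample strings are assumptions 
for illustration.

```python
import xml.etree.ElementTree as ET

# Two hypothetical samples: month as a child element and as an attribute.
AS_ELEMENT = "<dpnews><month>June</month></dpnews>"
AS_ATTRIBUTE = '<dpnews month="June"></dpnews>'


def month_of(xml_text):
    """Return the month whether it is stored as an element or an attribute."""
    root = ET.fromstring(xml_text)
    child = root.find("month")
    if child is not None:
        return child.text          # element form: the text inside <month>
    return root.get("month")       # attribute form: month="..." on <dpnews>


print(month_of(AS_ELEMENT))
print(month_of(AS_ATTRIBUTE))
```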

Creating the XML document in DP really is a piece of cake, far easier than with 
most XML-capable database products. Other products might be smarter, but they 
usually have their own way of doing things. DP can work with XML with almost 
the same total flexibility as Notepad; after all, XML is only a text file. As 
an aside, the whole project took about an hour or so to complete; however, I 
spent at least 2 hours debugging a small problem, which I eventually tracked 
down to my forgetting to set the margins and page lengths in the DP report to 
zero.

Before Word 2003 can create a template document for this XML document, it has 
to understand its structure. If it is a very simple structure, similar to a 
standard mail merge file (essentially a flat file), Word can deduce the 
structure and life is really quite easy. For anything more than this, you will 
need to create a Schema. This is about the hardest part of the project, and you 
need to get it right. I am a bit of a weakling at writing Schemas, so I take a 
simpler approach: I let other (free) software do it for me.

Schemas can be pretty powerful, as they describe how the data is structured and 
the rules for the content. They can specify things like the type of data that 
will be found in an element, etc etc. Complex Schemas give me a headache. If I 
let Schema creation software analyse the DP XML file, even though it is very 
simple I get a very complex Schema, and that's going to give me a headache, and 
not necessarily just me: it will possibly give Microsoft Word 2003 something of 
a headache too. Very fortunately, before Schemas came out there was a much 
simpler way of describing simple XML documents, called a DTD (Document Type 
Definition). A DTD is far more basic and less likely to give me a headache, but 
unfortunately Word cannot use a DTD. However, since a DTD contains more basic 
information than, say, a sample of my DP XML file, Schema software can usually 
convert a DTD into a much simpler Schema which, with just a little 
modification, is ideal for Word.

I used the freeware WMHelp XMLPad 3 to create the DTD, which looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT dpnews (month, customer, article+)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT customer (company+, contact?, phone?, email?, fax?)>
<!ELEMENT company (#PCDATA)>
<!ELEMENT contact (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT article (heading, content)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT content (#PCDATA | br | b | i | u)*>
<!ELEMENT br EMPTY>
<!ELEMENT b (#PCDATA | i)*>
<!ELEMENT u (#PCDATA | b)*>
<!ELEMENT i (#PCDATA | b)*>

Just a super quick overview of this: the element "dpnews" must contain one 
"month" element, one "customer" element and one or more "article" elements 
(the "one or more" is signified by the "+" sign at the end). The "month" element 
can contain text (#PCDATA). The "customer" element must contain one or more 
"company" elements ("+") and optionally, i.e. zero or one of each, the 
"contact", "phone", "email" and "fax" elements (signified by the "?").
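
The ordering and "+" / "?" rules can be approximated with regular expressions 
over the child element names. The Python sketch below is a simplification for 
illustration, not a real DTD validator; the names MODELS and check and the two 
sample documents are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# DTD content models rewritten as regular expressions. Each child tag name
# is followed by one space, so "month customer article " is a valid dpnews.
MODELS = {
    "dpnews": r"month customer (article )+",
    "customer": r"(company )+(contact )?(phone )?(email )?(fax )?",
    "article": r"heading content ",
}


def check(element):
    """Return the tags of elements whose children break their content model."""
    problems = []
    model = MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            problems.append(element.tag)
    for child in element:
        problems.extend(check(child))
    return problems


GOOD = ("<dpnews><month>June</month>"
        "<customer><company>Acme</company></customer>"
        "<article><heading>h</heading><content>c</content></article>"
        "</dpnews>")
# No article at all, and a customer without the required company.
BAD = ("<dpnews><month>June</month>"
       "<customer><contact>Sylvester</contact></customer>"
       "</dpnews>")

print(check(ET.fromstring(GOOD)))
print(check(ET.fromstring(BAD)))
```

The mixed content of "content", "b", "i" and "u" is deliberately left out of 
MODELS, since text interleaved with elements does not reduce to a simple 
sequence of child names.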

The interesting one, which might need hand coding, is the element "content", 
which can contain text mixed with any number (zero or more) of the elements 
"br", "b", "i" or "u". Since my content comes from an AxAy Memo type field and 
there can be line breaks, bold, italic or underline characters, I want to 
include them in the specification of the structure. Note that I have specified 
the br element as EMPTY, as in <br />, and I have had to specify the "b", "i" 
and "u" elements as potentially including each other. I could have gone further 
by letting bold or underlined text include a line break, but I didn't.

By the way, to create the <br /> or <i> or <u> or <b> elements you will need to 
use the latest DP2.6Y version of the software, in either the U or I versions 
(note that I added the specification for both "u" and "i" elements to the DTD).

Line breaks are easy to handle, but the other text attributes must be correctly 
nested, so unless it is imperative to the application I would avoid anything 
but a line break, as DP does not enforce well-formed code, so it is up to the 
person editing the AxAy memo field to get it right. That is the reason I 
left it out of the DTD.
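
Since DP will not catch badly nested tags, one way to check memo text before 
it reaches Word is to wrap it in a "content" element and let an XML parser try 
to read it. This is a hedged Python sketch, not something DP provides; the 
function name memo_is_well_formed and the sample strings are assumptions.

```python
import xml.etree.ElementTree as ET


def memo_is_well_formed(memo_text):
    """Return True if the memo text parses as the body of a <content> element."""
    try:
        ET.fromstring("<content>" + memo_text + "</content>")
    except ET.ParseError:
        return False
    return True


# Correctly nested: <i> opens and closes inside <b>.
print(memo_is_well_formed("Plain <b>bold <i>both</i></b><br />next line"))
# Overlapping: <b> is closed while <i> is still open.
print(memo_is_well_formed("Overlapping <b>bold <i>both</b></i>"))
```

This only tests well-formedness. It does not check the DTD rules, such as 
which of "b", "i" and "u" may contain which.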

After this I used the same XMLPad software to create a Schema from the DTD. 
The Schema is implemented as an XML file, usually with the file extension .XSD.

The Schema XMLPad created looks like this:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="DPNEWS.dtd" 
xmlns:wmh="http://www.wmhelp.com/2003/eGenerator" 
elementFormDefault="qualified" targetNamespace="DPNEWS.dtd">
  <xs:element name="dpnews">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="month"/>
        <xs:element ref="customer"/>
        <xs:element ref="article" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="month" type="xs:string"/>
  <xs:element name="customer">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="company" maxOccurs="unbounded"/>
        <xs:element ref="contact" minOccurs="0"/>
        <xs:element ref="phone" minOccurs="0"/>
        <xs:element ref="email" minOccurs="0"/>
        <xs:element ref="fax" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="company" type="xs:string"/>
  <xs:element name="contact" type="xs:string"/>
  <xs:element name="phone" type="xs:string"/>
  <xs:element name="email" type="xs:string"/>
  <xs:element name="fax" type="xs:string"/>
  <xs:element name="article">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="heading"/>
        <xs:element ref="content"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="heading" type="xs:string"/>
  <xs:element name="content">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="br"/>
        <xs:element ref="b"/>
        <xs:element ref="i"/>
        <xs:element ref="u"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="br">
    <xs:complexType/>
  </xs:element>
  <xs:element name="b">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="i"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="i">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="b"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
  <xs:element name="u">
    <xs:complexType mixed="true">
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="b"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

Everything is good in this file except for the opening <xs:schema> tag starting 
on the 2nd line:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns="DPNEWS-schema3.dtd" xmlns:wmh="http://www.wmhelp.com/2003/eGenerator" 
elementFormDefault="qualified" targetNamespace="DPNEWS-schema3.dtd">


In fact I hand edited this, so it looks like:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="DPNews" 
targetNamespace="DPNews">

(a lot simpler)

Namespaces, indicated by the xmlns="DPNews" and targetNamespace attributes, are 
quite complex, but in a nutshell they allow you to combine XML files from 
different sources, so that the different parts of the resultant document can be 
identified by their namespace. Because Word uses a large range of namespaces, it 
needs to be able to identify which parts of the markup belong to data from the 
DPNews file and which come from the Word markup. (Remember, though, to save the 
new Schema.)

Namespaces usually have a prefix, so in the XML Word produces you will see in 
the root element:

<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml" 
xmlns:v="urn:schemas-microsoft-com:vml" etc etc

and throughout the document you will see tags like <w:tblBorders>. In this case 
there is a namespace prefix "w", so <w:tblBorders> means the element belongs to 
the namespace http://schemas.microsoft.com/office/word/2003/wordml
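
To see that the prefix is only shorthand for the namespace name, the Python 
sketch below parses a cut-down fragment and prints the fully qualified tag 
names. The fragment is an assumption made for illustration and is not valid 
WordML; only the namespace URI comes from the Word output quoted above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hypothetical, heavily cut-down fragment: just enough to show the prefix.
SAMPLE = (
    '<w:wordDocument xmlns:w="' + W_NS + '">'
    "<w:body><w:tblBorders/></w:body>"
    "</w:wordDocument>"
)

root = ET.fromstring(SAMPLE)
# ElementTree replaces the "w:" prefix with "{namespace-uri}" in each tag.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

The parser keeps the namespace name and discards the prefix, so a different 
prefix bound to the same URI would give exactly the same tags.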

In our DP XML we want all the data to belong to the same namespace, so we leave 
out the prefix and the namespace becomes the default namespace. To add a 
namespace to our XML we need to revisit the DP report that creates the XML data 
source and add a default namespace to that document. I will also add an XML 
prolog to make things a little more formal, so the first few lines of the XML 
look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<dpnews xmlns="DPNews">
  <month>June</month>
  .
    .
    etc
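
One practical consequence of the default namespace is that any other software 
reading the file must now ask for elements by namespace, not by bare name. 
Here is a minimal Python sketch of that; the "dp" prefix in the lookup map is 
a hypothetical label chosen for this example and does not appear in the file.

```python
import xml.etree.ElementTree as ET

# The first lines of the DP output, closed off so that the sample parses.
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<dpnews xmlns="DPNews">
  <month>June</month>
</dpnews>"""

# "dp" is only a lookup label here; it maps to the default namespace name.
NS = {"dp": "DPNews"}

root = ET.fromstring(SAMPLE)
print(root.tag)                        # the tag now carries the namespace
print(root.find("month"))              # bare name no longer matches
print(root.find("dp:month", NS).text)  # namespace-aware lookup finds it
```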

Ok, now we have a DP application which creates an XML file with a uniquely 
named namespace, "DPNews". (Namespaces often use a URL, so I might create a URL 
like www.brileigh.com/DPNews so that it is not confused with another user's 
DPNews Schema, but for this exercise "DPNews" is enough. If you do use a URL it 
does not have to lead to a webpage; the page is never accessed. Sometimes, 
however, if you are creating a document that others will use, you might 
actually have a valid address so people can look up information about the 
document and how it is used.)

At this point we have an XML data document and a Schema which can be added to 
Word's library of schemas. Ok, so back to Word 2003 and, potentially, our 
template document.

After creating a blank document, or opening a document which we want to convert 
into a template, go to the Tools menu in Word 2003, choose "Templates and 
Add-Ins" and select the "XML Schema" tab. Click the "Add Schema" button, then 
browse to and select the Schema file you created. It will be added to the 
library of Available XML Schemas; check the one you just added. Before leaving 
the "XML Schema" tab we need to do one more thing, and that is to check the box 
"Allow saving as XML even if not valid". This is important because we are not 
using Word to create the XML data source, but rather to create a document which 
might use only a subset of fields from the DP XML file, and which is unlikely 
to be valid according to our DPNews Schema.
 
One last thing before marking up our template. If the Task Pane is not visible 
you will need to display it and select the "XML Structure" pane, which shows 
the structure. Check the box "Show XML tags in the document", otherwise you 
will not see what you are doing. For complex data structures it might also be 
easier to check the box "List only Child Elements of the Current element".

We are now ready to create the Template document. We can make many different 
templates from the same schema, and we do not need to go through any of the 
above process again, unless we change our structure.

In the next part I will show how the Word 2003 template is created and how 
markup is added. 
You can download the DTD and Schema I used from this link 
http://www.brileigh.com/DPWeb/DPNewsPart2.zip 

Regards
Brian