If you do use Word on the client (often a good choice, often not), I recommend Word 2000, as it produces more complete HTML. In addition, as others have suggested, I would post-process the grisly markup Word produces into something manageable. A good choice for such work is my pet project, HTML Tidy. See http://tidy.sourceforge.net for general info and http://tidy.sourceforge.net/docs/quickref.html#word-2000 for details on Word de-munging.

Tidy works great for cleaning up fragments produced by browser-based editors as well. See http://tidy.sourceforge.net/docs/quickref.html#show-body-only
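
For what it's worth, here is a rough sketch of how that post-processing step might be wired up from Python. It assumes the `tidy` executable is installed and on the PATH; the helper names `tidy_command` and `clean_html` are made up for illustration, while `word-2000` and `show-body-only` are the options documented at the links above.

```python
import subprocess


def tidy_command(word_2000=True, body_only=True):
    """Build a tidy command line that reads HTML on stdin and writes to stdout."""
    cmd = ["tidy", "-quiet"]
    if word_2000:
        # Strip the extra markup that Word 2000 adds when saving as HTML.
        cmd += ["--word-2000", "yes"]
    if body_only:
        # Emit only the contents of <body>, handy for editor fragments.
        cmd += ["--show-body-only", "yes"]
    return cmd


def clean_html(html, **options):
    """Run tidy over an HTML string (hypothetical helper, assumes tidy is installed)."""
    result = subprocess.run(
        tidy_command(**options), input=html, capture_output=True, text=True
    )
    # tidy exits with 1 when it only has warnings and 2 on errors,
    # so treat just the latter as a failure.
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.stdout
```

Calling `clean_html(fragment, word_2000=False)` would be one way to tidy output from a browser-based editor, where the Word-specific cleanup is not needed.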

I think we will see a great deal of evolution and deployment in the XML editor space over the next couple of years. There are a number of decent offerings and Word 11 is coming. Now comes the hard part: we all have to design decent XML schemas <g>. Luckily, this is also the fun part.

take it easy,
Charles Reitzel




At 12:39 PM 12/4/2002 -0800, Iva Koberg wrote:
<snip>
<?xml version="1.0"?>
<Doc>
 <Title><Bold><Font RelSize="+2">My Heading</Font></Bold></Title>
 <Para LineBreak="no" Align="left" Empty="Y"></Para>
 <Para LineBreak="no" Align="left">This is my test paragraph.  How about
<Underline>underline</Underline>, <Italics>italics</Italics>, and
<Bold>bold</Bold>?</Para>
 <Para LineBreak="no" Align="left" Empty="Y"></Para>
 <Para LineBreak="no" Align="left">This is a bulleted list</Para>
 <List>
  <Item>My first bullet</Item>
  <Item>And the second bullet</Item>
  <Item>Last bullet</Item>
 </List>
</Doc>
</snip>

The above XML is no more useful than good old HTML: it mixes content
and presentation, and it provides no meaningful description of the
content! What does <Bold> mean? How can this content be reused in the
future? How can a semantic query determine what is in <Italics>? Is
<Italics> telling me I'd find a citation inside, a reference, an
author's name? You are not semantically describing the content, you're
repackaging HTML under different tag names. You're not managing your
content any better.
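
To make the point concrete, here is a small sketch using Python's standard ElementTree. The sample sentences and the <Citation> and <Author> element names are invented for illustration and do not come from any real schema.

```python
import xml.etree.ElementTree as ET

# Presentational markup: both phrases are merely "italic".
PRESENTATIONAL = (
    "<Para>See <Italics>Moby-Dick</Italics> by "
    "<Italics>Herman Melville</Italics>.</Para>"
)

# Semantic markup (hypothetical element names): each phrase says what it is.
SEMANTIC = (
    "<Para>See <Citation>Moby-Dick</Citation> by "
    "<Author>Herman Melville</Author>.</Para>"
)


def texts(xml, tag):
    """Return the text of every element named `tag` in the given XML string."""
    return [element.text for element in ET.fromstring(xml).iter(tag)]


# A query for italics returns the title and the author together,
# with nothing to tell them apart.
italic_phrases = texts(PRESENTATIONAL, "Italics")

# With semantic tags the same content can be queried by meaning.
citations = texts(SEMANTIC, "Citation")
authors = texts(SEMANTIC, "Author")
```

Presentation can still be layered on afterwards with a stylesheet that decides citations are rendered in italics, but the reverse trip from <Italics> back to "this is a citation" cannot be made reliably.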



<snip>
<span class="content-heading">My Heading</span>
<br clear="all">
<p class="content-text" align="left">&nbsp;</p>
</snip>

By the way, this HTML and CSS is quite invalid (very much like what Word
produces ;)


<snip>
There's no sense in trying to beat Word as an authoring tool.  I've seen a
lot of WYSIWYG authoring tools, and they all fall short.  Why go through the
trouble
</snip>

A pencil and paper is a very widely used content authoring tool as well,
but the problem is that content authored that way can't be reused. Same
goes for Word - companies are spending millions to try and salvage
information out of Word documents. And what are they finding out? That
it can't be reliably done programmatically because the goal is to
transform to meaningful semantic markup and Word does not mark up the
content semantically. That makes it well worth the trouble to create a
better tool for authoring content IMO :)

best,
Iva


--
http://cms-list.org/
trim your replies for good karma.