[htdig] Using htdig to "tidy" HTML

Rzepa, Henry Mon, 05 Jun 2000 06:27:28 -0700
We, along with the rest of the world, need to think about how to migrate
document collections to XHTML.

As an adjunct to our work with external parsers in htdig, which we use to
extract meta information from external file types (e.g. gif, vrml, svg, xml
and a whole host of chemical types), we thought it would be useful to add
the option of creating, on the fly, an XHTML version of each document that
htdig retrieves from the start_url directory. This can be done simply with
Dave Raggett's program Tidy, which seems pretty reliable (if not always
100%). However, invoking Tidy seems to require that it be defined as an
external parser for the MIME type text/html, which means entirely
overriding htdig's internal text/html parser.

Does anyone have any idea how to invoke both? I.e., the internal parser
to index the content of the HTML file, and also an external parser to
convert it on the fly to XHTML? (And before someone asks: no, we do not
intend this to be done for large document collections, since I suspect
the process will be a slow one.)
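
For readers following this thread, the Tidy conversion step on its own
can be sketched as below. This is only a minimal sketch, not an htdig
feature: it assumes a `tidy` binary on the PATH that accepts the `-q`,
`-asxhtml` and `-o` options found in current Tidy releases (the version
discussed here may differ), and the function names are hypothetical. It
does not settle how htdig could call it alongside the internal parser.

```python
import subprocess


def tidy_command(src_path, dest_path):
    """Build the argument list for one Tidy run (hypothetical helper)."""
    # -q: suppress non-essential output
    # -asxhtml: write the result as XHTML
    # -o: write to dest_path instead of standard output
    return ["tidy", "-q", "-asxhtml", "-o", dest_path, src_path]


def write_xhtml_copy(src_path, dest_path):
    """Run Tidy on src_path and report whether it finished without errors."""
    # Assumes `tidy` is installed; subprocess.run raises FileNotFoundError if not.
    result = subprocess.run(tidy_command(src_path, dest_path))
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    return result.returncode < 2
```

A wrapper script registered as an external parser could call
write_xhtml_copy() on the temporary file it is handed, but it would
still have to produce the index records itself, which is exactly the
duplication the question above hopes to avoid.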
-- 

Henry Rzepa. +44 (0)20 7594 5774 (Office) +44 (0)20 7594 5804 (Fax)
Dept. Chemistry, Imperial College, London, SW7 2AY, UK.
http://www.ch.ic.ac.uk/rzepa/
