Geoff Hutchison wrote:
> 
> On Mon, 5 Jun 2000, Rzepa, Henry wrote:
> 
> > retrieved by htdig from the start_url directory. This can be done simply
> > using Dave Raggett's program Tidy, which seems pretty reliable (if not
> > always 100%). However, invoking Tidy seems to require that it be defined
> > in conjunction with an external parser for the MIME type text/html.
> > This means entirely overriding the internal text/html htdig parser.
> 
> Alas, the problem here is that invoking the external converter feature
> would produce an infinite loop. Setting text/html -> text/html would just
> call the converter again. :-( [The feature here is that you might have a
> converter to gunzip files which then produces PDF files to go to another
> converter.]
> 
> I guess the ExternalParser code could be changed so that a converter
> producing text/plain or text/html (or any future internal mime-types)
> passes it off to the internal code.
> 
> That said, I thought there was some sort of command-line tool to "spider"
> with Tidy already. Maybe that was something dreamed up by one of my
> friends at school. Still, it seems like a shell script around Tidy would
> be better.
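To make the suggested ExternalParser change concrete, here is a minimal Python sketch of the dispatch rule. It is a hypothetical model, not htdig's actual ExternalParser code. The converter table is reduced to a dict mapping each input MIME type to the MIME type its converter produces, and INTERNAL_TYPES and plan_conversions are names invented for illustration.

```python
# Hypothetical model of an htdig-style converter chain; not htdig's actual code.
INTERNAL_TYPES = {"text/html", "text/plain"}  # types the built-in parsers handle


def plan_conversions(mime, converters):
    """Return (steps, final_type) for a document of type `mime`.

    `converters` maps an input MIME type to the MIME type its external
    converter produces. `steps` lists the (input, output) conversions run in
    order; `final_type` is the internal type handed to the built-in parser.
    """
    steps = []
    seen = {mime}
    current = mime
    while current in converters:
        target = converters[current]
        steps.append((current, target))
        if target in INTERNAL_TYPES:
            # Proposed rule: output of an internal type goes straight to the
            # built-in parser instead of being looked up again, so a
            # text/html -> text/html converter (e.g. a Tidy wrapper) runs once.
            return steps, target
        if target in seen:
            # A genuine cycle between non-internal types is reported, not looped.
            raise RuntimeError(f"converter cycle at {target}")
        seen.add(target)
        current = target
    if current in INTERNAL_TYPES:
        return steps, current
    raise ValueError(f"no converter or internal parser for {current}")
```

With `{"text/html": "text/html"}` the converter runs exactly once and its output goes to the internal HTML parser, while a gunzip-style chain such as `application/x-gzip -> application/pdf -> text/plain` still passes through every converter in turn.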

I can only think of a two-step process here: Ht://Dig produces a URL
logfile, which is piped through sort | uniq and then fed to the tidy
program. A simple shell script serving as an extension to the rundig
script should do.
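A rough Python equivalent of such a script is sketched below. Everything specific here is an assumption: the logfile is taken to hold one URL per line, the URLs are assumed to map onto files under a local document root, and Tidy is invoked as `tidy -q -m <file>` to clean each file in place. The names unique_urls, url_to_path, tidy_commands and run are hypothetical.

```python
import subprocess


def unique_urls(log_lines):
    """Equivalent of `sort | uniq`: sorted, de-duplicated, non-blank lines."""
    return sorted({line.strip() for line in log_lines if line.strip()})


def url_to_path(url, url_prefix, doc_root):
    """Map a URL under url_prefix to a file under doc_root (assumed layout).

    Returns None for URLs outside url_prefix, which are skipped.
    """
    if not url.startswith(url_prefix):
        return None
    return doc_root.rstrip("/") + "/" + url[len(url_prefix):].lstrip("/")


def tidy_commands(log_lines, url_prefix, doc_root):
    """Build one `tidy -q -m <file>` command per unique local URL."""
    cmds = []
    for url in unique_urls(log_lines):
        path = url_to_path(url, url_prefix, doc_root)
        if path is not None:
            cmds.append(["tidy", "-q", "-m", path])
    return cmds


def run(cmds):
    """Run each Tidy command in turn."""
    for cmd in cmds:
        # Tidy exits with 1 for warnings and 2 for errors, so a non-zero
        # status is not treated as fatal here.
        subprocess.run(cmd, check=False)
```

Usage would be along the lines of `run(tidy_commands(open("urls.log"), "http://www.example.org/", "/var/www/htdocs"))`, with the logfile name, URL prefix and document root all placeholders. Building the command list separately from running it makes a dry run easy.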


cheers,

  Torsten

-- 
InWise - Wirtschaftlich-Wissenschaftlicher Internet Service GmbH
Waldhofstraße 14                            Tel: +49-4101-403605
D-25474 Ellerbek                            Fax: +49-4101-403606
E-Mail: [EMAIL PROTECTED]            Internet: http://www.inwise.de
