Hello Tim,


Did you take a look at

http://www.htdig.org/attrs.html#external_parsers

Your external parser has to accept the parameters described there, so in
practice you have to write a shell wrapper for it. I have attached my parser
scripts so you can use them right away.
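
For illustration, here is a minimal sketch of such a wrapper. It is not one of
the attached scripts. It assumes the calling convention described on the page
above (temporary file with the document contents, content type, URL, and
configuration file as the four arguments) and that the converter writes HTML to
standard output. The names CONVERTER, LOGFILE and convert_doc, and the log
path, are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; adjust CONVERTER and LOGFILE to your setup.
CONVERTER="${CONVERTER:-/usr/local/bin/xlhtml}"
LOGFILE="${LOGFILE:-/tmp/htdig_xls2html.log}"

# Arguments as passed by htdig (assumed from the external_parsers docs):
#   $1 temporary file holding the document contents
#   $2 content type   $3 URL   $4 configuration file (unused here)
convert_doc() {
    input=$1
    mime=$2
    url=$3
    if [ ! -r "$input" ]; then
        echo "$(date) unreadable input for $url" >>"$LOGFILE"
        return 1
    fi
    # The converter's HTML goes to stdout for htdig; its errors go to the log.
    if "$CONVERTER" "$input" 2>>"$LOGFILE"; then
        return 0
    fi
    echo "$(date) FAILED ($mime) $url" >>"$LOGFILE"
    return 1
}

# Run the conversion only when called with htdig-style arguments.
if [ "$#" -ge 3 ]; then
    convert_doc "$@"
fi
```

The external_parsers entry in htdig.conf would then point at the wrapper
instead of at the converter binary itself.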

I use the following converters:

doc2html:       wvware (wvware.sourceforge.net), which is really powerful
pdf2html:       pdftohtml (not xpdf, but a self-patched version of pdftohtml
                with the xpdf3 libraries so I can parse PDF 1.5)
ppt2html:       ppthtml
xls2html:       xlhtml

Just modify the variables in the scripts so they point to the proper locations.
I do fairly extensive logging, with a separate logfile for each parser, so I
can determine which documents could not be converted (just in case users
ask :).
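
As a rough sketch of the per-parser logging idea (not the code from the
attached scripts), something along these lines would work. The LOGDIR location,
the tab-separated log format, and the function names log_result and failed_docs
are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical logging helpers; one logfile per parser under LOGDIR.
LOGDIR="${LOGDIR:-/tmp/htdig_logs}"

# log_result PARSER STATUS URL
# Appends "timestamp<TAB>status<TAB>url" to LOGDIR/PARSER.log.
log_result() {
    mkdir -p "$LOGDIR" || return 1
    printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$2" "$3" \
        >>"$LOGDIR/$1.log"
}

# failed_docs PARSER
# Prints the URLs that were logged with status FAILED for that parser.
failed_docs() {
    [ -f "$LOGDIR/$1.log" ] || return 0
    awk -F '\t' '$2 == "FAILED" { print $3 }' "$LOGDIR/$1.log"
}
```

A wrapper would call log_result after each conversion, and failed_docs pdf2html
would then list the PDF documents that could not be converted.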

Yours,

Martin



On Wed, Feb 25, 2004 at 05:22:46PM -0500, Tim Cleary wrote:
> Thanks for everyone's suggestions on my problem yesterday.
> 
> A new one:
> I am running into trouble with external conversion: it is not working.
> 
> Basically I have 3 types of files I want to convert - MS Excel, MS
> Powerpoint, and PDF.  I have installed a utility for each in /usr/local/bin:
> xlhtml for excel, ppthtml for powerpoint, pdftohtml for pdf.  Each generates
> standard output to the screen just fine when called from the command line,
> and when output is directed to a file, it is created as "text/html" so I
> thought that it would work to have them tagged as external converters via
> htdig.conf.  The htdig.conf file is as follows:
> ....
> external_parsers:    application/vnd.ms-excel->text/html /usr/local/bin/xlhtml \
> application/vnd.ms-powerpoint->text/html /usr/local/bin/ppthtml \
> #                  application/pdf->text/html "/usr/local/bin/pdftohtml -noframes -I -stdout"
> ...
> 
> On htdig run through rundig, I get a header-line input that says
> "content-type: application/vnd.ms-powerpoint, not HTML" and then it moves
> onto the next item.  It doesn't even work for the pdf.
> 
> Then for each file I get a "deleted, no excerpt" when it goes to merge.
> 
> I feel like I am following the formatting correctly.  I have tried different
> versions of the application type (msword, ms.word, doc, etc.).  I am running
> OS X so these were the specific application types it listed (using file -i).
> 
> Thanks for any suggestions.
> 
> Tim Cleary
> 
> -- 
> Tim Cleary
> Manager
> Dean & Company
> (703) 760-4375
> [EMAIL PROTECTED]
> 
> 
> 
> 
> _______________________________________________
> ht://Dig general mailing list: <[EMAIL PROTECTED]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general

-- 

--------------------------------------------------------
 arago AG, Institut fuer komplexes Datenmanagement
 Am Niddatal 3, 60488 Frankfurt/Main, [EMAIL PROTECTED]
 Tel. 069/405680, Fax 069/40568111, http://www.arago.de
--------------------------------------------------------

Attachment: htdig_parsers.tgz
Description: application/tar-gz

