Which PDF parser to use is covered in the last message.
> Message: 1
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> Subject: Re: [htdig] PDF text search
> To: [EMAIL PROTECTED] (Brett Simpson)
> Date: Thu, 6 Dec 2001 15:09:41 -0600 (CST)
> Cc: [EMAIL PROTECTED]
>
> According to Brett Simpson:
> > I'm currently using the RPM version of Htdig 3.2.0-1.b3.6 on Redhat 7.2
> > with apache. What do I need to perform text searching of pdf files? If
> > I copy a pdf file into /var/www/html and run "htdig -iv" it lists the
> > pdf file as not Parsable. I am able to do a search with the default
> > stuff that comes loaded in /var/www/html. Do I need to add some sort
> > of external parser? Does anyone know of any? Thanks.
>
> I recommend doc2html.pl. See http://www.htdig.org/FAQ.html#q4.9
>
> In any case, htdig 3.2.0b3 is pretty buggy, so I'd recommend getting
> the latest update rpms from Red Hat or a local mirror site, i.e.
>
> apache-1.3.22-2
> htdig-3.2.0-1.b4.0.72
> htdig-web-3.2.0-1.b4.0.72
>
> Red Hat users should, as a rule, keep up to date on the latest errata
> at http://www.redhat.com/apps/support/errata/
>
> It helps if you subscribe to their redhat-watch mailing list, so you
> get the advisories by e-mail when the updates come out.
>
> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
>
> --__--__--
> Message: 5
> Date: Thu, 06 Dec 2001 23:42:02 +0100
> To: Geoff Hutchison <[EMAIL PROTECTED]>,
>     "B.G. Mahesh" <[EMAIL PROTECTED]>
> From: Olivier Korn <[EMAIL PROTECTED]>
> Subject: Re: [htdig] How to process htdig logfile?
> Cc: [EMAIL PROTECTED]
>
> At 15:37 06/12/2001 -0500, Geoff Hutchison wrote:
> >On Thu, 6 Dec 2001, B.G. Mahesh wrote:
> >
> > > Is there any htdig-log processing utility like webalizer [mrunix.net]? I
> >
> >Not to my knowledge. I believe one issue is that the htsearch log output
> >isn't in a form that webalizer et al find easy to parse. This is an issue
> >we'd like to correct, but it's pretty far down the TODO list unless
> >someone contributes something in this direction.
>
> There is a patch for ht://Dig 3.1.5 which modify the way htsearch log
> output is written. I found it rather useful (but I don't use any kind of
> log processing utility. Sorry).
>
> Just in case it might help... :-)
>
> --
> Olivier Korn,
> Strasbourg - France
>
> --__--__--
> --__--__--
>
> Message: 10
> From: Gilles Detillieux <[EMAIL PROTECTED]>
> Subject: Re: [htdig] PDF text search
> To: [EMAIL PROTECTED] (Brett Simpson)
> Date: Thu, 6 Dec 2001 17:15:42 -0600 (CST)
> Cc: [EMAIL PROTECTED] (ht://Dig mailing list)
>
> According to Brett Simpson:
> > I updated to the latest rpms from redhat and got conv_doc.pl to
> > work. I'm going to give doc2html.pl a try.
>
> I use conv_doc.pl myself and am happy with it. I only recommend
> doc2html.pl because it's more configurable, and has hooks for more
> document types, so it's more generally useful. However if conv_doc.pl
> meets all your external converter needs, you probably don't need to
> switch.
>
> --
> Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre       WWW: http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba  Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada)     Fax: (204)789-3930
>
> --__--__--

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html

