----- Original Message ----- From: "F. Spitzer, GEOSYSTEMS" <[EMAIL PROTECTED]>
To: "David Adams" <[EMAIL PROTECTED]>
Sent: Wednesday, January 19, 2005 11:08 AM
Subject: Re: [htdig] Indexing large Powerpoints



Hi David,

Thanks for your answer. You were right: changing the limit in doc2html.pl solved the problem. Great!

By the way: I am working on SuSE 9.2, and I was able to index .ppt files of over 90 MB.

Thanks a lot!!

Cheers Fritz


Best regards

Fritz Spitzer
Head of Training and System Integration

--------------------------------------------------------------------
GEOSYSTEMS GmbH
Riesstraße 10, D-82110 Germering, GERMANY
www.geosystems.de

E: [EMAIL PROTECTED]
T: +49-(0)89-89 43 43 -0 (Ext. -20)
F: +49-(0)89-89 43 43 99

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Subscribe to our newsletter to stay up to date:
www.geosystems.de/newsletter

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

David Adams schrieb:
How are you using htdig to index .ppt files? Recent versions of doc2html.pl have a default input limit of 20 MB and will not try to convert larger files. Just increase the limit in the doc2html.pl script.

I have found that ppthtml 0.4 from www.xlhtml.org (now relocated to http://chicago.sourceforge.net/xlhtml), which is what I use, does not always succeed in extracting text after the first embedded image.

I have not had problems with ppthtml on Red Hat Linux, but on Solaris the process size can become very large. With .ppt files over 20 MB, I doubt it would run at all.

David Adams
Corporate Information Services
Information Systems Services
University of Southampton

----- Original Message ----- From: "F. Spitzer, GEOSYSTEMS" <[EMAIL PROTECTED]>
To: <htdig-general@lists.sourceforge.net>
Sent: Wednesday, January 19, 2005 6:48 AM
Subject: [htdig] Indexing large Powerpoints



Good morning List!

I have a problem to solve. Maybe you can help me?

We have a large collection of more than 250 PowerPoint files, and I want htdig to build an index that lets users search them by keyword.

Things are working so far, and htdig does its job quite well. The only remaining problem is with .ppt files larger than 20 MB. Unfortunately, nearly 50% of the files are larger than 20 MB.
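Before choosing new limits it helps to know how many files are affected and how large the biggest one is. The following is a minimal sketch, not part of htdig: the function name survey_ppt_sizes is hypothetical, and the 20,000,000-byte default is taken from the doc2html.pl message quoted below. It uses the same "at or above" comparison that message reports.

```python
from pathlib import Path

# Default taken from the doc2html.pl message in this thread:
# "Input file size of ... at or above 20000000 limit".
CONVERTER_LIMIT = 20_000_000


def survey_ppt_sizes(root, limit=CONVERTER_LIMIT):
    """Return (total, over_limit, largest_bytes) for all .ppt files under root."""
    sizes = [
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".ppt"  # also matches .PPT
    ]
    # Same ">=" test as the converter's "at or above" wording.
    over = sum(1 for s in sizes if s >= limit)
    largest = max(sizes, default=0)
    return len(sizes), over, largest
```

Calling survey_ppt_sizes("/path/to/ppt/collection") (a placeholder path) gives the largest file size, and every limit in the chain then has to be set above that value.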

I set max_doc_size to 80000000 (80 MB, the size of our largest .ppt). But running htdig produces the following output: Input file size of 45956608 at or above 20000000 limit.
It seems to me that there is another limit in htdig that ignores the value set by max_doc_size.
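The behaviour described here is consistent with two independent size checks. The sketch below is a simplified model under stated assumptions: that htdig reads at most max_doc_size bytes of a document, and that doc2html.pl then applies its own "at or above" test regardless of max_doc_size. The ordering of the two checks and the name indexing_outcome are assumptions for illustration, not htdig's actual code.

```python
# Default from the quoted doc2html.pl message: "at or above 20000000 limit".
DOC2HTML_DEFAULT_LIMIT = 20_000_000


def indexing_outcome(size, max_doc_size, converter_limit=DOC2HTML_DEFAULT_LIMIT):
    """Simplified model of two independent size limits (assumed ordering)."""
    # Assumption: htdig reads at most max_doc_size bytes of the document.
    retrieved = min(size, max_doc_size)
    # Assumption: the converter applies its own test, ignoring max_doc_size.
    if retrieved >= converter_limit:
        return "rejected by converter"
    if size > max_doc_size:
        return "truncated by htdig"
    return "converted in full"
```

With the numbers from this message, indexing_outcome(45956608, 80000000) is still rejected by the converter; only raising the converter limit as well (for example to 100000000) lets the file through in full.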


How can I overcome this limitation?

I thought about writing a shell script that converts the .ppt files to HTML before running htdig. htdig would then use the HTML files to build the index. Using url_part_aliases during database creation and during the search would map the HTML document locations back to the original .ppt locations.
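A pre-conversion step along these lines could look like the sketch below. It assumes ppthtml takes the .ppt file name as its argument and writes HTML to standard output; the directory layout, URL prefixes and function names are hypothetical. html_url_to_ppt_url only illustrates the mapping that url_part_aliases is meant to provide. It is not how htdig implements url_part_aliases, which is configured in htdig.conf.

```python
import subprocess
from pathlib import Path


def html_path_for(ppt_path, src_root, out_root):
    """Mirror src_root/a/b.ppt to out_root/a/b.html."""
    rel = Path(ppt_path).relative_to(src_root)
    return Path(out_root) / rel.with_suffix(".html")


def convert_tree(src_root, out_root, converter="ppthtml"):
    """Convert every .ppt under src_root; returns the files that failed.

    Assumes the converter writes HTML to stdout (as ppthtml does).
    """
    failed = []
    for ppt in sorted(Path(src_root).rglob("*.ppt")):
        target = html_path_for(ppt, src_root, out_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "wb") as out:
            result = subprocess.run([converter, str(ppt)], stdout=out)
        if result.returncode != 0:
            failed.append(ppt)  # e.g. ppthtml stopping at an embedded image
    return failed


def html_url_to_ppt_url(url, html_prefix, ppt_prefix):
    """Map an indexed HTML URL back to the original .ppt URL."""
    if not (url.startswith(html_prefix) and url.endswith(".html")):
        return url  # not one of the converted documents
    return ppt_prefix + url[len(html_prefix):-len(".html")] + ".ppt"
```

Collecting failures instead of stopping keeps one bad presentation from aborting the whole run, which matters given the partial-extraction problems with ppthtml mentioned in this thread.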

Has anybody done this before? Or, even better, is there another solution to my problem?

Thanks a lot for your help. Any hints are welcome.

Cheers Fritz

-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
ht://Dig general mailing list: <htdig-general@lists.sourceforge.net>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general