I'm not entirely sure what changed, but it is now indexing the Word documents (sort of) with no errors. If I search for *, all the information in the database is returned. I can see the Word document URLs and what should be excerpts for each.
For each document referenced it gives "No Page Title Found" and something like this for the excerpt:

"Read 8192 from document Read 8192 from document Read 8192 from document Read 8192 from document Read 8192 from document Read 2048 from document Read a total of 43008 bytes"

When I search for a specific word found within one of the Word documents, the search returns no results. It looks like it is finding the file and noting the file size and whatnot, but it is not parsing the document.

Anyone have suggestions on this problem?

-Trevor

-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 06, 2002 3:03 PM
To: Wendt, Trevor
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] htdig & wp2html problems

According to Wendt, Trevor:
> Word Doc:
> $ od -b /export/home/htdig-3.1.6/scripts/doc2html/IntranetROI.doc | head -1
> 0000000 320 317 021 340 241 261 032 341 000 000 000 000 000 000 000 000
>
> Looks like the magic numbers match when it's on the local box (which is
> solaris) but the file itself is located on an NT/IIS 4.0 box. I didn't think
> that would cause a problem but for kicks I downloaded hod, a nice little
> octal dump program for windows, and the dump output matches on NT as well.
>
> Since the Word RTF is working, here's the od output from it.
> RTF Doc:
> $ od -b /export/home/htdig-3.1.6/scripts/doc2html/IntranetROI_wo*.doc | head -1
> 0000000 173 134 162 164 146 061 134 141 156 163 151 134 141 156 163 151
>
> As of now, I have not modified anything in my doc2html.pl file since my last
> email.
>
> Any other ideas? I do appreciate all the help!

I'm afraid I'm stumped. Hopefully David or someone else more familiar with doc2html than I am can think of something I've missed. Do you get exactly the same error, and no other potentially useful error messages, when you run doc2html.pl manually from the command line on one of these Word documents?

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
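
For anyone following the thread, the two od dumps quoted above can be decoded to see which leading signature each file carries. The sketch below is only an illustration in Python, not doc2html.pl's own detection logic; the function names and labels are made up for the example. It assumes the widely documented OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1) used by binary Word files and the "{\rtf" prefix used by RTF.

```python
# Octal byte values copied from the `od -b` output quoted in this thread.
WORD_DUMP = "320 317 021 340 241 261 032 341 000 000 000 000 000 000 000 000"
RTF_DUMP = "173 134 162 164 146 061 134 141 156 163 151 134 141 156 163 151"

# Well-known leading signatures. The names below are hypothetical labels
# for this example and are not taken from doc2html.pl.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # binary Word (OLE2 container)
RTF_MAGIC = b"{\\rtf"                           # Rich Text Format


def od_to_bytes(dump: str) -> bytes:
    """Convert space-separated octal byte values (od -b style) to bytes."""
    return bytes(int(token, 8) for token in dump.split())


def classify(head: bytes) -> str:
    """Return a rough label based on the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


if __name__ == "__main__":
    for name, dump in (("Word Doc", WORD_DUMP), ("RTF Doc", RTF_DUMP)):
        print(name, "->", classify(od_to_bytes(dump)))
```

This only confirms what the dumps already suggest: the first file starts with the OLE2 signature and the second with an RTF header. It says nothing about why the converter output is not being indexed.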

