I'm not entirely sure what changed, but htdig is now indexing the Word documents
(sort of) with no errors.  If I search for *, everything in the database is
returned.  I can see the Word document URLs and what should be excerpts for
each.

For each document referenced, it gives "No Page Title Found" and an excerpt
like this: "Read 8192 from document Read 8192 from document Read 8192 from
document Read 8192 from document Read 8192 from document Read 2048 from
document Read a total of 43008 bytes". When I search for a specific word
found within one of the Word documents, the search returns no results. It
looks like it is finding the file and noting the file size and whatnot, but
it is not parsing the document.

Anyone have suggestions on this problem?

-Trevor



-----Original Message-----
From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
Sent: Friday, September 06, 2002 3:03 PM
To: Wendt, Trevor
Cc: [EMAIL PROTECTED]
Subject: Re: [htdig] htdig & wp2html problems


According to Wendt, Trevor:
> Word Doc:
> $od -b /export/home/htdig-3.1.6/scripts/doc2html/IntranetROI.doc | head -1
> 0000000 320 317 021 340 241 261 032 341 000 000 000 000 000 000 000 000
> 
> Looks like the magic numbers match when it's on the local box (which is
> solaris) but the file itself is located on an NT/IIS 4.0 box. I didn't think
> that would cause a problem but for kicks I downloaded hod, a nice little
> octal dump program for windows, and the dump output matches on NT as well.
> 
> Since the Word RTF is working, here's the od output from it. 
> RTF Doc: 
> $ od -b /export/home/htdig-3.1.6/scripts/doc2html/IntranetROI_wo*.doc | head -1
> 0000000 173 134 162 164 146 061 134 141 156 163 151 134 141 156 163 151
> 
> As of now, I have not modified anything in my doc2html.pl file since my
> last email.
> 
> Any other ideas? I do appreciate all the help! 

I'm afraid I'm stumped.  Hopefully David or someone else more familiar
with doc2html than me can think of something I've missed.

Do you get exactly the same error, and no other potentially useful error
messages, when you run doc2html.pl manually from the command line on one
of these Word documents?
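
As a cross-check that does not depend on doc2html.pl, the leading bytes
from the two od dumps quoted above can be tested directly. This is only a
minimal sketch: the names sniff, sniff_file, OLE2_MAGIC and RTF_MAGIC are
made up for illustration, and the byte values are copied from the dumps.

```python
# Leading bytes as shown in the od -b output quoted above (octal values).
# These names are illustrative only; they are not part of htdig or doc2html.
OLE2_MAGIC = bytes([0o320, 0o317, 0o021, 0o340, 0o241, 0o261, 0o032, 0o341])
RTF_MAGIC = b"{\\rtf"  # octal 173 134 162 164 146


def sniff(header: bytes) -> str:
    """Classify a file by its first few bytes."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"      # container format used by binary Word documents
    if header.startswith(RTF_MAGIC):
        return "rtf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff(fh.read(8))
```

Calling sniff_file() on the copy of IntranetROI.doc that htdig actually
fetches would show whether it still starts with the Word signature. A match
only confirms the file type; it does not explain why no text is extracted.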

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba  Winnipeg, MB  R3E 3J7  (Canada)

