There are two possible causes for the "UNABLE to convert" message, and you
should be able to find out which it is.  Either doc2html.pl is failing to
match the magic number and MIME type and so is not calling wp2html, or it is
calling wp2html and wp2html is not producing any output.
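
A rough way to check the first possibility by hand is to look at the leading
bytes of the file yourself. The sketch below is not taken from doc2html.pl,
and its function names are made up for illustration. It assumes the
eight-byte signature D0 CF 11 E0 A1 B1 1A E1 that starts an OLE2 compound
file (the container used by Word 97/2000 .doc files), and it can also turn
the octal byte fields from an od-style dump into bytes for comparison.

```python
# Signature at the start of OLE2 compound files (assumed here to be what
# a Word 97/2000 .doc file begins with).
OLE2_MAGIC = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])


def octal_dump_to_bytes(fields):
    """Convert whitespace-separated octal byte fields (od style) to bytes."""
    return bytes(int(field, 8) for field in fields.split())


def looks_like_ole2(path):
    """Return True if the file at `path` starts with the OLE2 signature."""
    with open(path, "rb") as handle:
        return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python checkmagic.py file1.doc ...
    for name in sys.argv[1:]:
        verdict = "OLE2 signature found" if looks_like_ole2(name) else "no OLE2 signature"
        print(f"{name}: {verdict}")
```

If the signature is present and the MIME type passed to doc2html.pl is
exactly "application/msword", the second cause (wp2html being called but
producing no output) becomes the more likely one.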

--
David Adams
Computing Services
Southampton University


----- Original Message -----
From: "Wendt, Trevor" <[EMAIL PROTECTED]>
To: "'David Adams'" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, September 11, 2002 4:54 PM
Subject: RE: [htdig] htdig & wp2html problems


> The magic numbers of the file are these:  0000000 320 317 021 340 241 261
> 032 341 000 000 000 000 000 000 000 000
> They match what is listed for the msword option in doc2html.pl, and the
> MIME type is correct for the file as well (application/msword). The Word
> file I'm trying to access is MS Word 2000.  I have also tried to parse and
> index Word 95, 97 and 2000 files, and assorted other file types. The only
> ones that seem to work are the Word RTF files. All the files I've tried
> are PC-based, and produce the "! UNABLE to convert" error as a result.
>
> Could this be some type of permissions problem? Currently all the doc2*.pl
> files have 775 permissions (-rwxrwxr-x). The wp2html is the same, and the
> .cfg and .sty are 664 (-rw-rw-r--). The file I'm trying to access locally
> is set to 644 (-rw-r--r--).
>
> I'm at a total loss and now three of us are stumped. I'm going to try to
> bypass doc2html.pl and get wp2html to work directly with htdig... if that
> doesn't work, I guess I'll install catdoc and see if I have any better
> luck with that integration.
>
> Again, thanks for the help and if anyone has further suggestions I'd be
> glad to hear them! Thanks!
>
> - Trevor
>
>
> -----Original Message-----
> From: David Adams [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, September 11, 2002 5:47 AM
> To: Wendt, Trevor; 'David Adams '
> Cc: [EMAIL PROTECTED]
> Subject: Re: [htdig] htdig & wp2html problems
>
>
> Your doc2html.pl file looks fine.
>
> Do you know what version of Word file IntranetROI.doc is, or what its
> magic number might be?
>
> As stated in the doc2html DETAILS file, wp2html is unable to convert
> Word2, Word6 or Word for Mac files.
> That is why catdoc is required as a fall-back: it does not do such a good
> job as wp2html, but it can cope with all the early Word file formats.
>
> If that is not the problem then I'm stumped.
>
> Catdoc has moved, and can be found at http://www.ice.ru/~vitus/catdoc/.
> It is still free, unlike wp2html, for which there is a small charge.
>
> --
> David Adams
> Computing Services
> Southampton University
>
>
> ----- Original Message -----
> From: "Wendt, Trevor" <[EMAIL PROTECTED]>
> To: "'David Adams '"
> <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Tuesday, September 10, 2002 8:13 PM
> Subject: FW: [htdig] htdig & wp2html problems
>
>
> > Got the other problem fixed... it was a file permission problem.
> >
> > Running from the command line, I'm still not getting a successful parse
> > through rundig or doc2html... only with wp2html. (I was not using
> > doc2html.cfg and doc2html.sty with wp2html like the instructions stated
> > so I made that change - the output looks better.)
> >
> > The error I get from doc2html.pl is "Can't open file IntranetROI.doc"
> > (seen below). Is there a Verbose option I can set in hopes of getting a
> > better error output or any suggestions on why this is happening?
> >
> > The error I'm still getting from rundig is "!       UNABLE to convert"
> > (seen below).
> >
> > I've attached the doc2html.pl(.txt) file I'm using again. It's the
> > default one from the htdig contrib section, minus the wp2html path
> > change, so I'm pretty sure it's set up correctly.
> >
> > This is turning into a real challenge and I'm not planning on giving up
> > quickly. All help is greatly appreciated.
> > Thanks!
> >
> > - Trevor
> >
> >
> > ################################################################
> > ### RUNNING WP2HTML PARSER FROM COMMAND LINE:
> > ################################################################
> > $ /<mypath>/wp2html -i IntranetROI.doc -c /<mypath>/doc2html.cfg -s /<mypath>/doc2html.sty
> > <--            Wp2Html Version 3.3d             -->
> > <~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~>
> >             Registered Copy
> > <__________________________________________________>
> >
> > ------> Input will be read from file IntranetROI
> > ------> Using configuration file /<mypath>/doc2html.cfg
> >
> > ---> Updating the entry for HeadCell
> > ------> Using user styles file /<mypath>/doc2html.sty
> > ------> Output will be written to file IntranetROI.html
> > $
> >
> >
> > ################################################################
> > ### RUNNING DOC2HTML FROM COMMAND LINE:
> > ################################################################
> > $ doc2html.pl IntranetROI.doc application/msword
> > Can't open file IntranetROI.doc
> >
> >
> > ################################################################
> > ### RUNNING RUNDIG FROM COMMAND LINE:
> > ################################################################
> > $ rundig -c ../conf/my.conf
> > !       UNABLE to convert
> >
> > If I run "rundig -vvvv -c ../conf/my.conf" this is the output that I get
> > concerning the IntranetROI.doc I'm using to test with.
> >
> > Header line: HTTP/1.1 200 OK
> > Header line: Server: Microsoft-IIS/4.0
> > Header line: Date: Tue, 10 Sep 2002 18:19:09 GMT
> > Header line: Content-Type: application/msword
> > Header line: Accept-Ranges: bytes
> > Header line: Last-Modified: Tue, 10 Sep 2002 18:15:56 GMT
> > Converted Tue, 10 Sep 2002 18:15:56 GMT to Tue, 10 Sep 2002 18:15:56
> > Header line: ETag: "0d61e1bf658c21:1545e"
> > Header line: Content-Length: 32768
> > Header line:
> > returnStatus = 0
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read 8192 from document
> > Read a total of 32768 bytes
> >  size = 32768
> >
> > ################################################################
> > ################################################################
> >
> > -----Original Message-----
> > From: David Adams [mailto:[EMAIL PROTECTED]]
> > Sent: Tuesday, September 10, 2002 5:16 AM
> > To: Wendt, Trevor; [EMAIL PROTECTED]
> > Cc: 'Gilles Detillieux'
> > Subject: Re: [htdig] htdig & wp2html problems
> >
> > "Read 8192 from document Read 8192 from document Read 8192 from
> > document Read 8192 from document Read 8192 from document Read 2048
> > from document Read a total of 43008 bytes"
> >
> > is part of the diagnostic output from htdig itself.  If this is appearing
> > in the "excerpt" shown by htsearch then you must now have set up htdig and
> > doc2html.pl in a monumentally weird way beyond my comprehension.
> >
> > As for the doc2html.pl file, etc. which you emailed earlier, I haven't yet
> > found any error except that you are using wp2html to convert .RTF files.
> > I may be wrong, but I did not think it had that capability.
> >
> > Have you succeeded in running doc2html.pl from the command line?   The
> > format is:
> >
> > /export/home/htdig-3.1.6/scripts/doc2html/doc2html.pl
> > /fullpathname/worddocument.doc "application/msword"
> > http://www.wherever/worddocument.doc
> >
> > where only the third argument is optional, and the second argument must
> > be exactly "application/msword".
> >
> > --
> > David Adams
> > Computing Services
> > Southampton University
> >
> >
>


