> In my version of doc2html.pl the magic number '^\320\317\021\340' is the
> test for a Microsoft file (Word, Excel or Powerpoint).
> 
> The test for a PDF file should use '%PDF-|\0PDF CARO\001\000\377'.

> > I've downloaded doc2html.pl and have been experimenting with it to process
> > pdf files.  I've found that pdf2html.pl works but doc2html.pl which should be
> > calling pdf2html.pl doesn't work and isn't calling pdf2html.pl (I've edited
> > both files to fill in local pathnames).  I think that it's comparing the magic
> > number '^\320\317\021\340' given in the store_methods sub-routine for pdf files
> > with the beginning of the file which it reads in the read_magic sub-routine and
> > finding they don't match.  Before I pursue this further, is this a known
> > problem?  TIA.


I quoted the wrong number in my message.  Sorry.  I probably picked up the wrong
part of the file with the mouse.  The magic number for a PDF file given in
doc2html.pl is the one you've written.

I've run an octal dump on the pdf file and found that it begins with


%   P   D   F   -   1   .   4  \r   % 342 343 317 323 

If the "|" in the magic number it's looking for is the alternation symbol, then
this should match.  Is there some reason it wouldn't?  TIA.
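
One way to check that expectation outside of doc2html.pl is to run the two
patterns from this thread against the bytes shown in the octal dump.  The sketch
below does that in Python rather than Perl, purely as an illustration.  The
function name classify and the unanchored search are assumptions, not taken from
doc2html.pl, and the sketch says nothing about how read_magic actually reads the
file.

```python
import re

# Patterns copied from this thread. Treating them as regular expressions
# searched against the start of the file is an assumption about doc2html.pl.
MS_MAGIC = re.compile(rb'^\320\317\021\340')
PDF_MAGIC = re.compile(rb'%PDF-|\0PDF CARO\001\000\377')

# First bytes of the PDF file, transcribed from the octal dump above.
header = b'%PDF-1.4\r%\342\343\317\323'


def classify(head):
    """Hypothetical helper: return 'ms', 'pdf', or None for the given bytes."""
    if MS_MAGIC.search(head):
        return 'ms'
    if PDF_MAGIC.search(head):
        return 'pdf'
    return None


print(classify(header))
```

If the "|" is read as alternation, the first alternative matches the dumped
header, so a mismatch would more likely come from how the bytes are read or
compared than from the pattern itself.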

Douglas

========
Douglas Kline
[EMAIL PROTECTED]




_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
