When indexing PDFs, I occasionally get:

Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution ...

It is always blamed on lines 113 and 117 of pdf2html.pl, which are in pdf_body().

1930:1758:5:http://www.marcinciso.com/cassens/master/ctc-1448.pdf:  size = 9071

I assume the above line means all went well.
But then I get this for many of these PDFs:

1931:1901:5:http://www.marcinciso.com/cassens/master/ctc-1481.pdf: !!   Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 9.
!!      Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 10.
!!      Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 12.
 size = 5891

And back to normal:
1932:1599:5:http://www.marcinciso.com/cassens/master/ctc-1014.pdf:  size = 8400

So is this just a warning, or does it mean the PDF can't be indexed properly (or at all)?
What could cause it, and how can I fix it?

Thanks

Dan


_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>