When indexing PDFs, I occasionally get:
Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution ...

It is always reported at lines 113 and 117 of pdf2html.pl, which are in pdf_body().

1930:1758:5:http://www.marcinciso.com/cassens/master/ctc-1448.pdf: size = 9071

I assume the above line means all went well. But for many of these PDFs I get this instead:

1931:1901:5:http://www.marcinciso.com/cassens/master/ctc-1481.pdf: !! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 9.
!! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 10.
!! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 12.
size = 5891

And then back to normal:
1932:1599:5:http://www.marcinciso.com/cassens/master/ctc-1014.pdf: size = 8400

So is this just a warning, or does it mean the PDF is indexed poorly or not at all? What could cause it, and how could I remedy it?

Thanks,
Dan

_______________________________________________
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html
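The wording of the message already points at a likely cause. In UTF-8, bytes 0x80 to 0xBF are continuation bytes and may only follow a lead byte, so a lone 0xad cannot start a character. In ISO-8859-1 (Latin-1), however, 0xad is the soft hyphen, U+00AD. The minimal Python sketch below illustrates this. It is not taken from pdf2html.pl, the sample bytes are made up, and it assumes (unconfirmed) that the text extracted from these PDFs is Latin-1 rather than UTF-8. Note that Python describes the same condition as "invalid start byte", whereas Perl reports "unexpected continuation byte ... with no preceding start byte".

```python
# Illustrative sketch only (not part of pdf2html.pl): why a lone 0xad byte
# is rejected by a UTF-8 decoder but is valid Latin-1.


def is_utf8_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx (0x80-0xBF), which in UTF-8
    may only follow a lead byte and can never start a character."""
    return byte & 0xC0 == 0x80


# Hypothetical sample: "co" + Latin-1 soft hyphen + "operate".
raw = b"co\xadoperate"

print(is_utf8_continuation(0xAD))  # True

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # Python's wording for the condition Perl warns about.
    print("utf-8:", err.reason)  # utf-8: invalid start byte

# Decoded as ISO-8859-1 (Latin-1), the same byte is U+00AD SOFT HYPHEN.
text = raw.decode("latin-1")
print(hex(ord(text[2])))  # 0xad

# Re-encoding as UTF-8 yields a valid two-byte sequence for the soft hyphen.
print(text.encode("utf-8"))  # b'co\xc2\xadoperate'
```

If that assumption holds, the usual remedy is to make sure the extracted text is decoded with the encoding it was actually written in (or converted to UTF-8) before any regex substitution treats it as UTF-8.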

