Did you mention earlier that you were using a newish version of Red Hat? Red Hat has moved to a UTF-8 environment by default. I believe one consequence is that Perl treats text-mode filehandles as UTF-8; this can cause problems if the input includes anything other than 7-bit ASCII (i.e. bytes with values greater than 127).
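
To see why the warning quoted below singles out byte 0xad, here is a rough sketch that classifies a byte by its role in UTF-8. The helper name utf8_byte_role is made up for illustration and is not part of pdf2html.pl. A byte of the form 10xxxxxx is only valid after a start byte, so a lone 0xad (a soft hyphen in Latin-1) looks like a stray continuation byte when the data is read as UTF-8.

```perl
use strict;
use warnings;

# Classify a single byte value by its role in a UTF-8 stream.
# utf8_byte_role is a hypothetical helper, not something from pdf2html.pl.
sub utf8_byte_role {
    my ($byte) = @_;
    return 'ascii'        if $byte < 0x80;                    # 0xxxxxxx
    return 'continuation' if ($byte & 0xC0) == 0x80;          # 10xxxxxx
    return 'start'        if $byte >= 0xC2 && $byte <= 0xF4;  # begins a multi-byte sequence
    return 'invalid';                                         # 0xC0, 0xC1, 0xF5..0xFF
}

# 0x41 is plain ASCII, 0xAD is the byte from the warning, 0xC2 is a start byte.
printf "0x%02x: %s\n", $_, utf8_byte_role($_) for (0x41, 0xAD, 0xC2);
```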

You might try putting the filehandle in binary mode and see if that clears up the problem. In order to do this, simply add the line

binmode CAT;

immediately preceding the first use of the handle. That should be somewhere in the neighborhood of line 110 I think (the code should be added before 'while (<CAT>)').
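
As a rough, self-contained sketch of where the binmode call sits: the temporary file, the read_as_bytes helper, and the s/&/&amp;/ substitution below are all stand-ins invented for illustration, since I don't have the real code around line 110 in front of me. Only the CAT handle name and the 'while (<CAT>)' loop come from the script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the converter output that pdf2html.pl reads:
# one line containing the raw byte 0xAD.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "co\xADoperate & more\n";
close $tmp_fh or die "close failed: $!";

# Read the data back through a handle named CAT, as the script does.
sub read_as_bytes {
    my ($path) = @_;
    my @lines;
    open(CAT, '<', $path) or die "cannot open $path: $!";
    binmode CAT;          # raw bytes: no UTF-8 decoding on this handle
    while (<CAT>) {
        s/&/&amp;/g;      # illustrative substitution, not the script's real one
        push @lines, $_;
    }
    close CAT;
    return @lines;
}

my @lines = read_as_bytes($tmp_name);
printf "%d line(s) read, first line is %d byte(s)\n",
    scalar @lines, length $lines[0];
```

The only line that matters for your script is 'binmode CAT;' placed after the open and before the loop.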

Jim

On Monday, June 30, 2003, at 12:43 PM, Dan Muey wrote:

When indexing PDFs, I occasionally get:

Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution ...

Always blamed on lines 113 and 117 in pdf2html.pl

Which is in pdf_body()

1930:1758:5:http://www.marcinciso.com/cassens/master/ctc-1448.pdf: size = 9071

I assume the above line means all went well.
But then I get this for lots of these PDFs:

1931:1901:5:http://www.marcinciso.com/cassens/master/ctc-1481.pdf: !! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 9.
!! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 10.
!! Malformed UTF-8 character (unexpected continuation byte 0xad, with no preceding start byte) in substitution (s///) at /home/dmuey/doc2html/pdf2html.pl line 117, <CAT> line 12.
size = 5891


And back to normal:
1932:1599:5:http://www.marcinciso.com/cassens/master/ctc-1014.pdf: size = 8400


So is this just a warning or does it mean it's not able to index it very well/at all?
What could cause it and how could I remedy it?


Thanks

Dan


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01
_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html


