Jarrett Billingsley Wrote:

> On Tue, Jan 6, 2009 at 8:04 PM, james <jame...@gmail.com> wrote:
> > I'm writing an indexer, but I'm having a problem: on some files, reading
> > gives this error:
> >
> > Error 4: invalid UTF-8 sequence
> >
> > Is there a way to fix it?
> >
> 
> You're probably reading a file that's encoded in some non-Unicode
> encoding, like Latin-1.  You could read in the file data as byte[]
> instead of as char[], but that still doesn't deal with the problem
> that you have characters in your file that are outside the ASCII
> range.  If you know what encoding your file uses, you could do some
> transformations on it to turn it into valid Unicode, or you could just
> ignore characters outside the ASCII range :P

Is there any library or function that can automatically convert HTML files
in an unknown charset into UTF-8?
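
For illustration, the two options from the quoted reply might look roughly
like the sketch below. It is written in Python rather than the language used
in the thread, and the function names and the Latin-1 fallback are
assumptions made for the example. Decoding as Latin-1 never fails, but it
gives wrong characters if the file actually uses some other encoding, so for
HTML the charset declared in the HTTP header or a meta tag is usually a
better guide when one is present.

```python
def decode_to_utf8(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode raw bytes as UTF-8, falling back to an assumed legacy encoding."""
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input is Latin-1 unless the caller says otherwise.
        return raw.decode(fallback)


def ascii_only(raw: bytes) -> str:
    """Drop every byte outside the ASCII range (the 'just ignore them' option)."""
    return raw.decode("ascii", errors="ignore")


if __name__ == "__main__":
    latin1_bytes = "café".encode("latin-1")  # not valid UTF-8
    print(decode_to_utf8(latin1_bytes))
    print(ascii_only(latin1_bytes))
```

The file would be read in binary mode first (for example with
open(path, "rb")), so that no decoding is attempted until one of these
functions is called.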
