On Tue, Jan 6, 2009 at 9:20 PM, james <jame...@gmail.com> wrote:
> Jarrett Billingsley Wrote:
>
>> On Tue, Jan 6, 2009 at 8:04 PM, james <jame...@gmail.com> wrote:
>> > im writing an indexer, but im having a problem because on some file, when
>> > i read gives this error
>> >
>> > Error 4: invalid UTF-8 sequence
>> >
>> > is there a way to fix it.
>> >
>>
>> You're probably reading a file that's encoded in some non-Unicode
>> encoding, like Latin-1. You could read in the file data as byte[]
>> instead of as char[], but that still doesn't deal with the problem
>> that you have characters in your file that are outside the ASCII
>> range. If you know what encoding your file uses, you could do some
>> transformations on it to turn it into valid Unicode, or you could just
>> ignore characters outside the ASCII range :P
>
> is there any library or function that can automatically convert these unknown
> html charset into UTF-8

Not that I know of, for D anyway.
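
For what it's worth, the two manual options quoted above (transform a known encoding into valid Unicode, or just drop everything outside the ASCII range) look roughly like this. This is only a sketch, written in Python rather than D. The function names are made up for illustration, and Latin-1 as the fallback is an assumption, not something detected from the file:

```python
def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw as UTF-8 bytes.

    Valid UTF-8 input is passed through unchanged. Anything else is
    assumed (not detected!) to be in the fallback encoding.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails,
        # but the result is only right if the guess is right.
        text = raw.decode(fallback)
    return text.encode("utf-8")


def ascii_only(raw: bytes) -> bytes:
    """Drop every byte outside the ASCII range (lossy)."""
    return bytes(b for b in raw if b < 0x80)
```

Note that neither function detects an unknown charset. The first only guesses a fallback, so a file in some other encoding will come out as valid UTF-8 with the wrong characters.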