Jarrett Billingsley Wrote:

> On Tue, Jan 6, 2009 at 9:20 PM, james <jame...@gmail.com> wrote:
> > Jarrett Billingsley Wrote:
> >
> >> On Tue, Jan 6, 2009 at 8:04 PM, james <jame...@gmail.com> wrote:
> >> > im writing an indexer, but im having a problem because on some file,
> >> > when i read gives this error
> >> >
> >> > Error 4: invalid UTF-8 sequence
> >> >
> >> > is there a way to fix it.
> >> >
> >>
> >> You're probably reading a file that's encoded in some non-Unicode
> >> encoding, like Latin-1. You could read in the file data as byte[]
> >> instead of as char[], but that still doesn't deal with the problem
> >> that you have characters in your file that are outside the ASCII
> >> range. If you know what encoding your file uses, you could do some
> >> transformations on it to turn it into valid Unicode, or you could just
> >> ignore characters outside the ASCII range :P
> >
> > is there any library or function that can automatically convert these
> > unknown html charset into UTF-8
>
> Not that I know of, for D anyway.

I just found out about a function 'UnicodeFile' in Tango, but I'm using D1.0 and Phobos, so maybe I should write one of my own.
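
For what it's worth, here is a rough sketch of what "one of my own" might look like. It is written in C rather than D, and the function names (looks_like_utf8, latin1_to_utf8, to_utf8) are made up for illustration; they are not part of Phobos or Tango. It assumes that anything which is not well-formed UTF-8 is Latin-1, which is only a guess: it does not detect the real charset, and pages that are actually Windows-1252 will come out wrong for bytes 0x80-0x9F. The validity check is also simplified.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..len) is structurally valid
   UTF-8, 0 otherwise. Simplified: it rejects stray continuation bytes,
   truncated sequences and lead bytes 0xC0, 0xC1 and above 0xF4, but it
   does not catch every overlong or surrogate encoding. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra, k;

        if (c < 0x80)                    extra = 0; /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) extra = 1; /* 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF) extra = 2; /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4) extra = 3; /* 4-byte sequence */
        else return 0;                   /* not a valid lead byte */

        if (extra > len - i - 1)
            return 0;                    /* sequence runs off the end */
        for (k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;                /* expected a continuation byte */
        i += extra + 1;
    }
    return 1;
}

/* Hypothetical helper: transcodes Latin-1 bytes to UTF-8. Each Latin-1
   byte is the code point of the same value, so bytes >= 0x80 become
   exactly two UTF-8 bytes. Returns a malloc'd, NUL-terminated buffer
   (caller frees) and stores its length in *out_len if non-NULL. */
char *latin1_to_utf8(const unsigned char *buf, size_t len, size_t *out_len)
{
    unsigned char *out = malloc(2 * len + 1);
    size_t i, n = 0;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];
        if (c < 0x80) {
            out[n++] = c;
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));
            out[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    if (out_len != NULL)
        *out_len = n;
    return (char *)out;
}

/* Hypothetical entry point: copies the input unchanged if it already
   looks like UTF-8, otherwise ASSUMES Latin-1 and transcodes it. */
char *to_utf8(const unsigned char *buf, size_t len, size_t *out_len)
{
    char *copy;

    if (!looks_like_utf8(buf, len))
        return latin1_to_utf8(buf, len, out_len);

    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, buf, len);
    copy[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return copy;
}
```

The idea would be to read each file as raw bytes, pass the buffer through to_utf8, and only then hand the result to the indexer. The same loop should translate to D more or less directly, reading the file as ubyte[] instead of char[].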