On Tue, Jan 6, 2009 at 8:04 PM, james <jame...@gmail.com> wrote:
> im writing an indexer, but im having a problem because on some file, when i
> read gives this error
>
> Error 4: invalid UTF-8 sequence
>
> is there a way to fix it.

You're probably reading a file that isn't encoded as UTF-8 at all, but in some other encoding like Latin-1. You could read the file data as byte[] instead of char[], but that still leaves you with bytes outside the ASCII range that you have to interpret somehow. If you know which encoding the file uses, you can transcode it into valid UTF-8, or you could just ignore the characters outside the ASCII range :P
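
To make the two options concrete, here is a minimal sketch. It's in Python rather than whatever you're writing the indexer in, purely to show the byte-level logic. The function names are made up for this example, and it assumes the file really is Latin-1, which is only a guess on my part.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def latin1_to_utf8(raw: bytes) -> bytes:
    """Transcode bytes to UTF-8, ASSUMING the input is Latin-1."""
    # Every Latin-1 byte maps to the Unicode code point with the same value,
    # so this never raises; it just produces wrong text if the guess is wrong.
    return raw.decode("latin-1").encode("utf-8")


def strip_non_ascii(raw: bytes) -> bytes:
    """Drop every byte outside the ASCII range (0x00-0x7F)."""
    return bytes(b for b in raw if b < 0x80)


# Hypothetical sample: "café naïve" as it would be stored in Latin-1.
sample = b"caf\xe9 na\xefve"

if not is_valid_utf8(sample):
    transcoded = latin1_to_utf8(sample)  # keeps the accented letters
    ascii_only = strip_non_ascii(sample)  # loses them entirely
```

Transcoding keeps the accented characters but depends on guessing the encoding correctly. Stripping works without knowing the encoding, at the cost of silently dropping text, which may matter for an indexer.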