On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
> Using javac -encoding UTF-8 still raises the following error.
>
> urduIndexer.java : illegal character: \65279
> ?
> ^
> 1 error
>
> What I am doing wrong?
>

If you have the stop-words in a file, say one word per line, they can be
read like this:

    BufferedReader r = new BufferedReader(
        new InputStreamReader(new FileInputStream("Urdu.txt"), "UTF8"));
    String word = r.readLine(); // loop this line, you get the picture

(Make sure to specify encoding "UTF8" when saving the file from Notepad.)

Regards,
Doron
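
P.S. A slightly fuller sketch of the loop above, in case it helps. The character \65279 in the javac error is U+FEFF, the byte order mark that Notepad typically writes at the start of a file saved as UTF-8, so the same character will probably show up at the front of the first stop-word too. The class name StopWordLoader and the method load are made up for illustration, and the code assumes Java 7 or later (try-with-resources, StandardCharsets); treat it as one possible approach, not a tested part of the indexer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: reads one stop-word per line from a UTF-8 stream.
public class StopWordLoader {

    // U+FEFF (decimal 65279), the byte order mark some editors prepend.
    private static final char BOM = '\uFEFF';

    public static Set<String> load(InputStream in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean firstLine = true;
            while ((line = r.readLine()) != null) {
                // Drop a leading BOM, which can only appear on the first line.
                if (firstLine && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                firstLine = false;
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line); // skip blank lines
                }
            }
        }
        return words;
    }
}
```

Usage would be along the lines of StopWordLoader.load(new FileInputStream("Urdu.txt")). With the words kept in a data file like this, the .java source itself can stay plain ASCII, which sidesteps the javac complaint altogether.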