Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal character: \65279
?
^
1 error
What am I doing wrong?
If you have the stop words in a file, say one word per line,
they can be read like this:
BufferedReader r = new BufferedReader(
    new InputStreamReader(new FileInputStream("Urdu.txt"), "UTF8"));
String word = r.readLine(); // loop this line, you get the picture
(Make sure to specify encoding "UTF8" when saving the file from Notepad.)
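One thing to watch for: Notepad's UTF-8 option writes a byte order mark at the start of the file, and that mark is the character U+FEFF (decimal 65279), the same one javac complained about above. Read through a reader, it ends up glued to the first word. Below is a minimal sketch of the loop that strips it, assuming Java 7 or later; the class name StopWordLoader and its method names are made up for illustration and are not part of Lucene.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class.
class StopWordLoader {
    // Byte order mark as decoded from a UTF-8 file (U+FEFF, decimal 65279).
    static final char BOM = '\uFEFF';

    // Drop a leading BOM if present; otherwise return the line unchanged.
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    // Turn raw lines into stop words: strip the BOM from the first line,
    // trim surrounding whitespace, and skip blank lines.
    static List<String> cleanWords(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0) {
                line = stripBom(line);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                words.add(line);
            }
        }
        return words;
    }

    // Read a UTF-8 file with one stop word per line.
    static String[] load(String path) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
        return cleanWords(lines).toArray(new String[0]);
    }
}
```

The resulting array could then be passed to the analyzer, for example new StandardAnalyzer(StopWordLoader.load("Urdu.txt")).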
Regards,
Doron
Hi, Doron
The compilation problem is solved, but there is no change in the index.
public static final String[] URDU_STOP_WORDS = {
    "کی", "کا", "کو", "ہے", "کے", "نے", "پر", "اور", "سے", "میں", "بھی",
    "ان", "ایک", "تھا", "تھی", "کیا", "ہیں", "کر", "وہ", "جس", "نہں", "تک"
};
Analyzer analyzer = new StandardAnalyzer(URDU_STOP_WORDS);
These words still appear in the index with high ranks.
Regards,
Liaqat