Re: StopWords problem

Liaqat Ali Wed, 26 Dec 2007 14:09:13 -0800

Doron Cohen wrote:

On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:

Using javac -encoding UTF-8 still raises the following error.

urduIndexer.java : illegal character: \65279
?
^
1 error

What am I doing wrong?


If you have the stop-words in a file, say one word in a line,
they can be read like this:

    BufferedReader r = new BufferedReader(
        new InputStreamReader(new FileInputStream("Urdu.txt"), "UTF8"));
    String word = r.readLine();    // loop this line, you get the picture

(Make sure to specify encoding "UTF8" when saving the file from notepad).
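
The loop hinted at above could be sketched as follows. This is only a minimal sketch, not code from this thread: the class name StopWordLoader and its load method are hypothetical, and it uses java.nio.file, which requires a newer JDK than the ones available in 2007. It also skips a leading U+FEFF (decimal 65279, the byte order mark that Notepad writes at the start of a file saved as UTF-8), since that is the character javac complained about.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads one stop word per line from a UTF-8 text file.
class StopWordLoader {

    static String[] load(Path file) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            boolean firstLine = true;
            while ((line = r.readLine()) != null) {
                // Drop a leading byte order mark (U+FEFF) if the editor wrote one.
                if (firstLine && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                firstLine = false;
                // Ignore surrounding whitespace and blank lines.
                line = line.trim();
                if (!line.isEmpty()) {
                    words.add(line);
                }
            }
        }
        return words.toArray(new String[0]);
    }
}
```

The returned array can then be used wherever a String[] of stop words is expected.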

Regards,
Doron

Hi, Doron

The compilation problem is solved, but there is no change in the index.

public static final String[] URDU_STOP_WORDS = {
    "کی", "کا", "کو", "ہے", "کے", "نے", "پر", "اور", "سے", "میں", "بھی",
    "ان", "ایک", "تھا", "تھی", "کیا", "ہیں", "کر", "وہ", "جس", "نہں", "تک" };
Analyzer analyzer = new StandardAnalyzer(URDU_STOP_WORDS);


These words still appear in the index with high ranks.

Regards,
Liaqat

