Re: StopWords problem

Grant Ingersoll Wed, 26 Dec 2007 14:11:12 -0800

Are you altering (stemming) the token before it gets to the StopFilter?
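
To illustrate why the order matters, here is a minimal plain-Java sketch (not Lucene code). It assumes the stop filter does an exact lookup of each token in the stop set, so a token that was changed upstream no longer matches. The class FilterOrder and the toy dropTrailingS "stemmer" are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

class FilterOrder {
    // Hypothetical toy "stemmer": drops one trailing 's'. Stands in for any
    // step that alters tokens; it is not a real stemming algorithm.
    static String dropTrailingS(String token) {
        return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    // Stop words are removed first, then the surviving tokens are altered.
    static List<String> stopThenAlter(List<String> tokens, Set<String> stop,
                                      UnaryOperator<String> alter) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stop.contains(t)) {
                out.add(alter.apply(t));
            }
        }
        return out;
    }

    // Tokens are altered first, so the exact-match lookup sees the altered
    // form and can miss a stop word that was listed in its original form.
    static List<String> alterThenStop(List<String> tokens, Set<String> stop,
                                      UnaryOperator<String> alter) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String altered = alter.apply(t);
            if (!stop.contains(altered)) {
                out.add(altered);
            }
        }
        return out;
    }
}
```

With the stop set {"was"} and the tokens "cats", "was", "here", the first order drops "was", while the second order turns it into "wa" before the lookup and lets it through.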


On Dec 26, 2007, at 5:08 PM, Liaqat Ali wrote:

Doron Cohen wrote:
On Dec 26, 2007 10:33 PM, Liaqat Ali <[EMAIL PROTECTED]> wrote:
Using javac -encoding UTF-8 still raises the following error.

urduIndexer.java : illegal character: \65279
?
^
1 error

What am I doing wrong?
If you have the stop-words in a file, say one word per line,
they can be read like this:

   BufferedReader r = new BufferedReader(new InputStreamReader(
       new FileInputStream("Urdu.txt"), "UTF8"));
   String word = r.readLine(); // loop this line, you get the picture

(Make sure to specify encoding "UTF8" when saving the file from Notepad.)
Regards,
Doron
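
A note on the error above: character 65279 is U+FEFF, the byte order mark that Notepad writes at the start of a file saved as UTF-8. javac does not skip it, and Java's UTF-8 decoder passes it through as well, so it would also end up glued to the first stop word read from a file. The following is a rough sketch of the file-reading approach with the mark stripped. The class StopWordFile and its methods are hypothetical names for this example, and it assumes a JDK recent enough to have java.nio.file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class StopWordFile {
    // U+FEFF, decimal 65279: the byte order mark.
    private static final char BOM = 0xFEFF;

    // Turns raw lines into stop words: strips a leading BOM from the first
    // line, trims surrounding whitespace, and skips empty lines.
    static String[] clean(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (i == 0 && !line.isEmpty() && line.charAt(0) == BOM) {
                line = line.substring(1);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                words.add(line);
            }
        }
        return words.toArray(new String[0]);
    }

    // Reads a UTF-8 file with one stop word per line.
    static String[] load(Path file) {
        try {
            return clean(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting array could then be passed wherever a String[] of stop words is expected, for example in place of a hard-coded array in the source file, which also keeps the non-ASCII text out of the .java file.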
Hi, Doron
The compilation problem is solved, but there is no change in the index.
public static final String[] URDU_STOP_WORDS ={ "کی" ,"کا" ,"کو" ,"ہے" ,"کے" ,"نے" ,"پر" ,"اور" ,"سے","میں" ,"بھی","ان" ,"ایک" ,"تھا" ,"تھی" ,"کیا" ,"ہیں" ,"کر" ,"وہ" ,"جس" ,"نہں" ,"تک" };
Analyzer analyzer = new StandardAnalyzer(URDU_STOP_WORDS);


These words still appear in the index with high ranks.

Regards,
Liaqat

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ



