To initialize the StandardAnalyzer to use no stop words, the following code seems to work. I question, however, whether it is the best way to do it. Is there a better way?

One wonders whether having a string array with one element whose value is set to the empty string causes the analyzer to repeatedly check the array and examine its one element. Besides the risk of a performance loss, the approach in the code below seems like a kludge.

C#
------------------------
// about to initialize StandardAnalyzer to use no stopwords
// apparently Lucene uses a hash table with keys internally for stoplist
// need to prevent Key from being Null
//
// set object reference to string array of stopwords to an instance
string[] saStopWords = new string[1];
// keep Key from being Null
saStopWords[0] = string.Empty;
// instantiate StandardAnalyzer to use no stopwords
StandardAnalyzer lucAnalyzer = new StandardAnalyzer(saStopWords);

Visual Basic
------------------------
' about to initialize StandardAnalyzer to use no stopwords
' apparently Lucene uses a hash table with keys internally for stoplist
' need to prevent Key from being Null
'
' set object reference to string array of stopwords to an instance
Dim saStopWords(0) As String
' keep Key from being Null
saStopWords(0) = String.Empty
' instantiate StandardAnalyzer to use no stopwords
Dim lucAnalyzer As New StandardAnalyzer(saStopWords)

I figure that using the first overload with no parameter specifying stopwords instantiates the analyzer to use the default stoplist for that analyzer. If that's so, then that overload is out. It won't let me analyze using no stopwords.

The overload whose single parameter is a string array works, but only if the string array reference is set to an instance, hence:

    string[] saStopWords = new string[1];

    or

    Dim saStopWords(0) As String

and only if Key is not Null, hence:

    saStopWords[0] = string.Empty;

    or

    saStopWords(0) = String.Empty

So there ya go, with the analyzer probably repeatedly examining the array and its single element.
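
To make the worry concrete, here is a minimal sketch of how I picture a hash-based stop filter working. This is not Lucene's code. The names StopFilterSketch, MakeStopSet, and Filter are made up for illustration, and the sketch assumes the string array is copied into a hash set once, up front, rather than being scanned for every token.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a stop-word filter; NOT Lucene's actual code.
public static class StopFilterSketch
{
    // Assumption: the array handed to the analyzer is turned into a
    // hash-based set one time, when the analyzer is constructed.
    public static HashSet<string> MakeStopSet(string[] stopWords)
    {
        return new HashSet<string>(stopWords);
    }

    // Keep every token that is not in the stop set.
    // Each token costs one hash lookup, however many stop words there are.
    public static List<string> Filter(IEnumerable<string> tokens, HashSet<string> stopSet)
    {
        List<string> kept = new List<string>();
        foreach (string token in tokens)
        {
            if (!stopSet.Contains(token))
            {
                kept.Add(token);
            }
        }
        return kept;
    }
}
```

If the real filter behaves anything like this, a set holding only the empty string costs one hash lookup per token, the same as any other stoplist, and no token is dropped unless the tokenizer emits an empty one. Whether Lucene's internals actually match this picture is exactly what I have not verified.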

I wonder whether the overload that takes a hashtable as a parameter can be used in some way that prevents repeated, useless examination of the table.

Any ideas?

Thanks for any help.

T. R.
http://www.linkedin.com/in/trhalvorson
www.ncodian.com
http://twitter.com/trhalvorson
