Interesting!
I'm not sure the above test is realistic, because some of its conditions look extreme to me:
- ITERATIONS = 100_000_000,
- testing against a repetition of a single EOS character (e.g. `Testing with: ....................`),
- using only Factory.defaultEosCharacters, which contains just three EOS characters.
If I change the code as follows:
```
private static final int ITERATIONS = 10000;
public static void main(String[] args) {
// use Factory.ptEosCharacters rather than Factory.defaultEosCharacters
eosCharacters = new HashSet<>();
for (char eosChar: Factory.ptEosCharacters) {
eosCharacters.add(eosChar);
}
// use normal sentences rather than ....................
char[] cbuf = ("I think you are better off sending an email to the solr-user mailing " +
        "list (http://lucene.apache.org/solr/community.html#mailing-lists-irc) and explaining " +
        "more about your use case so we can understand what leads up to the dump. Most likely you " +
        "will find ways to reconfigure your cluster or queries in a way that avoids this situation. " +
        "Or perhaps your cluster is simply under-dimensioned.").toCharArray();
testBuffer(cbuf);
}
public static List<Integer> getPositionsArray(char[] cbuf) {
List<Integer> l = new ArrayList<>();
// use Factory.ptEosCharacters rather than Factory.defaultEosCharacters
char[] eosCharacters = Factory.ptEosCharacters;
for (int i = 0; i < cbuf.length; i++) {
for (char eosCharacter : eosCharacters) {
if (cbuf[i] == eosCharacter) {
l.add(i);
break;
}
}
}
return l;
}
```
I got the following result, which shows the opposite (the set lookup is faster than the array scan):
```
Duration array (ms): 197
Duration set (ms): 73
```
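If anyone wants to reproduce this comparison without the rest of the benchmark, here is a minimal self-contained sketch of the two lookup strategies. It makes several assumptions. The class name, method names, and the EOS character list are hypothetical. The list is not claimed to match `Factory.ptEosCharacters`. `positionsSet` is only an assumed equivalent of the set-based variant from the earlier test, which is not shown here. The sketch first checks that both strategies return the same positions. It then times each one after a single warm-up pass. This is still a rough measurement; JMH would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class EosLookupSketch {

    // Hypothetical EOS characters chosen for illustration only; they are NOT
    // claimed to match Factory.ptEosCharacters or Factory.defaultEosCharacters.
    static final char[] EOS_CHARS = {'.', '?', '!', ';', ':'};

    // The same characters, boxed into a HashSet once up front.
    static final Set<Character> EOS_SET = new HashSet<>();

    static {
        for (char c : EOS_CHARS) {
            EOS_SET.add(c);
        }
    }

    // Array strategy: scan every EOS character for each input character.
    static List<Integer> positionsArray(char[] cbuf) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < cbuf.length; i++) {
            for (char eos : EOS_CHARS) {
                if (cbuf[i] == eos) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }

    // Set strategy: one hash lookup per input character (cbuf[i] is autoboxed).
    static List<Integer> positionsSet(char[] cbuf) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < cbuf.length; i++) {
            if (EOS_SET.contains(cbuf[i])) {
                positions.add(i);
            }
        }
        return positions;
    }

    // Runs one warm-up pass, then times `iterations` calls of the strategy.
    static long timeMillis(Function<char[], List<Integer>> strategy, char[] cbuf, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += strategy.apply(cbuf).size(); // warm-up so the JIT can compile it
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += strategy.apply(cbuf).size();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Consume the sink so the measured calls are not treated as dead code.
        if (sink < 0) {
            System.out.println(sink);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        char[] cbuf = ("Most likely you will find ways to reconfigure your cluster. " +
                "Or perhaps your cluster is simply under-dimensioned!").toCharArray();
        // Both strategies must agree before their timings are worth comparing.
        if (!positionsArray(cbuf).equals(positionsSet(cbuf))) {
            throw new IllegalStateException("array and set strategies disagree");
        }
        int iterations = 10_000;
        System.out.println("Duration array (ms): " + timeMillis(EosLookupSketch::positionsArray, cbuf, iterations));
        System.out.println("Duration set (ms): " + timeMillis(EosLookupSketch::positionsSet, cbuf, iterations));
    }
}
```

The array scan does up to one comparison per EOS character for every input character. The set lookup does a single hash probe per character, but it pays for boxing and hashing. This suggests one plausible reason why the winner could change between a three-character list and a longer one. The sketch does not prove this; varying `EOS_CHARS` and the input text is a way to test it.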
But I think your feedback is very interesting and highly appreciated. Thank
you. :)
[ Full content available at: https://github.com/apache/opennlp/pull/329 ]