Given that the goal of this improvement is to speed things up, do you think the
test below is realistic? Do you think the results would carry over to other JVMs?
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import opennlp.tools.sentdetect.lang.Factory;

class Scratch {

    private static final int ITERATIONS = 100_000_000;

    // Set-based lookup table; holds only the default EOS characters.
    private static Set<Character> eosCharacters;

    public static void main(String[] args) {
        eosCharacters = new HashSet<>();
        for (char eosChar : Factory.defaultEosCharacters) {
            eosCharacters.add(eosChar);
        }

        // Each run fills a 20-char buffer with one EOS character and times both lookups.
        char[] cbuf = new char[20];

        System.out.println("defaultEosCharacters");
        for (char eos : Factory.defaultEosCharacters) {
            Arrays.fill(cbuf, eos);
            testBuffer(cbuf);
        }

        System.out.println("ptEosCharacters");
        for (char eos : Factory.ptEosCharacters) {
            Arrays.fill(cbuf, eos);
            testBuffer(cbuf);
        }

        System.out.println("jpnEosCharacters");
        for (char eos : Factory.jpnEosCharacters) {
            Arrays.fill(cbuf, eos);
            testBuffer(cbuf);
        }
    }

    private static void testBuffer(char[] cbuf) {
        System.out.println("Testing with: " + new String(cbuf));
        {
            // Array variant: linear scan over the EOS character array.
            long start = System.currentTimeMillis();
            for (int n = 0; n < ITERATIONS; n++) {
                getPositionsArray(cbuf);
            }
            long duration = System.currentTimeMillis() - start;
            System.out.println("Duration array (ms): " + duration);
        }
        {
            // Set variant: HashSet lookup, which boxes each char to a Character.
            long start = System.currentTimeMillis();
            for (int n = 0; n < ITERATIONS; n++) {
                getPositionsHashset(cbuf);
            }
            long duration = System.currentTimeMillis() - start;
            System.out.println("Duration set (ms): " + duration);
        }
    }

    public static List<Integer> getPositionsArray(char[] cbuf) {
        List<Integer> l = new ArrayList<>();
        // Always compares against the default EOS characters, like the set above.
        char[] eosCharacters = Factory.defaultEosCharacters;
        for (int i = 0; i < cbuf.length; i++) {
            for (char eosCharacter : eosCharacters) {
                if (cbuf[i] == eosCharacter) {
                    l.add(i);
                    break;
                }
            }
        }
        return l;
    }

    public static List<Integer> getPositionsHashset(char[] cbuf) {
        List<Integer> l = new ArrayList<>();
        for (int i = 0; i < cbuf.length; i++) {
            if (eosCharacters.contains(cbuf[i])) {
                l.add(i);
            }
        }
        return l;
    }
}
```
```bash
"C:\Program Files\Java\jdk1.8.0_162\bin\java.exe" ....
defaultEosCharacters
Testing with: ....................
Duration array (ms): 16424
Duration set (ms): 25844
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 17498
Duration set (ms): 26696
Testing with: ????????????????????
Duration array (ms): 17948
Duration set (ms): 25391
ptEosCharacters
Testing with: ....................
Duration array (ms): 16975
Duration set (ms): 25442
Testing with: ????????????????????
Duration array (ms): 18012
Duration set (ms): 25529
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 17562
Duration set (ms): 25579
Testing with: ;;;;;;;;;;;;;;;;;;;;
Duration array (ms): 4040
Duration set (ms): 6223
Testing with: ::::::::::::::::::::
Duration array (ms): 3991
Duration set (ms): 6276
Testing with: ((((((((((((((((((((
Duration array (ms): 3980
Duration set (ms): 6185
Testing with: ))))))))))))))))))))
Duration array (ms): 4043
Duration set (ms): 6199
Testing with: ««««««««««««««««««««
Duration array (ms): 3971
Duration set (ms): 8503
Testing with: »»»»»»»»»»»»»»»»»»»»
Duration array (ms): 3960
Duration set (ms): 8587
Testing with: ''''''''''''''''''''
Duration array (ms): 3920
Duration set (ms): 5450
Testing with: """"""""""""""""""""
Duration array (ms): 3931
Duration set (ms): 5396
jpnEosCharacters
Testing with: 。。。。。。。。。。。。。。。。。。。。
Duration array (ms): 3974
Duration set (ms): 8616
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 3908
Duration set (ms): 9276
Testing with: ????????????????????
Duration array (ms): 3953
Duration set (ms): 9278
Process finished with exit code 0
```
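For comparison, here is a minimal sketch of how the same measurement might be made a little less sensitive to JIT warm-up and dead-code elimination. It is not part of the PR. `BenchSketch`, `EOS_CHARS`, `sink` and `time` are hypothetical names, `EOS_CHARS` is an assumed stand-in for `Factory.defaultEosCharacters` so the sketch runs without OpenNLP on the classpath, and the input is a sentence-like buffer rather than 20 identical characters. A dedicated harness such as JMH would likely be more reliable still.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BenchSketch {

    // Assumption: stand-in for Factory.defaultEosCharacters.
    static final char[] EOS_CHARS = {'.', '!', '?'};
    static final Set<Character> EOS_SET = new HashSet<>();

    static {
        for (char c : EOS_CHARS) {
            EOS_SET.add(c);
        }
    }

    // Updated on every call so the JIT cannot discard the work as dead code.
    static long sink;

    // Linear scan over the EOS character array.
    static List<Integer> positionsArray(char[] cbuf) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < cbuf.length; i++) {
            for (char eos : EOS_CHARS) {
                if (cbuf[i] == eos) {
                    hits.add(i);
                    break;
                }
            }
        }
        return hits;
    }

    // HashSet lookup; each char is boxed to a Character for contains().
    static List<Integer> positionsSet(char[] cbuf) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < cbuf.length; i++) {
            if (EOS_SET.contains(cbuf[i])) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Runs one variant `iterations` times and returns the elapsed nanoseconds.
    static long time(boolean useSet, char[] cbuf, int iterations) {
        long start = System.nanoTime();
        for (int n = 0; n < iterations; n++) {
            List<Integer> hits = useSet ? positionsSet(cbuf) : positionsArray(cbuf);
            sink += hits.size();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        char[] cbuf = "This is a sentence. Is this another one? Yes!".toCharArray();
        int iterations = 1_000_000;

        // Warm-up rounds: give the JIT a chance to compile both variants before timing.
        time(false, cbuf, iterations);
        time(true, cbuf, iterations);

        // Several measured rounds, alternating variants, to expose run-to-run noise.
        for (int round = 0; round < 3; round++) {
            System.out.println("array (ms): " + time(false, cbuf, iterations) / 1_000_000);
            System.out.println("set   (ms): " + time(true, cbuf, iterations) / 1_000_000);
        }
        System.out.println("sink: " + sink);
    }
}
```

Running it under more than one JVM would give a first indication of whether the array-versus-set ordering holds elsewhere. The absolute numbers will differ from the ones above because the input and iteration count are different.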
[ Full content available at: https://github.com/apache/opennlp/pull/329 ]