Given that the goal of this change is to speed things up, do you think the
test below is realistic? And do you think the results would hold on other JVMs?

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import opennlp.tools.sentdetect.lang.Factory;

class Scratch {

  private static final int ITERATIONS = 100_000_000;

  private static Set<Character> eosCharacters;

  public static void main(String[] args) {
    eosCharacters = new HashSet<>();
    for (char eosChar: Factory.defaultEosCharacters) {
      eosCharacters.add(eosChar);
    }

    // Reusable 20-character buffer, refilled with a single character for each run.
    char[] cbuf = new char[20];

    System.out.println("defaultEosCharacters");
    for (char eos : Factory.defaultEosCharacters) {
      Arrays.fill(cbuf, eos);
      testBuffer(cbuf);
    }

    System.out.println("ptEosCharacters");
    for (char eos : Factory.ptEosCharacters) {
      Arrays.fill(cbuf, eos);
      testBuffer(cbuf);
    }

    System.out.println("jpnEosCharacters");
    for (char eos : Factory.jpnEosCharacters) {
      Arrays.fill(cbuf, eos);
      testBuffer(cbuf);
    }
  }

  private static void testBuffer(char[] cbuf) {
    System.out.println("Testing with: " + new String(cbuf));
    {
      long start = System.currentTimeMillis();
      for (int n = 0; n < ITERATIONS; n++) {
        getPositionsArray(cbuf);
      }
      long duration = System.currentTimeMillis() - start;
      System.out.println("Duration array (ms): " + duration);
    }

    {
      long start = System.currentTimeMillis();
      for (int n = 0; n < ITERATIONS; n++) {
        getPositionsHashset(cbuf);
      }
      long duration = System.currentTimeMillis() - start;
      System.out.println("Duration set (ms): " + duration);
    }
  }

  public static List<Integer> getPositionsArray(char[] cbuf) {
    List<Integer> l = new ArrayList<>();
    // Local array; it shadows the static Set field of the same name.
    char[] eosCharacters = Factory.defaultEosCharacters;
    for (int i = 0; i < cbuf.length; i++) {
      for (char eosCharacter : eosCharacters) {
        if (cbuf[i] == eosCharacter) {
          l.add(i);
          break;
        }
      }
    }
    return l;
  }

  public static List<Integer> getPositionsHashset(char[] cbuf) {
    List<Integer> l = new ArrayList<>();
    for (int i = 0; i < cbuf.length; i++) {
      if (eosCharacters.contains(cbuf[i])) {
        l.add(i);
      }
    }
    return l;
  }
  
}
```

```text
"C:\Program Files\Java\jdk1.8.0_162\bin\java.exe" ....
defaultEosCharacters
Testing with: ....................
Duration array (ms): 16424
Duration set (ms): 25844
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 17498
Duration set (ms): 26696
Testing with: ????????????????????
Duration array (ms): 17948
Duration set (ms): 25391
ptEosCharacters
Testing with: ....................
Duration array (ms): 16975
Duration set (ms): 25442
Testing with: ????????????????????
Duration array (ms): 18012
Duration set (ms): 25529
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 17562
Duration set (ms): 25579
Testing with: ;;;;;;;;;;;;;;;;;;;;
Duration array (ms): 4040
Duration set (ms): 6223
Testing with: ::::::::::::::::::::
Duration array (ms): 3991
Duration set (ms): 6276
Testing with: ((((((((((((((((((((
Duration array (ms): 3980
Duration set (ms): 6185
Testing with: ))))))))))))))))))))
Duration array (ms): 4043
Duration set (ms): 6199
Testing with: ««««««««««««««««««««
Duration array (ms): 3971
Duration set (ms): 8503
Testing with: »»»»»»»»»»»»»»»»»»»»
Duration array (ms): 3960
Duration set (ms): 8587
Testing with: ''''''''''''''''''''
Duration array (ms): 3920
Duration set (ms): 5450
Testing with: """"""""""""""""""""
Duration array (ms): 3931
Duration set (ms): 5396
jpnEosCharacters
Testing with: 。。。。。。。。。。。。。。。。。。。。
Duration array (ms): 3974
Duration set (ms): 8616
Testing with: !!!!!!!!!!!!!!!!!!!!
Duration array (ms): 3908
Duration set (ms): 9276
Testing with: ????????????????????
Duration array (ms): 3953
Duration set (ms): 9278

Process finished with exit code 0

```
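For comparison, here is a rough sketch of how the same measurement might look with a warm-up phase, `System.nanoTime()`, and a buffer that mixes ordinary text with a few EOS characters. In the test above both loops run once per buffer, array first, with no warm-up, and every buffer holds 20 copies of one character, so I am not sure how well it reflects real input. Everything here is an assumption for illustration: the class name `EosLookupSketch`, the hard-coded `.`, `!`, `?` standing in for `Factory.defaultEosCharacters`, the sample sentence, and the iteration counts. A harness such as JMH would likely give more trustworthy numbers than any hand-rolled loop.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EosLookupSketch {

  // Assumed stand-in for Factory.defaultEosCharacters so the sketch runs without OpenNLP.
  static final char[] EOS_ARRAY = {'.', '!', '?'};
  static final Set<Character> EOS_SET = new HashSet<>();

  static {
    for (char c : EOS_ARRAY) {
      EOS_SET.add(c);
    }
  }

  // Accumulates result sizes so the JIT cannot discard the measured calls as dead code.
  static long sink;

  // Linear scan of the small EOS array for every character in the buffer.
  static List<Integer> positionsArray(char[] cbuf) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < cbuf.length; i++) {
      for (char eos : EOS_ARRAY) {
        if (cbuf[i] == eos) {
          positions.add(i);
          break;
        }
      }
    }
    return positions;
  }

  // Hash lookup for every character; each char is autoboxed to a Character.
  static List<Integer> positionsSet(char[] cbuf) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < cbuf.length; i++) {
      if (EOS_SET.contains(cbuf[i])) {
        positions.add(i);
      }
    }
    return positions;
  }

  // Runs one round of the chosen variant and returns the elapsed time in nanoseconds.
  static long timeRound(boolean useSet, char[] cbuf, int iterations) {
    long start = System.nanoTime();
    for (int n = 0; n < iterations; n++) {
      List<Integer> result = useSet ? positionsSet(cbuf) : positionsArray(cbuf);
      sink += result.size();
    }
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // Hypothetical sample input: mostly non-EOS characters with a few EOS marks.
    char[] cbuf = "Mixed text. With a few marks! Like real input?".toCharArray();
    int iterations = 200_000;

    // Warm-up rounds: give the JIT a chance to compile both variants before timing.
    for (int round = 0; round < 3; round++) {
      timeRound(false, cbuf, iterations);
      timeRound(true, cbuf, iterations);
    }

    // Measured rounds: several repetitions make run-to-run variation visible.
    for (int round = 0; round < 5; round++) {
      long arrayNs = timeRound(false, cbuf, iterations);
      long setNs = timeRound(true, cbuf, iterations);
      System.out.println("round " + round + ": array " + (arrayNs / 1_000_000)
          + " ms, set " + (setNs / 1_000_000) + " ms");
    }
    System.out.println("sink = " + sink);
  }
}
```

Running it on more than one JVM and comparing the per-round numbers would be one way to check whether the gap between the array and the set holds up elsewhere.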

[ Full content available at: https://github.com/apache/opennlp/pull/329 ]