java-user@lucene.apache.org Hi,
I'm trying to use code from lucene-core for the following use case in my project: given a big sorted list of words (call it the dictionary) and a wildcard/regex pattern, return the indices of the dictionary words that match the pattern.

Here is one implementation for wildcard queries:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.WildcardQuery;
    import org.apache.lucene.util.automaton.Automaton;
    import org.apache.lucene.util.automaton.CharacterRunAutomaton;

    public List<Integer> match(String wildcardPattern, List<String> dictionary) {
        // The field name is irrelevant here; only the pattern text is used.
        WildcardQuery query = new WildcardQuery(new Term("dummy", wildcardPattern));
        Automaton automaton = query.getAutomaton();
        CharacterRunAutomaton runner = new CharacterRunAutomaton(automaton);
        List<Integer> result = new ArrayList<>();
        // Linear scan: run the automaton against every word in the dictionary.
        for (int i = 0; i < dictionary.size(); i++) {
            if (runner.run(dictionary.get(i))) {
                result.add(i);
            }
        }
        return result;
    }

The above implementation works, but it tests every word and does not exploit the fact that the dictionary is sorted. I suspect other code in lucene-core can do that. My guess is based on the javadoc for WildcardQuery (and a similar comment in the RegexQuery doc):

    "Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow WildcardQueries, a Wildcard term should not start with the wildcard *"

For example, if I knew all the literal prefixes implied by the wildcard pattern, I could prune the dictionary and search only the words that start with those prefixes (the range for each prefix could be located via binary search).

Can someone give me pointers, or show me where in the Lucene code something similar is done?

Thanks.
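For concreteness, here is a rough sketch of the pruning idea described above. It is not Lucene code: java.util.regex stands in for CharacterRunAutomaton so that the example is self-contained, and the class and helper names (PrefixPrunedWildcard, literalPrefix, toRegex, lowerBound, prefixEnd) are made up for illustration. It assumes the dictionary is sorted by String.compareTo, that only '*' and '?' are wildcards (no escape handling), and it handles just the single literal prefix before the first wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PrefixPrunedWildcard {

    // Literal text before the first wildcard character ('*' or '?').
    static String literalPrefix(String wildcard) {
        int i = 0;
        while (i < wildcard.length() && wildcard.charAt(i) != '*' && wildcard.charAt(i) != '?') {
            i++;
        }
        return wildcard.substring(0, i);
    }

    // Stand-in matcher: '*' becomes ".*", '?' becomes ".", all other characters are quoted.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // First index whose word is >= key in String.compareTo order.
    static int lowerBound(List<String> words, String key) {
        int lo = 0, hi = words.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (words.get(mid).compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // First index at or after 'from' whose word does not start with prefix.
    // Valid because words sharing a prefix are contiguous in a sorted list.
    static int prefixEnd(List<String> words, String prefix, int from) {
        int lo = from, hi = words.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (words.get(mid).startsWith(prefix)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static List<Integer> match(String wildcard, List<String> sortedWords) {
        String prefix = literalPrefix(wildcard);
        Pattern pattern = toRegex(wildcard);
        // Two binary searches narrow the scan to words that share the literal prefix.
        int start = lowerBound(sortedWords, prefix);
        int end = prefixEnd(sortedWords, prefix, start);
        List<Integer> result = new ArrayList<>();
        for (int i = start; i < end; i++) {
            if (pattern.matcher(sortedWords.get(i)).matches()) {
                result.add(i);
            }
        }
        return result;
    }
}
```

When the pattern starts with a wildcard the literal prefix is empty, so the range covers the whole dictionary and this degrades to the full scan, which is consistent with the javadoc warning quoted above.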