Dear wizards, please advise.

I need to offer a user-configurable pattern-matching feature, to exclude objects from my billion-object sort-merge (which is now working fairly well, thank you all).

What we're mostly trying to do is exclude any record which contains any one of a number of substrings. The computer science textbooks give various fast-string-searching algorithms with pre-computed tables, any of which would suit our use case, but I don't see a practical implementation of any of them floating around...
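To make the textbook option concrete, here is a minimal sketch of one such table-driven algorithm, Aho-Corasick, compiled down to a dense byte-indexed transition table so that the match loop does no allocation. The class name SubstringExcluder and its methods are made up for illustration, and it assumes records are available as byte[] and that patterns are plain substrings encoded as UTF-8. It is not taken from any existing library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: Aho-Corasick automaton flattened into a dense DFA table. */
public final class SubstringExcluder {
    private static final int ALPHABET = 256;
    private final int[] next;       // next[state * 256 + byte] -> next state
    private final boolean[] accept; // accept[state] -> some pattern ends here

    public SubstringExcluder(List<String> patterns) {
        // 1. Build the trie. In a row, 0 means "no edge" (the root is never a child).
        List<int[]> trie = new ArrayList<>();
        List<Boolean> terminal = new ArrayList<>();
        trie.add(new int[ALPHABET]);
        terminal.add(false);
        for (String p : patterns) {
            int s = 0;
            for (byte b : p.getBytes(StandardCharsets.UTF_8)) {
                int c = b & 0xFF;
                if (trie.get(s)[c] == 0) {
                    trie.get(s)[c] = trie.size(); // index the new node is about to get
                    trie.add(new int[ALPHABET]);
                    terminal.add(false);
                }
                s = trie.get(s)[c];
            }
            terminal.set(s, true);
        }

        // 2. Breadth-first pass: compute failure links and fill in every transition,
        //    so matching never has to follow a failure link at run time.
        int n = trie.size();
        next = new int[n * ALPHABET];
        accept = new boolean[n];
        int[] fail = new int[n];
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        accept[0] = terminal.get(0);
        for (int c = 0; c < ALPHABET; c++) {
            int t = trie.get(0)[c];
            next[c] = t; // a missing edge from the root loops back to the root
            if (t != 0) {
                queue.add(t);
            }
        }
        while (!queue.isEmpty()) {
            int s = queue.poll();
            // A state accepts if it ends a pattern or any proper suffix of it does.
            accept[s] = terminal.get(s) || accept[fail[s]];
            for (int c = 0; c < ALPHABET; c++) {
                int t = trie.get(s)[c];
                int viaFail = next[fail[s] * ALPHABET + c];
                if (t != 0) {
                    fail[t] = viaFail;
                    next[s * ALPHABET + c] = t;
                    queue.add(t);
                } else {
                    next[s * ALPHABET + c] = viaFail;
                }
            }
        }
    }

    /** True if record[offset, offset + length) contains any of the patterns. */
    public boolean matchesAny(byte[] record, int offset, int length) {
        if (accept[0]) {
            return true; // an empty pattern matches everything
        }
        int s = 0;
        for (int i = offset, end = offset + length; i < end; i++) {
            s = next[(s << 8) | (record[i] & 0xFF)];
            if (accept[s]) {
                return true;
            }
        }
        return false;
    }

    public boolean matchesAny(byte[] record) {
        return matchesAny(record, 0, record.length);
    }
}
```

The table costs 1 KB per trie state (256 ints), which is fine for a modest pattern set but would need a sparser layout for a very large one. Each record is scanned once, regardless of how many substrings are configured.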

Current practical options:
* java.util.regex, precompiled patterns
  - Reputedly slow at matching, but our patterns are simple.
  - We are using a single regex containing a large alternation.
  - perf says this regex matcher is 50% of our runtime.
  - WHY isn't Matcher.usePattern() allocation-free? It totally could be. Because it isn't, the int[] array allocation is a major drain on the GC.
  - If we use ThreadLocal<Matcher> instead, the ThreadLocal can't be static, and hits the previously discussed issue with blowing out the ThreadLocalMap.
* re2j
  - Not tried yet - has anyone tried this?
* brics
  - It may make sense to try brics first and fall back to java regex only for patterns it fails on.
* Groovy Closure
  - Faster than regex/Pattern if you have a better matching strategy, but in practice you're down to repeated contains() calls.
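
For comparison, the two java.util options above can be sketched roughly as follows. This is illustrative only: the class name RegexExcluder and its methods are made up, the substrings are assumed to be literals (hence Pattern.quote) and the list non-empty, and reusing one Matcher per worker through reset(CharSequence) is an assumption about how one might sidestep usePattern(), not a description of our actual code.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical baseline: one big alternation versus a loop of contains() calls. */
public final class RegexExcluder {
    private final Pattern pattern;
    private final List<String> needles;

    public RegexExcluder(List<String> needles) {
        this.needles = needles;
        // Pattern.quote turns each substring into a literal, so "a.b" only matches "a.b".
        this.pattern = Pattern.compile(
                needles.stream().map(Pattern::quote).collect(Collectors.joining("|")));
    }

    /** Create one Matcher per worker thread and keep it for the life of that worker. */
    public Matcher newMatcher() {
        return pattern.matcher("");
    }

    /** Re-point the existing Matcher at the next record instead of creating a new one. */
    public static boolean excluded(Matcher matcher, CharSequence record) {
        return matcher.reset(record).find();
    }

    /** The contains() fallback: one scan of the record per configured substring. */
    public boolean excludedByContains(String record) {
        for (String needle : needles) {
            if (record.contains(needle)) {
                return true;
            }
        }
        return false;
    }
}
```

Holding the Matcher in a field of the worker object, rather than in a ThreadLocal, would avoid the ThreadLocalMap issue, assuming each worker is confined to one thread.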

What are the suggestions?

Thank you.

S.
