Dear wizards, please advise.
I need to offer a user-configurable pattern-matching feature to
exclude objects from my billion-object sort-merge (which is now working
fairly well, thank you all).
What we're mostly trying to do is exclude any record which contains any
one of a number of substrings. The computer science textbooks give
various fast-string-searching algorithms with pre-computed tables, any
of which would suit our use case, but I don't see a practical
implementation of any of them floating around...
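
To make concrete what I mean by "pre-computed tables": the sketch below is an Aho-Corasick style matcher with a dense transition table. It is only a minimal illustration, not a tuned implementation. It assumes records are available as UTF-8 bytes, and the names AnySubstringMatcher and matchesAny are made up for this example. The dense table costs 256 ints per state, which may or may not be acceptable for a large pattern set.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: "does the input contain any of the patterns?" */
final class AnySubstringMatcher {
    private static final int ALPHABET = 256;
    private final int[][] delta;    // delta[state][byte] -> next state
    private final boolean[] accept; // accept[state] -> some pattern ends here

    AnySubstringMatcher(List<String> patterns) {
        // Step 1: build a trie of the patterns; -1 marks a missing edge.
        List<int[]> trie = new ArrayList<>();
        List<Boolean> terminal = new ArrayList<>();
        trie.add(newNode());
        terminal.add(false);
        for (String p : patterns) {
            int s = 0;
            for (byte b : p.getBytes(StandardCharsets.UTF_8)) {
                int c = b & 0xFF;
                if (trie.get(s)[c] == -1) {
                    trie.get(s)[c] = trie.size();
                    trie.add(newNode());
                    terminal.add(false);
                }
                s = trie.get(s)[c];
            }
            terminal.set(s, true);
        }

        // Step 2: breadth-first pass that computes failure links and
        // replaces every missing edge with the failure state's edge,
        // so matching needs exactly one table lookup per input byte.
        int n = trie.size();
        delta = new int[n][];
        accept = new boolean[n];
        int[] fail = new int[n];
        for (int i = 0; i < n; i++) {
            delta[i] = trie.get(i);
            accept[i] = terminal.get(i);
        }
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int c = 0; c < ALPHABET; c++) {
            int t = delta[0][c];
            if (t == -1) {
                delta[0][c] = 0;
            } else {
                fail[t] = 0;
                queue.add(t);
            }
        }
        while (!queue.isEmpty()) {
            int s = queue.poll();
            // A state also accepts if a proper suffix of it is a pattern.
            accept[s] |= accept[fail[s]];
            for (int c = 0; c < ALPHABET; c++) {
                int t = delta[s][c];
                if (t == -1) {
                    delta[s][c] = delta[fail[s]][c];
                } else {
                    fail[t] = delta[fail[s]][c];
                    queue.add(t);
                }
            }
        }
    }

    private static int[] newNode() {
        int[] node = new int[ALPHABET];
        Arrays.fill(node, -1);
        return node;
    }

    /** Allocation-free scan; stops at the first match. */
    boolean matchesAny(byte[] data, int off, int len) {
        int s = 0;
        if (accept[s]) return true; // an empty pattern matches everything
        for (int i = off; i < off + len; i++) {
            s = delta[s][data[i] & 0xFF];
            if (accept[s]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        AnySubstringMatcher m =
            new AnySubstringMatcher(Arrays.asList("foo", "bar"));
        byte[] rec = "xxfoobaz".getBytes(StandardCharsets.UTF_8);
        System.out.println(m.matchesAny(rec, 0, rec.length));
    }
}
```

The table is built once per configuration and is read-only afterwards, so in this sketch a single instance could be shared across threads without any per-thread state.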
Current practical options:
* java.util.regex, precompiled patterns
- Reputedly slow at matching, but our patterns are simple.
- We are using a single regex containing a large alternation.
- perf says this regex matcher is 50% of our runtime.
- WHY isn't Matcher.usePattern() allocation-free? It totally could
be. As it stands, the int[] arrays it allocates are a major drain on the GC.
- If we use ThreadLocal<Matcher> instead, the ThreadLocal can't be
static, and we hit the previously discussed issue of blowing out the
ThreadLocalMap.
* re2j
- Not tried yet. Has anyone used it?
* brics
- It may make sense to try brics first and fall back to java regex
when brics fails.
* Groovy Closure
- Faster than regex/pattern, assuming you have a better strategy for
matching, but in practice you're down to repeated contains() calls.
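
For reference, the regex and contains() options above look roughly like the sketch below. The class and method names (ExcludeFilters, RegexFilter, excludedByContains) are made up for illustration, and it assumes a non-empty substring list. It reuses one Matcher per worker through Matcher.reset(CharSequence) rather than usePattern() or a ThreadLocal; I have not measured whether that avoids the allocation on every JDK version, so treat it as something to try rather than a known fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of the two baseline exclusion strategies. */
final class ExcludeFilters {

    /** Builds one alternation; Pattern.quote keeps substrings literal. */
    static Pattern alternation(List<String> substrings) {
        return Pattern.compile(
            substrings.stream()
                      .map(Pattern::quote)
                      .collect(Collectors.joining("|")));
    }

    /**
     * Holds a single Matcher that is re-pointed at each record.
     * Matcher is not thread-safe, so each worker needs its own instance.
     */
    static final class RegexFilter {
        private final Matcher matcher;

        RegexFilter(Pattern pattern) {
            this.matcher = pattern.matcher("");
        }

        boolean excluded(CharSequence record) {
            return matcher.reset(record).find();
        }
    }

    /** The naive baseline: one contains() call per substring. */
    static boolean excludedByContains(String record, List<String> substrings) {
        for (String s : substrings) {
            if (record.contains(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> subs = Arrays.asList("foo", "a.b");
        RegexFilter filter = new RegexFilter(alternation(subs));
        System.out.println(filter.excluded("xxfooxx"));
        System.out.println(excludedByContains("nothing here", subs));
    }
}
```

Owning the Matcher in the worker object, as here, sidesteps the ThreadLocal question entirely, at the cost of having to pass the filter instance to wherever records are processed.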
What are the suggestions?
Thank you.
S.