[
https://issues.apache.org/jira/browse/NIFI-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292702#comment-16292702
]
ASF GitHub Bot commented on NIFI-2169:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/2343#discussion_r157232403
--- Diff:
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RouteText.java
---
@@ -209,6 +215,30 @@
private volatile Map<Relationship, PropertyValue> propertyMap = new HashMap<>();
private volatile Pattern groupingRegex = null;
+ @VisibleForTesting
+ final static int PATTERNS_CACHE_MAXIMUM_ENTRIES = 10;
+
+ /**
+ * LRU cache for the compiled patterns. The size of the cache is determined by the value of
+ * {@link #PATTERNS_CACHE_MAXIMUM_ENTRIES}.
+ */
+ @VisibleForTesting
+ final ConcurrentMap<Pair<Boolean, String>, Pattern> patternsCache = CacheBuilder.newBuilder()
+ .maximumSize(PATTERNS_CACHE_MAXIMUM_ENTRIES)
+ .<Pair<Boolean, String>, Pattern>build()
+ .asMap();
+
+ private final Function<Pair<Boolean, String>, Pattern> compileRegex = ignoreCaseAndRegex -> {
--- End diff --
Again, I would avoid the use of the Pair<Boolean, String> here, and would
probably avoid the Function altogether. Since it seems to be referenced only
once, I'd prefer to inline it in the cacheCompiledPattern method, where we
could just call something like:
`return patternsCache.computeIfAbsent(key, toCompile -> ignoreCase ? Pattern.compile(toCompile, Pattern.CASE_INSENSITIVE) : Pattern.compile(toCompile));`
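For illustration, the inlined form suggested above could look roughly like the sketch below. It is not the code from the pull request: the class name `PatternCacheSketch` is hypothetical, a plain `ConcurrentHashMap` stands in for the size-limited Guava cache in the diff (so nothing is ever evicted here), and `ignoreCase` is assumed to be fixed for the lifetime of the cache. If `ignoreCase` could change while the cache is live, the key would have to carry it as well.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Hypothetical holder class; in the PR this logic would live in RouteText.
public class PatternCacheSketch {
    // Assumption: unbounded map used for brevity; the diff uses a Guava cache capped at 10 entries.
    private final ConcurrentMap<String, Pattern> patternsCache = new ConcurrentHashMap<>();

    // Assumption: case sensitivity is fixed per instance, so the regex string alone is a safe key.
    private final boolean ignoreCase;

    public PatternCacheSketch(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
    }

    // Compiles the regex on first use and returns the cached Pattern afterwards.
    public Pattern cacheCompiledPattern(String regex) {
        return patternsCache.computeIfAbsent(regex, toCompile -> ignoreCase
                ? Pattern.compile(toCompile, Pattern.CASE_INSENSITIVE)
                : Pattern.compile(toCompile));
    }
}
```

With this shape, neither the Pair key nor the separate Function field is needed, because the lambda captures `ignoreCase` directly.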
> Improve RouteText performance with pre-compilation of RegEx in certain cases
> ----------------------------------------------------------------------------
>
> Key: NIFI-2169
> URL: https://issues.apache.org/jira/browse/NIFI-2169
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 0.6.1
> Reporter: Stephane Maarek
> Assignee: Oleg Zhurakousky
> Labels: beginner, easy
>
> When using RegEx matches for the RouteText processor (and possibly other
> processors), the RegEx gets recompiled every time the processor works. The
> RegEx could be precompiled / cached under certain conditions, in order to
> improve the performance of the processor.
> See email from Mark Payne:
> Re #2: The regular expression is compiled every time. This is done, though,
> because the Regex allows the Expression Language to be used, so the Regex
> could actually be different for each FlowFile. That being said, it could
> certainly be improved by either (a) pre-compiling in the case that no
> Expression Language is used and/or (b) cache up to say 10 Regex'es once
> they are compiled.
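
Option (b) from the quoted email, caching up to about 10 compiled regexes, can be sketched with the JDK alone. This is only an illustration under stated assumptions: `BoundedPatternCache` and `MAX_ENTRIES` are hypothetical names, and an access-ordered `LinkedHashMap` behind `Collections.synchronizedMap` is swapped in for the Guava cache that the pull request uses.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical LRU cache of compiled patterns, bounded at MAX_ENTRIES.
public class BoundedPatternCache {
    // Assumption: 10 mirrors the "up to say 10" figure in the email.
    static final int MAX_ENTRIES = 10;

    // accessOrder=true makes iteration order least-recently-used first;
    // removeEldestEntry evicts that entry once the cap is exceeded.
    private final Map<String, Pattern> cache = Collections.synchronizedMap(
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > MAX_ENTRIES;
                }
            });

    // Returns the cached Pattern for this regex, compiling it only on a miss.
    public Pattern get(String regex) {
        return cache.computeIfAbsent(regex, Pattern::compile);
    }

    public int size() {
        return cache.size();
    }
}
```

The synchronized wrapper serializes all access, which is simpler but less concurrent than a purpose-built cache. Option (a), pre-compiling when no Expression Language is present, is not shown here.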