Andy,

Nice write-up and thanks for bringing attention to this. I definitely assumed 
for a while that replace vs replaceAll was the number of things replaced. The 
underlying problem, I think, is that these EL methods are just wrappers around 
the Java String methods and the Java String methods are named in a confusing 
manner. 
 
I am on board with adding a "replaceFirst(regex, replacement)" method. This 
adds a bit more functionality and is just exposing another Java String method.

In addition to that, I would suggest doing something to alleviate the confusion 
between "replace" and "replaceAll". In a similar fashion to adding decimal 
support, I see two avenues we could take:

1. Change the names of the functions to "replaceLiteral" and "replaceRegex" (or 
"replaceAllLiteral" and "replaceAllRegex")
2. Remove one of the methods and add a third field to the remaining method to 
indicate whether to replace literally or interpret as a regex

Both of these would be breaking changes and introduced with 1.0 and I am 
leaning towards option 1 with the base name "replace". I believe when the 
"replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would be 
easy to understand that they replace all occurrences.

Joe

- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]



On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <[email protected]> wrote:



Hi all,

During investigation of an expression language issue posted to the list, I 
discovered that replace explicitly delegates to a String#replace invocation 
that only accepts literal expressions, not regular expressions, while 
replaceAll accepts regular expressions. I thought this was an oversight and 
filed NIFI-1919 [1] to document and fix this, by changing the ReplaceEvaluator 
[2] to use String#replaceFirst, which accepts regular expressions. I wrote 
failing unit tests [3] to capture the fix. After implementing the change, two 
existing unit tests [4] broke, which indicated a regression. At first, I 
believed these two tests to be incorrect, but further investigation showed they 
were merely surprising. 

TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, but 
the test is asserting that replace should replace multiple instances of the 
single quote. While this is similar to String#replace, because the expression 
language exposes only two methods — replace vs. replaceAll — one could easily 
assume the difference between the two was the number of attempted replacements, 
rather than the actual difference, which is literal expression vs. pattern. 

@Test
public void testQuotingQuotes() {
final Map<String, String> attributes = new HashMap<>();
attributes.put("xx", "say 'hi'");

String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
verifyEquals(query, attributes, "say \"hello\"");

query = "${xx:replace( \"'\", '\"')}";
verifyEquals(query, attributes, "say \"hi\"");

query = "${xx:replace( '\\'', '\"')}";
System.out.println(query);
verifyEquals(query, attributes, "say \"hi\"");
}
TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails on the 
first verifyEquals call with a PatternSyntaxException. I am investigating that 
further.

@Test
public void testReplaceAllWithOddNumberOfBackslashPairs() {
final Map<String, String> attributes = new HashMap<>();
attributes.put("filename", "C:\\temp\\.txt");

verifyEquals("${filename:replace('\\\\', '/')}", attributes, "C:/temp/.txt");
verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, 
"C:/temp/.txt");
verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, 
"C:\\temp");
}
While I originally had just modified replace, after looking at the EL 
documentation [5], replace is explicitly documented to only replace literal 
expressions, and does so multiple times, as does Java’s String#replace [6]. I 
now propose to add another method replaceFirst, which accepts a pattern and 
replaces only the first occurrence. I will update the unit tests to properly 
capture this, and will update the documentation to reflect the new method. 

Thoughts from the community?

[1] https://issues.apache.org/jira/browse/NIFI-1919
[2] 
https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
[3] 
https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
[4] 
https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
[5] 
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
[6] 
https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)



Andy LoPresto
[email protected]
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

Reply via email to