Got it. That makes sense and provides all the variations of functionality so it sounds like a win!
On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <[email protected]> wrote: > Hi Joe, > > I think what you proposed is more than is necessary. replaceAll is > functionally complete already, as if it is passed a literal expression, it > will compile it as such and replace all instances in the subject string. I > think the confusing point is that the difference between replace and > replaceAll is not the number of times the replacement is attempted, but > rather the literal vs. regular expression compilation. As Joe Percivall > suggested, I think three methods (replace(literal), > replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient. > > Having the replaceRegex and replaceLiteral functions with an additional > parameter to control occurrence count would be difficult, as we would > either need to expose some kind of enum or perform string matching to > interpret user-entered flag values. > > I don’t think replace and replaceAll need to be EOL-ed, I think the > documentation just needs to call out the behavior very explicitly, and I > believe having a new option replaceFirst will also help to suggest the > differences between the methods even if users do not read the entire > documentation. > > > Andy LoPresto > [email protected] > *[email protected] <[email protected]>* > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On May 26, 2016, at 10:09 AM, Joe Skora <[email protected]> wrote: > > I think there is transitional path that is non-breaking and could prevent > 0.x to 1.x migration issues, even if at the expense of a little extra code > for a period of time. > > 1. Leave the existing "replace" and "replaceAll" functions as is for > now. > 2. Implement the new "replaceRegex" and "replaceLiteral" functions. I'd > prefer giving them a parameter selecting first, last, or all occurrences > over function variants, it could even be optional with all as the > default. > 3. In a future release deprecate the "replace" and "replaceAll" > functions. > 4. In a future, future release remove the "replace" and "replaceAll" > > functions. > > From past experience managing enterprise systems, users preferred a > transition period like this instead of requiring everything be updated at > once. We could then provide them a means to find any outstanding use of > the old functionality and clean it up before it was retired. And from the > system development and management perspective, that saved us a lot of pain > too since we didn't have an influx of minor but troublesome issues in the > midst transitions that could demand our attention be on other bigger > problems. > > It would be possible to have a signle function that takes another parameter > indicating whether the target is a literal or regular expression, but I > like the separate functions, especially if they will be implemented with > different underlying APIs. > > On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <[email protected]> > wrote: > > Joe, > > These would be breaking changes and a lot of existing workflows would > begin to behave differently. I would suggest making an incremental change > here — simply adding replaceFirst as a non-destructive change as a solution > for this issue, and opening a new Jira for the changes which break backward > compatibility. > > I do agree that option 1 is probably the cleaner way forward, and if we > introduce new method names, we may be able to use StandardFlowSynchronizer > to detect legacy methods from a pre-1.0 flow and update them automatically > during flow migration. > > Andy LoPresto > [email protected] > *[email protected] <[email protected]>* > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On May 25, 2016, at 4:06 PM, Joe Percivall <[email protected] > <[email protected]> > <[email protected]>> wrote: > > Andy, > > Nice write-up and thanks for bringing attention to this. I definitely > assumed for a while that replace vs replaceAll was the number of things > replaced. The underlying problem, I think, is that these EL methods are > just wrappers around the Java String methods and the Java String methods > are named in a confusing manner. > > I am on board with adding a "replaceFirst(regex, replacement)" method. > This adds a bit more functionality and is just exposing another Java String > method. > > In addition to that, I would suggest doing something to alleviate the > confusion between "replace" and "replaceAll". In a similar fashion to > adding decimal support, I see two avenues we could take: > > 1. Change the names of the functions to "replaceLiteral" and > "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex") > 2. Remove one of the methods and add a third field to the remaining method > to indicate whether to replace literally or interpret as a regex > > Both of these would be breaking changes and introduced with 1.0 and I am > leaning towards option 1 with the base name "replace". I believe when the > "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would > be easy to understand that they replace all occurrences. > > Joe > > - - - - - - > Joseph Percivall > linkedin.com/in/Percivall > e: [email protected] > > > > On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <[email protected]> > wrote: > > > > Hi all, > > During investigation of an expression language issue posted to the list, I > discovered that replace explicitly delegates to a String#replace invocation > that only accepts literal expressions, not regular expressions, while > replaceAll accepts regular expressions. I thought this was an oversight and > filed NIFI-1919 [1] to document and fix this, by changing the > ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular > expressions. I wrote failing unit tests [3] to capture the fix. After > implementing the change, two existing unit tests [4] broke, which indicated > a regression. At first, I believed these two tests to be incorrect, but > further investigation showed they were merely surprising. > > TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, > but the test is asserting that replace should replace multiple instances of > the single quote. While this is similar to String#replace, because the > expression language exposes only two methods — replace vs. replaceAll — one > could easily assume the difference between the two was the number of > attempted replacements, rather than the actual difference, which is literal > expression vs. pattern. > > @Test > public void testQuotingQuotes() { > final Map<String, String> attributes = new HashMap<>(); > attributes.put("xx", "say 'hi'"); > > String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}"; > verifyEquals(query, attributes, "say \"hello\""); > > query = "${xx:replace( \"'\", '\"')}"; > verifyEquals(query, attributes, "say \"hi\""); > > query = "${xx:replace( '\\'', '\"')}"; > System.out.println(query); > verifyEquals(query, attributes, "say \"hi\""); > } > TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails > on the first verifyEquals call with a PatternSyntaxException. I am > investigating that further. > > @Test > public void testReplaceAllWithOddNumberOfBackslashPairs() { > final Map<String, String> attributes = new HashMap<>(); > attributes.put("filename", "C:\\temp\\.txt"); > > verifyEquals("${filename:replace('\\\\', '/')}", attributes, > "C:/temp/.txt"); > verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, > "C:/temp/.txt"); > verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, > "C:\\temp"); > } > While I originally had just modified replace, after looking at the EL > documentation [5], replace is explicitly documented to only replace literal > expressions, and does so multiple times, as does Java’s String#replace [6]. > I now propose to add another method replaceFirst, which accepts a pattern > and replaces only the first occurrence. I will update the unit tests to > properly capture this, and will update the documentation to reflect the new > method. > > Thoughts from the community? > > [1] https://issues.apache.org/jira/browse/NIFI-1919 > [2] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java > [3] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy > [4] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java > [5] > > https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace > [6] > > https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) > > > > Andy LoPresto > [email protected] > [email protected] > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > >
