After some discussion and further thought, leaving “replace” to handle literal Strings which contain escape characters may be convenient. In this case I would suggest either renaming “replace” to “replaceAllLiteral” or the other methods to “replaceFirstRegex” and “replaceAllRegex” to illustrate the differences.
Andy LoPresto [email protected] [email protected] PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > On May 27, 2016, at 10:25 AM, Andy LoPresto <[email protected]> wrote: > > Following up on this, I am not sure replace is needed at all. ReplaceAll, if > called with a “regular expression” that is just a literal String, will > perform the same operation as replace would. > > In NIFI-1919, I will not remove replace, as that would break BC, but I submit > that for 1.0, we should remove it, as it’s behavior is completely covered by > another method. We would then have: > > replaceFirst - accepts a literal or regex > replaceAll - accepts a literal or regex > > Again, we could perform “flow upgrade” through the StandardFlowSynchronizer > to migrate all uses of “replace” to “replaceAll". Thoughts? > > Andy LoPresto > [email protected] <mailto:[email protected]> > [email protected] <mailto:[email protected]> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > >> On May 26, 2016, at 10:28 AM, Joe Skora <[email protected] >> <mailto:[email protected]>> wrote: >> >> Got it. That makes sense and provides all the variations of functionality >> so it sounds like a win! >> >> On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <[email protected] >> <mailto:[email protected]>> wrote: >> >>> Hi Joe, >>> >>> I think what you proposed is more than is necessary. replaceAll is >>> functionally complete already, as if it is passed a literal expression, it >>> will compile it as such and replace all instances in the subject string. I >>> think the confusing point is that the difference between replace and >>> replaceAll is not the number of times the replacement is attempted, but >>> rather the literal vs. regular expression compilation. As Joe Percivall >>> suggested, I think three methods (replace(literal), >>> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient. >>> >>> Having the replaceRegex and replaceLiteral functions with an additional >>> parameter to control occurrence count would be difficult, as we would >>> either need to expose some kind of enum or perform string matching to >>> interpret user-entered flag values. >>> >>> I don’t think replace and replaceAll need to be EOL-ed, I think the >>> documentation just needs to call out the behavior very explicitly, and I >>> believe having a new option replaceFirst will also help to suggest the >>> differences between the methods even if users do not read the entire >>> documentation. >>> >>> >>> Andy LoPresto >>> [email protected] <mailto:[email protected]> >>> *[email protected] <mailto:[email protected]> >>> <[email protected] <mailto:[email protected]>>* >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 >>> >>> On May 26, 2016, at 10:09 AM, Joe Skora <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> I think there is transitional path that is non-breaking and could prevent >>> 0.x to 1.x migration issues, even if at the expense of a little extra code >>> for a period of time. >>> >>> 1. Leave the existing "replace" and "replaceAll" functions as is for >>> now. >>> 2. Implement the new "replaceRegex" and "replaceLiteral" functions. I'd >>> prefer giving them a parameter selecting first, last, or all occurrences >>> over function variants, it could even be optional with all as the >>> default. >>> 3. In a future release deprecate the "replace" and "replaceAll" >>> functions. >>> 4. In a future, future release remove the "replace" and "replaceAll" >>> >>> functions. >>> >>> From past experience managing enterprise systems, users preferred a >>> transition period like this instead of requiring everything be updated at >>> once. We could then provide them a means to find any outstanding use of >>> the old functionality and clean it up before it was retired. And from the >>> system development and management perspective, that saved us a lot of pain >>> too since we didn't have an influx of minor but troublesome issues in the >>> midst transitions that could demand our attention be on other bigger >>> problems. >>> >>> It would be possible to have a signle function that takes another parameter >>> indicating whether the target is a literal or regular expression, but I >>> like the separate functions, especially if they will be implemented with >>> different underlying APIs. >>> >>> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <[email protected] >>> <mailto:[email protected]>> >>> wrote: >>> >>> Joe, >>> >>> These would be breaking changes and a lot of existing workflows would >>> begin to behave differently. I would suggest making an incremental change >>> here — simply adding replaceFirst as a non-destructive change as a solution >>> for this issue, and opening a new Jira for the changes which break backward >>> compatibility. >>> >>> I do agree that option 1 is probably the cleaner way forward, and if we >>> introduce new method names, we may be able to use StandardFlowSynchronizer >>> to detect legacy methods from a pre-1.0 flow and update them automatically >>> during flow migration. >>> >>> Andy LoPresto >>> [email protected] <mailto:[email protected]> >>> *[email protected] <mailto:[email protected]> >>> <[email protected] <mailto:[email protected]>>* >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 >>> >>> On May 25, 2016, at 4:06 PM, Joe Percivall <[email protected] >>> <mailto:[email protected]> >>> <[email protected] <mailto:[email protected]>> >>> <[email protected] <mailto:[email protected]>>> >>> wrote: >>> >>> Andy, >>> >>> Nice write-up and thanks for bringing attention to this. I definitely >>> assumed for a while that replace vs replaceAll was the number of things >>> replaced. The underlying problem, I think, is that these EL methods are >>> just wrappers around the Java String methods and the Java String methods >>> are named in a confusing manner. >>> >>> I am on board with adding a "replaceFirst(regex, replacement)" method. >>> This adds a bit more functionality and is just exposing another Java String >>> method. >>> >>> In addition to that, I would suggest doing something to alleviate the >>> confusion between "replace" and "replaceAll". In a similar fashion to >>> adding decimal support, I see two avenues we could take: >>> >>> 1. Change the names of the functions to "replaceLiteral" and >>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex") >>> 2. Remove one of the methods and add a third field to the remaining method >>> to indicate whether to replace literally or interpret as a regex >>> >>> Both of these would be breaking changes and introduced with 1.0 and I am >>> leaning towards option 1 with the base name "replace". I believe when the >>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would >>> be easy to understand that they replace all occurrences. >>> >>> Joe >>> >>> - - - - - - >>> Joseph Percivall >>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall> >>> e: [email protected] <mailto:[email protected]> >>> >>> >>> >>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <[email protected] >>> <mailto:[email protected]>> >>> wrote: >>> >>> >>> >>> Hi all, >>> >>> During investigation of an expression language issue posted to the list, I >>> discovered that replace explicitly delegates to a String#replace invocation >>> that only accepts literal expressions, not regular expressions, while >>> replaceAll accepts regular expressions. I thought this was an oversight and >>> filed NIFI-1919 [1] to document and fix this, by changing the >>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular >>> expressions. I wrote failing unit tests [3] to capture the fix. After >>> implementing the change, two existing unit tests [4] broke, which indicated >>> a regression. At first, I believed these two tests to be incorrect, but >>> further investigation showed they were merely surprising. >>> >>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call, >>> but the test is asserting that replace should replace multiple instances of >>> the single quote. While this is similar to String#replace, because the >>> expression language exposes only two methods — replace vs. replaceAll — one >>> could easily assume the difference between the two was the number of >>> attempted replacements, rather than the actual difference, which is literal >>> expression vs. pattern. >>> >>> @Test >>> public void testQuotingQuotes() { >>> final Map<String, String> attributes = new HashMap<>(); >>> attributes.put("xx", "say 'hi'"); >>> >>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}"; >>> verifyEquals(query, attributes, "say \"hello\""); >>> >>> query = "${xx:replace( \"'\", '\"')}"; >>> verifyEquals(query, attributes, "say \"hi\""); >>> >>> query = "${xx:replace( '\\'' <smb://''>, '\"')}"; >>> System.out.println(query); >>> verifyEquals(query, attributes, "say \"hi\""); >>> } >>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails >>> on the first verifyEquals call with a PatternSyntaxException. I am >>> investigating that further. >>> >>> @Test >>> public void testReplaceAllWithOddNumberOfBackslashPairs() { >>> final Map<String, String> attributes = new HashMap<>(); >>> attributes.put("filename", "C:\\temp\\.txt"); >>> >>> verifyEquals("${filename:replace('\\\\', '/')}", attributes, >>> "C:/temp/.txt"); >>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, >>> "C:/temp/.txt"); >>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, >>> "C:\\temp"); >>> } >>> While I originally had just modified replace, after looking at the EL >>> documentation [5], replace is explicitly documented to only replace literal >>> expressions, and does so multiple times, as does Java’s String#replace [6]. >>> I now propose to add another method replaceFirst, which accepts a pattern >>> and replaces only the first occurrence. I will update the unit tests to >>> properly capture this, and will update the documentation to reflect the new >>> method. >>> >>> Thoughts from the community? >>> >>> [1] https://issues.apache.org/jira/browse/NIFI-1919 >>> <https://issues.apache.org/jira/browse/NIFI-1919> >>> [2] >>> >>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java >>> >>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java> >>> [3] >>> >>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy >>> >>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy> >>> [4] >>> >>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java >>> >>> <https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java> >>> [5] >>> >>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace >>> >>> <https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace> >>> [6] >>> >>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence) >>> >>> <https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)> >>> >>> >>> >>> Andy LoPresto >>> [email protected] <mailto:[email protected]> >>> [email protected] <mailto:[email protected]> >>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 >
signature.asc
Description: Message signed with OpenPGP using GPGMail
