markap14 commented on pull request #4800: URL: https://github.com/apache/nifi/pull/4800#issuecomment-772750238
I think this is too dangerous even at the process group level. I don't think we should ever expire data on any connection without a user explicitly configuring that connection to do so. There is a small indicator on the connection, but it is easily missed, especially if you're not looking for it, and many users, especially newer ones, may not know what that icon means. If they notice it and go searching for what it means, it will be easy to figure out, but it won't necessarily jump out at them.

I would also consider it an anti-pattern to go through a flow and set expiration on every connection. Typically a flow has many processors, and expiration is configured only at the end of that dataflow. There's no real need to age the data off in the middle. If age-off is configured only at the end, the data will age off as necessary, and even if backpressure is applied, the aging off lets data continue through the flow and reach the end, where it will be aged off.

Certainly, configuring expiration throughout the flow could be more efficient if you have a lot of data, because old data is aged off more aggressively. But at what cost? If age-off is set to 5 minutes and you decide you now want 10 minutes, you have to go through potentially hundreds of connections and update them. This would get unwieldy very quickly.

I very much appreciate the work that you've put into this. It's a non-trivial PR, for sure. But given the likelihood of unintended data loss and the usability concerns it would introduce, I'm a -1 on this feature.
