Hello,

I would like to propose adding a dispose() method to the RichFunction interface, alongside the existing close(). This would align the lifecycle of a RichFunction with that of an Operator. With this change, the code paths followed when a job finishes successfully and when it is cancelled would be distinct.

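To make the proposed split concrete, here is a minimal sketch of how the owner of a function might drive the two methods. LifecycleFunction, RecordingSink and LifecycleSketch.run are hypothetical names used only for illustration. This is not the actual RichFunction or AbstractUdfStreamOperator code, and the recorded event strings are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for RichFunction, reduced to the two lifecycle
// methods under discussion. Not the actual Flink interface.
interface LifecycleFunction {
    void close() throws Exception; // successful termination only
    void dispose();                // always: success, failure, or cancellation
}

// Hypothetical sink that records lifecycle calls so their order can be inspected.
class RecordingSink implements LifecycleFunction {
    final List<String> events = new ArrayList<>();

    @Override
    public void close() {
        events.add("close: commit pending data");
    }

    @Override
    public void dispose() {
        events.add("dispose: release resources");
    }
}

class LifecycleSketch {
    // Hypothetical driver standing in for the operator that owns the function.
    static void run(LifecycleFunction fn, Runnable job) throws Exception {
        try {
            job.run();
            fn.close();   // reached only if the job body did not throw
        } finally {
            fn.dispose(); // reached on every path
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingSink ok = new RecordingSink();
        run(ok, () -> {});
        System.out.println(ok.events);

        RecordingSink failed = new RecordingSink();
        try {
            run(failed, () -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // The failure still propagates to the caller; only dispose() ran.
        }
        System.out.println(failed.events);
    }
}
```

On the successful path the sink records both calls, close() first and then dispose(). On the failing path it records only the dispose() call, so close() can safely assume the job finished successfully.
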
Semantically, close() would be responsible for guaranteeing the semantic correctness of the job when it terminates successfully, while dispose() would be responsible for system cleanup, e.g. freeing resources such as DB connections, both when terminating gracefully and when cancelling. Currently, most functions use close() with the semantics of dispose(), as the only thing they do is free resources.

A good example of where this leads to confusion is the BucketingSink/RollingSink, where close() marks data that is not yet committed as "pending" (semantic correctness of the job). Because there is no distinction between close() and dispose(), the AbstractUdfStreamOperator calls this method both when a job finishes successfully and when something went wrong during execution. This is essentially a compromise: close() should mark this data as "committed" on successful termination, but it cannot, because it may also be called after an exception was thrown.

Let me know what you think,
Kostas