Hello,

I would like to propose adding a dispose() method to the RichFunction
interface, alongside the existing close(). This would align the lifecycle of
a RichFunction with that of an Operator. With this change, the code paths
followed when finishing successfully and when cancelling would be fully
distinct.

Semantically, close() would be responsible for guaranteeing the semantic
correctness of the job when it terminates successfully, while dispose() would
be responsible for system cleanup, e.g. freeing resources such as DB
connections, both when terminating gracefully and when cancelling.
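
To make the intended call order concrete, here is a minimal, Flink-independent
sketch. The names DisposableFunction, RecordingFunction and run() are
hypothetical and only for illustration; they are not existing Flink classes,
and the real change would live in RichFunction and AbstractUdfStreamOperator.

```java
import java.util.ArrayList;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical stand-in for RichFunction with the proposed extra method. */
    interface DisposableFunction {
        void close() throws Exception;   // semantic finalization, success only
        void dispose() throws Exception; // resource cleanup, every path
    }

    /** Records lifecycle calls so the resulting order can be inspected. */
    static class RecordingFunction implements DisposableFunction {
        final List<String> calls = new ArrayList<>();
        @Override public void close() { calls.add("close"); }
        @Override public void dispose() { calls.add("dispose"); }
    }

    /** Hypothetical driver: close() only on success, dispose() always. */
    static void run(DisposableFunction fn, Runnable body) throws Exception {
        try {
            body.run();
            fn.close();
        } finally {
            fn.dispose();
        }
    }

    /** Runs a function whose body either succeeds or throws, returns the calls made. */
    static List<String> callsFor(boolean fail) {
        RecordingFunction fn = new RecordingFunction();
        try {
            run(fn, () -> {
                if (fail) {
                    throw new RuntimeException("simulated failure");
                }
            });
        } catch (Exception e) {
            // Swallowed here: only the lifecycle calls matter for this sketch.
        }
        return fn.calls;
    }

    public static void main(String[] args) {
        System.out.println(callsFor(false)); // prints [close, dispose]
        System.out.println(callsFor(true));  // prints [dispose]
    }
}
```

The try/finally shape is just one way to express "dispose() on every path";
how cancellation is actually signalled to the operator is left out here.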

Currently, most functions use close() with the semantics of dispose(), as the
only thing they do is free resources. A good example of where this leads to
confusion is the BucketingSink/RollingSink, where close() marks data that has
not been committed as "pending" (semantic correctness of the job). Because
there is no distinction between close() and dispose(), the
AbstractUdfStreamOperator calls this method both when a job finishes
successfully and when something goes wrong during execution. This is
essentially a compromise: close() should mark this data as "committed" on
successful termination, but it cannot, because it may also be called after an
exception was thrown.
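
As a rough illustration of what the split would allow for such a sink, here is
a toy model. SketchSink, FileState and finish() are hypothetical names and this
is not the BucketingSink code; in particular, leaving the data untouched on the
failure path is an assumption of the sketch, not something decided above.

```java
public class SinkSketch {

    enum FileState { IN_PROGRESS, COMMITTED }

    /** Toy sink: one piece of output plus one resource (the writer). */
    static class SketchSink {
        FileState state = FileState.IN_PROGRESS;
        boolean writerOpen = true;

        /** With the split, close() is only reached on success, so it may commit. */
        void close() { state = FileState.COMMITTED; }

        /** dispose() only releases resources and leaves the data state alone. */
        void dispose() { writerOpen = false; }
    }

    /** Hypothetical driver mimicking the success and failure paths. */
    static SketchSink finish(boolean failed) {
        SketchSink sink = new SketchSink();
        try {
            if (failed) {
                throw new IllegalStateException("simulated failure");
            }
            sink.close();
        } catch (IllegalStateException e) {
            // Failure path: close() is skipped, nothing is committed.
        } finally {
            sink.dispose();
        }
        return sink;
    }

    public static void main(String[] args) {
        SketchSink ok = finish(false);
        System.out.println(ok.state + " " + ok.writerOpen);      // prints COMMITTED false
        SketchSink bad = finish(true);
        System.out.println(bad.state + " " + bad.writerOpen);    // prints IN_PROGRESS false
    }
}
```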

Let me know what you think,
Kostas 
