What about accumulators? They can be updated from inside tasks and read on the driver while the job is running.
> On 14. Aug 2017, at 20:15, Lukas Bradley <lukasbrad...@gmail.com> wrote:
>
> We have had issues with gathering status on long-running jobs. We have
> attempted to draw parallels between the Spark UI/Monitoring API and our
> code base. Due to the separation between code and the execution plan, even
> making a guess as to where we are in the process is difficult. The
> Job/Stage/Task information is too abstracted from our code to be easily
> digested by the non-Spark engineers on our team.
>
> Is there a "hook" to which I can attach a piece of code that is triggered
> when a point in the plan is reached? This could be when a SQL command
> completes, or when a new Dataset is created, anything really...
>
> It seems Dataset.checkpoint() offers an excellent snapshot position during
> execution, but I'm concerned I'd be short-circuiting the optimal execution
> of the full plan. I really want these trigger functions to be completely
> independent of the actual processing itself. I'm not looking to extract
> information from a Dataset, RDD, or anything else; I essentially want to
> write independent output for status.
>
> If this doesn't exist, is there any desire on the dev team for me to
> investigate this feature?
>
> Thank you for any and all help.