What about accumulators?
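
A rough sketch of what that could look like in PySpark. The helper names here (`format_progress`, `with_row_counter`) are illustrative, not Spark APIs — only `SparkContext.accumulator` and the RDD round-trip are real Spark calls:

```python
def format_progress(label, done, total):
    """Render a one-line status string for a log file or status endpoint."""
    pct = 100.0 * done / total if total else 0.0
    return "%s: %d/%d (%.1f%%)" % (label, done, total, pct)

def with_row_counter(df, acc):
    """Return a DataFrame that bumps accumulator `acc` once per row as
    tasks process it.  The driver can then poll `acc.value` from a
    separate thread while the action runs, completely independently of
    the job's own output.

    Caveat: accumulator updates only become visible on the driver as
    tasks finish, and retried tasks may double-count, so treat this as a
    rough progress signal rather than an exact row count.
    """
    def bump(row):
        acc.add(1)
        return row
    # Round-trip through the underlying RDD so driver-defined code runs
    # per row; requires an active SparkSession.
    return df.rdd.map(bump).toDF(df.schema)

# Typical wiring on the driver (requires a running SparkSession `spark`):
#
#     acc = spark.sparkContext.accumulator(0)
#     counted = with_row_counter(some_df, acc)
#     # kick off the action on a worker thread, then poll periodically:
#     #     print(format_progress("load_customers", acc.value, expected_rows))
```

For coarser, code-free progress you can also poll `spark.sparkContext.statusTracker()` (active job/stage IDs) from the driver, though that has the same Job/Stage/Task abstraction problem you describe.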

> On 14. Aug 2017, at 20:15, Lukas Bradley <lukasbrad...@gmail.com> wrote:
> 
> We have had issues with gathering status on long-running jobs.  We have 
> attempted to draw parallels between the Spark UI/Monitoring API and our code 
> base.  Because the code is separated from the execution plan, even guessing 
> where we are in the process is difficult.  The Job/Stage/Task information is 
> too abstracted from our code to be easily digested by the non-Spark 
> engineers on our team.
> 
> Is there a "hook" to which I can attach a piece of code that is triggered 
> when a point in the plan is reached?  This could be when a SQL command 
> completes, or when a new DataSet is created, anything really...  
> 
> It seems Dataset.checkpoint() offers an excellent snapshot position during 
> execution, but I'm concerned I'm short-circuiting the optimal execution of 
> the full plan.  I really want these trigger functions to be completely 
> independent of the actual processing itself.  I'm not looking to extract 
> information from a Dataset, RDD, or anything else.  I essentially want to 
> write independent output for status.  
> 
> If this doesn't exist, is there any desire on the dev team for me to 
> investigate this feature?
> 
> Thank you for any and all help.