Hi,

Recently I had to optimize a few Apache Spark SQL queries. Some of the
Datasets were reused, so they were cached. However, after caching I no
longer see the SQL visualization for the cached Dataset in the Spark UI;
I see only an InMemoryRelation node. The explain output at the bottom of
the page still shows the full plan.

Is this expected behaviour? In such cases we have far fewer options for
debugging performance in Spark. My suggestion is to show the full diagram
on the first action after cache(), or to show a separate SQL query for
the cache. The second option is probably not possible, however, because
cache() does not trigger computation, so there would be no metrics to show.

A workaround is to temporarily disable caching, but that takes a lot of
time, especially on large datasets.
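
A minimal sketch of turning that workaround into a single switch, assuming
PySpark. `maybe_cache` and the `DEBUG_DISABLE_CACHE` environment variable
are hypothetical names for illustration only; they are not part of Spark's
API. This only saves editing every cache() call by hand; it does not remove
the cost of recomputing the uncached Datasets.

```python
import os

# Hypothetical debug switch for this sketch; NOT a Spark configuration key.
CACHE_DISABLED = os.environ.get("DEBUG_DISABLE_CACHE", "0") == "1"


def maybe_cache(df, disabled=CACHE_DISABLED):
    """Cache `df` unless caching is switched off for debugging.

    `df` is expected to be a Spark DataFrame/Dataset (anything with a
    .cache() method). When disabled, the DataFrame is returned unchanged,
    so every action re-executes the full plan instead of reading from an
    InMemoryRelation.
    """
    if disabled:
        return df
    return df.cache()
```

Every place that currently calls `.cache()` would call `maybe_cache(df)`
instead, and the debug run would be started with `DEBUG_DISABLE_CACHE=1`.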

Pozdrawiam / Best regards,

Tomek