Hello Winston,

Thanks again for this response, I will check this one out.
On Wed, Aug 2, 2023 at 3:50 PM Winston Lai <weiruanl...@gmail.com> wrote:

> Hi Vibhatha,
>
> I helped you post this question to another community. There is one answer
> by someone else for your reference.
>
> To access the logical plan or optimized plan, you can register a custom
> QueryExecutionListener and retrieve the plans during query execution.
> Here's an example of how to do it in Scala (note that
> QueryExecutionListener lives in `org.apache.spark.sql.util`, not
> `org.apache.spark.sql`):
>
> ```scala
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.execution.QueryExecution
> import org.apache.spark.sql.util.QueryExecutionListener
>
> // Create a custom QueryExecutionListener
> class CustomQueryExecutionListener extends QueryExecutionListener {
>   override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
>     // Retrieve the logical plan
>     val logicalPlan = qe.logical
>
>     // Retrieve the optimized plan
>     val optimizedPlan = qe.optimizedPlan
>
>     // Process the plans with your custom function (defined elsewhere)
>     processPlans(logicalPlan, optimizedPlan)
>   }
>
>   override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {}
> }
>
> // Create a SparkSession
> val spark = SparkSession.builder()
>   .appName("Example")
>   .getOrCreate()
>
> // Register the custom QueryExecutionListener
> spark.listenerManager.register(new CustomQueryExecutionListener)
>
> // Perform your DataFrame operations
> val df = spark.read.csv("path/to/file.csv")
> val filteredDF = df.filter(df("column") > 10)
> val resultDF = filteredDF.select("column1", "column2")
>
> // Trigger execution of the DataFrame to invoke the listener
> resultDF.show()
> ```
>
> Thank You & Best Regards
> Winston Lai
>
> ------------------------------
> *From:* Vibhatha Abeykoon <vibha...@gmail.com>
> *Sent:* Wednesday, August 2, 2023 5:03:15 PM
> *To:* Ruifeng Zheng <zrfli...@gmail.com>
> *Cc:* Winston Lai <weiruanl...@gmail.com>; user@spark.apache.org <user@spark.apache.org>
> *Subject:* Re: Extracting Logical Plan
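[Editor's note] As an alternative to calling `listenerManager.register` in code, Spark can also load query execution listeners at session startup through the `spark.sql.queryExecutionListeners` configuration. A minimal sketch, assuming the `CustomQueryExecutionListener` from the example above has a zero-argument constructor and is on the classpath under a hypothetical package `com.example`:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register the listener via configuration instead of
// listenerManager.register. Spark instantiates the class (which must
// have a no-arg constructor) when the session starts.
// "com.example.CustomQueryExecutionListener" is a placeholder class name.
val spark = SparkSession.builder()
  .appName("Example")
  .config("spark.sql.queryExecutionListeners",
    "com.example.CustomQueryExecutionListener")
  .getOrCreate()
```

This keeps plan-capture concerns out of the application code, at the cost of requiring the listener class to be constructible without arguments.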

> I understand. I sort of drew the same conclusion. But I wasn't sure.
> Thanks everyone for taking time on this.
>
> On Wed, Aug 2, 2023 at 2:29 PM Ruifeng Zheng <zrfli...@gmail.com> wrote:
>
> In Spark Connect, I think the only API to show the optimized plan is
> `df.explain("extended")`, as Winston mentioned, but it is not a
> LogicalPlan object.
>
> On Wed, Aug 2, 2023 at 4:36 PM Vibhatha Abeykoon <vibha...@gmail.com> wrote:
>
> Hello Ruifeng,
>
> Thank you for these pointers. Would it be different if I use Spark
> Connect? I am not using the regular SparkSession. I am pretty new to
> these APIs. Appreciate your thoughts.
>
> On Wed, Aug 2, 2023 at 2:00 PM Ruifeng Zheng <zrfli...@gmail.com> wrote:
>
> Hi Vibhatha,
> I think those APIs are still available?
>
> ```
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
>       /_/
>
> Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 11.0.19)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val df = spark.range(0, 10)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> df.queryExecution
> res0: org.apache.spark.sql.execution.QueryExecution =
> == Parsed Logical Plan ==
> Range (0, 10, step=1, splits=Some(12))
>
> == Analyzed Logical Plan ==
> id: bigint
> Range (0, 10, step=1, splits=Some(12))
>
> == Optimized Logical Plan ==
> Range (0, 10, step=1, splits=Some(12))
>
> == Physical Plan ==
> *(1) Range (0, 10, step=1, splits=12)
>
> scala> df.queryExecution.optimizedPlan
> res1: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
> Range (0, 10, step=1, splits=Some(12))
> ```
>
> On Wed, Aug 2, 2023 at 3:58 PM Vibhatha Abeykoon <vibha...@gmail.com> wrote:
>
> Hi Winston,
>
> I need to use the LogicalPlan object and process it with another
> function I have written. In earlier Spark versions we can access that
> via the dataframe object.
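[Editor's note] For the Spark Connect case Ruifeng describes, the plans are still reachable from the client as text even though no client-side LogicalPlan object exists. A minimal sketch, assuming the `spark-connect-client-jvm` artifact is on the classpath and a Connect server is listening at `sc://localhost:15002` (the endpoint is an assumption):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: build a Spark Connect client session; the remote address is
// an assumption and must match your Connect server.
val spark = SparkSession.builder()
  .remote("sc://localhost:15002")
  .getOrCreate()

val df = spark.range(0, 10)

// Prints the parsed/analyzed/optimized/physical plans as text. Unlike
// df.queryExecution on a classic session, this returns Unit: there is no
// LogicalPlan object to hand to other code on the client side.
df.explain("extended")
```

So under Connect, any custom plan processing has to either parse this text output or run server-side (e.g., in a listener registered on the server).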

> So if it can be accessed via the UI, is there an API to access the object?
>
> On Wed, Aug 2, 2023 at 1:24 PM Winston Lai <weiruanl...@gmail.com> wrote:
>
> Hi Vibhatha,
>
> How about reading the logical plan from the Spark UI? Do you have access
> to the Spark UI? I am not sure what infra you run your Spark jobs on.
> Usually you should be able to view the logical and physical plans under
> the Spark UI, in text form at least. It is independent of the language
> (e.g., Scala/Python/R) that you use to run Spark.
>
> On Wednesday, August 2, 2023, Vibhatha Abeykoon <vibha...@gmail.com> wrote:
>
> Hi Winston,
>
> I am looking for a way to access the LogicalPlan object in Scala. Not
> sure if the explain function would serve the purpose.
>
> On Wed, Aug 2, 2023 at 9:14 AM Winston Lai <weiruanl...@gmail.com> wrote:
>
> Hi Vibhatha,
>
> Have you tried pyspark.sql.DataFrame.explain — PySpark 3.4.1
> documentation (apache.org)
> <https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.explain.html#pyspark.sql.DataFrame.explain>
> before? I am not sure what infra you have; you can try this first. If it
> doesn't work, you may share more info such as what platform you are
> running your Spark jobs on, what cloud services you are using ...
>
> On Wednesday, August 2, 2023, Vibhatha Abeykoon <vibha...@gmail.com> wrote:
>
> Hello,
>
> I recently upgraded to Spark 3.4.1 and I have encountered a few issues.
> In my previous code, I was able to extract the logical plan using
> `df.queryExecution` (df: DataFrame, in Scala), but it seems that the
> latest API does not support it. Is there a way to extract the logical
> plan or optimized plan from a DataFrame or Dataset in Spark 3.4.1?
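[Editor's note] When the LogicalPlan object is available (classic SparkSession, not Connect), it can be handed to user code and traversed like any Catalyst tree, which is the "process it with another function" step mentioned above. A hedged sketch; `collectFilterConditions` is a hypothetical helper, not part of Spark:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}

// Hypothetical helper: walk the plan tree and collect every Filter
// node's condition, using TreeNode.collect's partial-function traversal.
def collectFilterConditions(plan: LogicalPlan): Seq[Expression] =
  plan.collect { case f: Filter => f.condition }

val spark = SparkSession.builder()
  .appName("PlanWalk")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val df = spark.range(0, 100).toDF("id").filter($"id" > 10)

// queryExecution is still available on Dataset/DataFrame in 3.4.x;
// optimizedPlan is a LogicalPlan you can pass to your own functions.
val optimized = df.queryExecution.optimizedPlan
collectFilterConditions(optimized).foreach(c => println(c.sql))
```

The same pattern (matching on Catalyst node types inside `collect`, `transform`, or `foreach`) generalizes to whatever analysis the custom function needs.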

> Best,
> Vibhatha
>
> --
> Vibhatha Abeykoon
>
> --
> Ruifeng Zheng
> E-mail: zrfli...@gmail.com