If you are talking about a tree, then the RDDs are nodes, and the dependencies are the edges.
If you are talking about a DAG, then the partitions in the RDDs are the nodes, and the dependencies between the partitions are the edges.

On Thu, Apr 16, 2020 at 4:02 PM, Mania Abdi <abdi...@husky.neu.edu> wrote:

> Is it correct to say, the nodes in the DAG are RDDs and the edges are
> computations?
>
> On Thu, Apr 16, 2020 at 6:21 PM Reynold Xin <r...@databricks.com> wrote:
>
>> The RDD is the DAG.
>>
>> On Thu, Apr 16, 2020 at 3:16 PM, Mania Abdi <abdi...@husky.neu.edu> wrote:
>>
>>> Hello everyone,
>>>
>>> I am implementing a caching mechanism for analytic workloads running on
>>> top of Spark and I need to retrieve the Spark DAG right after it is
>>> generated and the DAG scheduler. I would appreciate it if you could give
>>> me some hints or reference me to some documents about where the DAG is
>>> generated and inputs assigned to it. I found the DAG Scheduler class
>>> (https://github.com/apache/spark/blob/55dea9be62019d64d5d76619e1551956c8bb64d0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala)
>>> but I am not sure if it is a good starting point.
>>>
>>> Regards
>>> Mania
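To make the two views concrete, here is a toy model in plain Python (not Spark code). The classes `ToyRDD` and `Dependency` and the helpers `rdd_graph` / `partition_graph` are hypothetical names for illustration only. They loosely mirror Spark's `RDD.dependencies`, `OneToOneDependency`, `ShuffleDependency` and `NarrowDependency.getParents(partitionId)`, under the simplifying assumption that there are only two dependency kinds: one-to-one and shuffle (all-to-all).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name, a partition count, and parent links."""
    name: str
    num_partitions: int
    deps: list = field(default_factory=list)


@dataclass
class Dependency:
    """Hypothetical edge to a parent RDD; kind is 'one_to_one' or 'shuffle'."""
    parent: ToyRDD
    kind: str

    def parent_partitions(self, pid):
        # One-to-one: child partition i reads only parent partition i.
        if self.kind == "one_to_one":
            return [pid]
        # Shuffle: every child partition may read from every parent partition.
        return list(range(self.parent.num_partitions))


def lineage(rdd):
    """Yield each RDD reachable through dependencies, visiting each name once."""
    seen, stack = set(), [rdd]
    while stack:
        r = stack.pop()
        if r.name in seen:
            continue
        seen.add(r.name)
        yield r
        stack.extend(d.parent for d in r.deps)


def rdd_graph(rdd):
    """Coarse view: RDDs are nodes, dependencies are edges (parent -> child)."""
    return {(d.parent.name, r.name) for r in lineage(rdd) for d in r.deps}


def partition_graph(rdd):
    """Fine view: (rdd, partition) pairs are nodes, partition dependencies are edges."""
    return {
        ((d.parent.name, p), (r.name, c))
        for r in lineage(rdd)
        for d in r.deps
        for c in range(r.num_partitions)
        for p in d.parent_partitions(c)
    }


# Example lineage, shaped like textFile -> flatMap -> reduceByKey.
text = ToyRDD("textFile", 2)
words = ToyRDD("flatMap", 2, [Dependency(text, "one_to_one")])
counts = ToyRDD("reduceByKey", 3, [Dependency(words, "shuffle")])

print(sorted(rdd_graph(counts)))
print(len(partition_graph(counts)))
```

At the RDD level this lineage has 2 edges. At the partition level it has 8: 2 from the one-to-one step, and 3 x 2 = 6 from the shuffle. The shuffle edge is also where Spark's DAGScheduler places a stage boundary. In real Spark, `rdd.toDebugString` prints the RDD-level lineage.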