Thanks Daniel, I like your answer for #1. It makes sense.
However, I don't get why you say that there are always pending transformations... After you call an action, you should be "clean" from pending transformations, no?

> On Aug 3, 2017, at 5:53 AM, Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:
>
> On Wed, Aug 2, 2017 at 2:16 PM, Jean Georges Perrin <j...@jgp.net> wrote:
> Hi Sparkians,
>
> I understand the lazy evaluation mechanism with transformations and actions.
> My question is simpler: 1) are show() and/or printSchema() actions? I would
> assume so...
>
> show() is an action (it prints data) but printSchema() is not an action.
> Spark can tell you the schema of the result without computing the result.
>
> and optional question: 2) is there a way to know if there are transformations
> "pending"?
>
> There are always transformations pending :). An RDD or DataFrame is a series
> of pending transformations. If you say val df = spark.read.csv("foo.csv"),
> that is a pending transformation. Even spark.emptyDataFrame is best
> understood as a pending transformation: it does not do anything on the
> cluster, but records locally what it will have to do on the cluster.
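
For anyone following the thread, the distinction quoted above can be sketched in Scala roughly as follows. This is only a minimal sketch under a few assumptions that are not from the original messages: a local SparkSession, a DataFrame built from spark.range(10) rather than a CSV file (so that nothing outside the example is needed), and the illustrative names PendingTransformations and buildPlan. It shows printSchema() and explain() working from the recorded plan, and show() as the call that actually runs a job.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical object and method names, used only for this illustration.
object PendingTransformations {

  // Only transformations here: each call returns a new DataFrame that
  // records what would have to be computed, without running a job.
  def buildPlan(spark: SparkSession): DataFrame =
    spark.range(10)                                  // ids 0 to 9
      .filter("id % 2 = 0")                          // keep even ids
      .withColumn("squared", col("id") * col("id"))  // add a derived column

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, so the sketch runs without a cluster.
    val spark = SparkSession.builder()
      .appName("pending-transformations")
      .master("local[*]")
      .getOrCreate()

    val df = buildPlan(spark)

    // Not an action: the schema is derived from the plan, not from the data.
    df.printSchema()

    // Also not an action: prints the plan that the DataFrame records.
    df.explain()

    // An action: the plan is executed and the resulting rows are printed.
    df.show()

    // The DataFrame still describes the same plan after the action.
    df.explain()

    spark.stop()
  }
}
```

Running it requires the spark-sql artifact on the classpath. The second explain() call prints the same plan as the first one, which is one way to read the "always pending" remark.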