First of all, as you probably know, `QueryExecution` is a developer API. By
definition `QueryExecution.logical` is the input plan, which may even be
unresolved. Developers should be aware of this and avoid operations that
require a resolved plan. `LogicalPlan.stats` obviously needs the plan to be
resolved.

For this particular case, we could make it work by defining `computeStats` in
`AnalysisBarrier`. But it's also fine to leave it as it is, since this doesn't
break any real use cases.

On Thu, Jan 4, 2018 at 4:36 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I use Spark from the master today.
>
> $ ./bin/spark-shell --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
>       /_/
>
> Using Scala version 2.11.8, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_152
> Branch master
> Compiled by user jacek on 2018-01-04T05:44:05Z
> Revision 7d045c5f00e2c7c67011830e2169a4e130c3ace8
>
> Can anyone explain why some queries have stats in logical plan while
> others don't (and I had to use analyzed logical plan)?
>
> I can trace the difference in the code, but I don't understand why it is
> there.
>
> spark.range(1000).write.parquet("/tmp/p1000")
> // The stats are available in logical plan (in logical "phase")
> scala> spark.read.parquet("/tmp/p1000").queryExecution.logical.stats
> res21: org.apache.spark.sql.catalyst.plans.logical.Statistics =
> Statistics(sizeInBytes=6.9 KB, hints=none)
>
> // logical plan fails, but it worked fine above --> WHY?!
> val names = Seq((1, "one"), (2, "two")).toDF("id", "name")
> scala> names.queryExecution.logical.stats
> java.lang.UnsupportedOperationException
>   at org.apache.spark.sql.catalyst.plans.logical.LeafNode.computeStats(LogicalPlan.scala:232)
>   at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:55)
>   at org.apache.spark.sql.catalyst.plans.logical.statsEstimation.SizeInBytesOnlyStatsPlanVisitor$.default(SizeInBytesOnlyStatsPlanVisitor.scala:27)
>
> // analyzed logical plan works fine
> scala> names.queryExecution.analyzed.stats
> res23: org.apache.spark.sql.catalyst.plans.logical.Statistics =
> Statistics(sizeInBytes=48.0 B, hints=none)
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowski
>
