Hi guys,

I'm using Spark 1.6.2. I have two tables, and the small one is a partitioned parquet table. The total size of the small table is about 1000 MB, but each partition is only about 1 MB. When I set spark.sql.autoBroadcastJoinThreshold to 50 MB and join the two tables on a single partition of the small table, I still get a SortMergeJoin physical plan instead of the BroadcastHashJoin I expected.

Here is roughly what I'm doing (a minimal sketch; big_table, small_table, the join key "key", and the partition column "dt" are placeholders, not my real schema):
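    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)
    // 50 MB broadcast threshold, well above the ~1 MB single partition
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
      (50 * 1024 * 1024).toString)

    val joined = sqlContext.sql(
      """SELECT b.*, s.val
        |  FROM big_table b
        |  JOIN small_table s ON b.key = s.key
        | WHERE s.dt = '2016-09-01'
      """.stripMargin)

    joined.explain()   // shows SortMergeJoin, not BroadcastHashJoin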
I have dug into this a bit, and it seems to have something to do with partition pruning:

1. Checking the physical plan, all of the partitions of the small table are read in, not just the one I filter on. This looks like https://issues.apache.org/jira/browse/SPARK-16980.

2. With spark.sql.hive.convertMetastoreParquet=false, the pruning succeeds, but I still get a SortMergeJoin, because of this code in HiveMetastoreCatalog.scala:

    @transient override lazy val statistics: Statistics = Statistics(
      sizeInBytes = {
        val totalSize = hiveQlTable.getParameters.get(StatsSetupConst.TOTAL_SIZE)
        val rawDataSize = hiveQlTable.getParameters.get(StatsSetupConst.RAW_DATA_SIZE)
        ...

   The sizeInBytes here is the total size of the table, not of the single pruned partition, so it stays above the broadcast threshold.

How can I fix this without patching Spark? Or is there a patch of SPARK-16980 for Spark 1.6?

Best regards,
Jerry
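P.S. One workaround I'm considering (a rough sketch, untested; the path is made up and bigDf stands for my big table): read the pruned partition's directory directly, so the statistics reflect only that ~1 MB of data, and force the broadcast with the broadcast() hint:

    import org.apache.spark.sql.functions.broadcast

    val bigDf = sqlContext.table("big_table")
    // reading the partition directory directly drops the dt column,
    // which is fine since we've already pruned to one partition
    val smallPart = sqlContext.read.parquet("/warehouse/small_table/dt=2016-09-01")
    // broadcast() marks the side for BroadcastHashJoin regardless of threshold
    val joined = bigDf.join(broadcast(smallPart), Seq("key"))

Would that be a reasonable way around it?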