Babulal created SPARK-25332:
-------------------------------
Summary: Sort merge join is selected instead of broadcast hash join
after restarting spark-shell/spark-JDBC for hive provider tables
Key: SPARK-25332
URL: https://issues.apache.org/jira/browse/SPARK-25332
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.0
Reporter: Babulal
spark.sql("create table x1(name string,age int) stored as parquet")
spark.sql("insert into x1 select 'a',29")
spark.sql("create table x2(name string,age int) stored as parquet")
spark.sql("insert into x2 select 'a',29")
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*(2) BroadcastHashJoin [name#101], [name#103], Inner,
BuildRight
:- *(2) Project [name#101, age#102]
: +- *(2) Filter isnotnull(name#101)
: +- *(2) FileScan parquet default.x1_ex[name#101,age#102] Batched: true,
Format: Parquet, Location:
InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1,
PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema:
struct<name:string,age:int>
+- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
+- *(1) Project [name#103, age#104]
+- *(1) Filter isnotnull(name#103)
+- *(1) FileScan parquet default.x2_ex[name#103,age#104] Batched: true,
Format: Parquet, Location:
InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2,
PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema:
struct<name:string,age:int>
Now restart spark-shell (or run spark-submit, or restart the JDBCServer) and run
the same select query again:
scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain
== Physical Plan ==
*(5) SortMergeJoin [name#43], [name#45], Inner
:- *(2) Sort [name#43 ASC NULLS FIRST], false, 0
: +- Exchange hashpartitioning(name#43, 200)
: +- *(1) Project [name#43, age#44]
: +- *(1) Filter isnotnull(name#43)
: +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format:
Parquet, Location:
InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1],
PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema:
struct<name:string,age:int>
+- *(4) Sort [name#45 ASC NULLS FIRST], false, 0
+- Exchange hashpartitioning(name#45, 200)
+- *(3) Project [name#45, age#46]
+- *(3) Filter isnotnull(name#45)
+- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format:
Parquet, Location:
InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2],
PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema:
struct<name:string,age:int>
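One way to investigate the difference between the two plans is to compare the
size estimate the optimizer holds for each table with the broadcast threshold,
once before and once after the restart. The following is only a diagnostic
sketch, assuming a spark-shell session on Spark 2.3 where spark is the
predefined SparkSession; it is not part of the original reproduction and does
not by itself identify the root cause.

```scala
// Diagnostic sketch for spark-shell (assumes `spark` is the predefined SparkSession).
// Broadcast hash join is only considered when the estimated size of one side
// is at or below this threshold (in bytes).
println(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

// Size estimate the optimizer uses for each table.
// Run this before and after the restart and compare the values.
Seq("x1", "x2").foreach { t =>
  val size = spark.table(t).queryExecution.optimizedPlan.stats.sizeInBytes
  println(s"$t -> estimated sizeInBytes = $size")
}
```

If the estimate printed after the restart is larger than the threshold, that
would be consistent with the planner falling back to SortMergeJoin.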
scala> spark.sql("desc formatted x1").show(200,false)
+----------------------------+--------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+--------------------------------------------------------------+-------+
|name |string |null |
|age |int |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |x1 | |
|Owner |Administrator | |
|Created Time |Sun Aug 19 12:36:58 IST 2018 | |
|Last Access |Thu Jan 01 05:30:00 IST 1970 | |
|Created By |Spark 2.3.0 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1534662418] | |
|Location |file:/D:/spark_release/spark/bin/spark-warehouse/x1 | |
|Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | |
|InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+--------------------------------------------------------------+-------+
With a datasource table it works fine (create the table with "using parquet"
instead of "stored as parquet").
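The datasource variant mentioned above can be sketched as follows. This is a
minimal illustration, assuming a spark-shell session where spark is the
predefined SparkSession; the table names x1_ds and x2_ds are hypothetical and
are used only to avoid clashing with the hive provider tables x1 and x2.

```scala
// Sketch of the datasource-table variant (run in spark-shell).
// x1_ds and x2_ds are hypothetical names chosen for this example.
spark.sql("create table x1_ds(name string, age int) using parquet")
spark.sql("insert into x1_ds select 'a', 29")
spark.sql("create table x2_ds(name string, age int) using parquet")
spark.sql("insert into x2_ds select 'a', 29")

// Check the chosen join strategy; repeat after restarting spark-shell.
spark.sql("select * from x1_ds t1, x2_ds t2 where t1.name = t2.name").explain

// The Provider row should show parquet rather than hive.
spark.sql("desc formatted x1_ds").show(200, false)
```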