[
https://issues.apache.org/jira/browse/SPARK-43163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-43163.
----------------------------------
Resolution: Invalid
> An exception occurred while hive table join tidb table
> ------------------------------------------------------
>
> Key: SPARK-43163
> URL: https://issues.apache.org/jira/browse/SPARK-43163
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.3
> Reporter: kroraina
> Priority: Major
>
> When executing a query that inner-joins a Hive partitioned table (the big
> one) with a TiDB table (the small one), the Hive partitioned table is
> automatically broadcast, which leads to an error.
> The query looks something like
> {{select hive_table.col1,tidb_table.col2 from hive_table inner join
> tidb_table on hive_table.col2=tidb_table.col3 where ...}}
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [... 109 more fields]
> +- Generate HiveGenericUDTF#udf.json.JsonExtractValueUDTF(xxx), [, ... 101
> more fields], false, [...]
> +- Project [, ... 102 more fields]
> +- BroadcastHashJoin [xxx#94], [xxxx#475], Inner, BuildRight, false
> :- TiKV CoprocessorRDD\{[table: xxx] TableReader, Columns: xxxx(): {
> TableRangeScan: { RangeFilter: [], Range:
> [([t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000],
> [t\200\000\000\000\000\000\004\253_s\000\000\000\000\000\000\000\000])([t\200\000\000\000\000\000\004\253_r\000\000\000\000\000\000\000\000],
> [t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000])] } },
> startTs: 440854942292115639} EstimatedCount:20837
> +- BroadcastExchange HashedRelationBroadcastMode(List(input[107, string,
> false]),false), [plan_id=32]
> +- Filter isnotnull(xxx#475)
> +- Scan hive xx.xxxxxxxx [, ... 100 more fields], HiveTableRelation
> [{{xx}}.{{xxx}},
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [...,
> Partition Cols: [#520, #521, #522, #523], Pruned Partitions: [(, ,
> , )]], [isnotnull(), (), (xx = xx)]
> Here is some log information that may be helpful.
> The {{plan.stats.sizeInBytes}} of the Hive table's LogicalPlan is too
> small, and the {{plan.stats.sizeInBytes}} of the TiDB table's LogicalPlan is
> too big.
> The stats of the two LogicalPlans seem to be reversed.
> *Spark and TiSpark version info*
> Spark 3.2.3
> TiSpark 3.1.2(with a profile of spark-3.2)
> *Additional context*
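
The symptom described in the report (the big table being broadcast because the two size estimates look reversed) can be illustrated with a toy model. This is only a sketch of a size-based broadcast decision, not Spark's or TiSpark's actual planner code: `RelationStats`, `pick_broadcast_side`, and the byte counts are hypothetical names and made-up numbers. The 10 MB default mirrors the documented default of `spark.sql.autoBroadcastJoinThreshold`, and a negative threshold stands in for setting that option to `-1` to disable automatic broadcast.

```python
from dataclasses import dataclass
from typing import Optional

# Documented default of spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_THRESHOLD_BYTES = 10 * 1024 * 1024


@dataclass
class RelationStats:
    """Hypothetical stand-in for the size estimate of one join input."""
    name: str
    size_in_bytes: int


def pick_broadcast_side(left: RelationStats,
                        right: RelationStats,
                        threshold: int = DEFAULT_THRESHOLD_BYTES) -> Optional[str]:
    """Return the name of the input a size-based planner would broadcast
    for an inner join, or None if neither input qualifies."""
    if threshold < 0:
        # A negative threshold models "automatic broadcast disabled".
        return None
    # Only inputs whose *estimated* size fits under the threshold qualify.
    # The right side is listed first so that it wins ties.
    candidates = [r for r in (right, left)
                  if 0 <= r.size_in_bytes <= threshold]
    if not candidates:
        return None
    # Among the qualifying inputs, broadcast the one estimated as smaller.
    return min(candidates, key=lambda r: r.size_in_bytes).name


# Made-up estimates that are "reversed" in the way the report describes:
# the small TiDB table is overestimated, the big Hive table underestimated.
tidb = RelationStats("tidb_table", size_in_bytes=50 * 1024**3)
hive = RelationStats("hive_table", size_in_bytes=4 * 1024**2)

print(pick_broadcast_side(tidb, hive))                # hive_table
print(pick_broadcast_side(tidb, hive, threshold=-1))  # None
```

In this model the decision depends only on the estimates, so a wrong estimate on either side is enough to broadcast the wrong table. Disabling the threshold avoids the bad choice without correcting the underlying statistics.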
--
This message was sent by Atlassian Jira
(v8.20.10#820010)