[ 
https://issues.apache.org/jira/browse/SPARK-43163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-43163.
----------------------------------
    Resolution: Invalid

> An exception occurred while hive table join tidb table
> ------------------------------------------------------
>
>                 Key: SPARK-43163
>                 URL: https://issues.apache.org/jira/browse/SPARK-43163
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.3
>            Reporter: kroraina
>            Priority: Major
>
> When executing a query of a hive partition table (big one) inner join a tidb 
> table(small one), the hive partition table is auto broadcasted, which leads 
> an error.
> The query is somelike
>  {{select hive_table.col1,tidb_table.col2 from hive_table inner join 
> tidb_table on hive_table.col2=tidb_table.col3 where ...}}
> == Physical Plan ==
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Project [... 109 more fields]
> +- Generate HiveGenericUDTF#udf.json.JsonExtractValueUDTF(xxx), [, ... 101 
> more fields], false, [...]
> +- Project [, ... 102 more fields]
> +- BroadcastHashJoin [xxx#94], [xxxx#475], Inner, BuildRight, false
> :- TiKV CoprocessorRDD\{[table: xxx] TableReader, Columns: xxxx(): { 
> TableRangeScan: { RangeFilter: [], Range: 
> [([t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000], 
> [t\200\000\000\000\000\000\004\253_s\000\000\000\000\000\000\000\000])([t\200\000\000\000\000\000\004\253_r\000\000\000\000\000\000\000\000],
>  [t\200\000\000\000\000\000\004\253_r\200\000\000\000\000\000\000\000])] } }, 
> startTs: 440854942292115639} EstimatedCount:20837
> +- BroadcastExchange HashedRelationBroadcastMode(List(input[107, string, 
> false]),false), [plan_id=32]
> +- Filter isnotnull(xxx#475)
> +- Scan hive xx.xxxxxxxx [, ... 100 more fields], HiveTableRelation 
> [{{{}xx{}}}.{{{}xxx{}}}, 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [..., 
> Partition Cols: [[#520|https://github.com/pingcap/tispark/issues/520], 
> [#521|https://github.com/pingcap/tispark/issues/521], 
> [#522|https://github.com/pingcap/tispark/issues/522], 
> [#523|https://github.com/pingcap/tispark/pull/523]], Pruned Partitions: [(, , 
> , )]], [isnotnull(), (), (xx = xx)]
> Here I got some log info maybe helpful.
> The {{plan.stats.sizeInBytes}} of the LogicalPlan of the hive table is too 
> small and the {{plan.stats.sizeInBytes}} of LogicalPlan of the tidb table is 
> too big.
> The stats of the LogicalPlans of the two seems reversed.
> *Spark and TiSpark version info*
> Spark 3.2.3
> TiSpark 3.1.2(with a profile of spark-3.2)
> *Additional context*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to