[
https://issues.apache.org/jira/browse/PHOENIX-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537468#comment-17537468
]
ASF GitHub Bot commented on PHOENIX-6694:
-----------------------------------------
stoty commented on code in PR #80:
URL: https://github.com/apache/phoenix-connectors/pull/80#discussion_r873581804
##########
phoenix-spark-base/src/main/java/org/apache/phoenix/spark/datasource/v2/reader/PhoenixDataSourceReader.java:
##########
@@ -181,10 +184,12 @@ public List<InputPartition<InternalRow>>
planInputPartitions() {
// Get the region size
long regionSize = CompatUtil.getSize(regionLocator,
connection.getAdmin(), location);
-
+ byte[] tableBytes =
PTableImpl.toProto(queryPlan.getTableRef().getTable()).
Review Comment:
Can we have a more descriptive name?
Something like pTableCacheBytes?
> Avoid unnecessary calls of fetching table meta data to region servers holding
> the system tables in batch oriented jobs in spark or hive otherwise those RS
> become hotspot
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-6694
> URL: https://issues.apache.org/jira/browse/PHOENIX-6694
> Project: Phoenix
> Issue Type: Task
> Components: hive-connector, spark-connector
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
>
> Currently we prepare the query plan in both the data source and the partition
> readers, which creates a new connection in each worker and at job
> initialisation. Each of these connections unnecessarily touches the system
> catalog table, the system stats table, and meta. When a job has millions of
> parallel workers, this hotspots the region servers holding meta, the system
> catalog, and the system stats table. Sharing the same query plan between the
> workers avoids the hotspot.
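
A minimal sketch of the idea described above: resolve the table metadata once on the driver, serialize it, and hand the same bytes to every partition reader so the workers never go back to the system tables. Everything here is hypothetical and for illustration only: `TableMeta`, `fetchFromCatalog`, and the lookup counter are stand-ins, and plain Java serialization is used in place of the `PTableImpl.toProto(...)` call in the diff. Only the variable name `pTableCacheBytes` comes from the review comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPlanSketch {
    // Hypothetical stand-in for table metadata (not Phoenix's PTable).
    static final class TableMeta implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<String> columns;

        TableMeta(String name, List<String> columns) {
            this.name = name;
            this.columns = new ArrayList<>(columns);
        }
    }

    // Counts simulated round trips to the servers hosting the system tables.
    static final AtomicInteger catalogLookups = new AtomicInteger();

    // Hypothetical metadata lookup; each call represents one hit on the catalog.
    static TableMeta fetchFromCatalog(String table) {
        catalogLookups.incrementAndGet();
        return new TableMeta(table, Arrays.asList("ID", "COL1"));
    }

    // Serialize the metadata once so it can be shipped with each partition.
    static byte[] toBytes(TableMeta meta) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(meta);
        }
        return buf.toByteArray();
    }

    // Worker side: rebuild the metadata from the shipped bytes, no catalog access.
    static TableMeta fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (TableMeta) in.readObject();
        }
    }

    // Driver side: one lookup, then the same bytes are reused for every partition.
    // Returns the number of catalog lookups performed for the whole job.
    static int planAndRead(String table, int partitions) throws Exception {
        catalogLookups.set(0);
        byte[] pTableCacheBytes = toBytes(fetchFromCatalog(table));
        for (int i = 0; i < partitions; i++) {
            TableMeta meta = fromBytes(pTableCacheBytes);
            if (!meta.name.equals(table)) {
                throw new IllegalStateException("metadata did not round-trip");
            }
        }
        return catalogLookups.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("catalog lookups: " + planAndRead("T1", 1000));
    }
}
```

In this sketch the lookup count does not grow with the number of partitions, which is the property the issue is after. In the real connector the bytes would have to be carried by whatever object Spark ships to the executors.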
--
This message was sent by Atlassian Jira
(v8.20.7#820007)