[ 
https://issues.apache.org/jira/browse/PHOENIX-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17537926#comment-17537926
 ] 

ASF GitHub Bot commented on PHOENIX-6694:
-----------------------------------------

stoty commented on code in PR #80:
URL: https://github.com/apache/phoenix-connectors/pull/80#discussion_r874344921


##########
phoenix-spark-base/src/main/java/org/apache/phoenix/spark/datasource/v2/reader/PhoenixInputPartitionReader.java:
##########
@@ -94,6 +99,10 @@ private QueryPlan getQueryPlan() throws SQLException {
         }
         try (Connection conn = DriverManager.getConnection(
                 JDBC_PROTOCOL + JDBC_PROTOCOL_SEPARATOR + zkUrl, overridingProps)) {
+            PTable pTable = PTable.parseFrom(options.getTableBytes());
+            org.apache.phoenix.schema.PTable table = PTableImpl.createFromProto(pTable);
+            PhoenixConnection phoenixConnection = conn.unwrap(PhoenixConnection.class);
+            phoenixConnection.addTable(table, System.currentTimeMillis());

Review Comment:
   Interesting point about the timestamp.
   
   The point of this patch is to avoid hammering the system tables with a huge 
number of parallel requests.
   I think that if we have executor starvation, then the jobs will not be 
started immediately, and the syscat load is not really a problem.
   
   Can you think of a case where the jobs are delayed enough for this to 
matter, but enough of them start up synchronously for that to be a problem? (I 
don't know enough about Spark to tell.)
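
   For readers following the thread, the load pattern under discussion can be 
illustrated with a minimal, self-contained sketch. It does not use the Phoenix 
or Spark APIs: `CountingCatalog`, `WorkerCache`, and `runWorkers` are 
hypothetical stand-ins, and the "serialized table" is just a string. It only 
shows the general idea of resolving metadata once on the driver and 
pre-populating each worker's cache with a timestamp, as the diff above appears 
to do with `addTable`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MetadataSharingSketch {

    /** Hypothetical stand-in for the system catalog; counts remote lookups. */
    static class CountingCatalog {
        final AtomicInteger lookups = new AtomicInteger();

        byte[] fetchTableBytes(String tableName) {
            lookups.incrementAndGet();
            // A real catalog would return serialized table metadata here.
            return ("schema-of:" + tableName).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical per-worker metadata cache (assumption, not Phoenix code). */
    static class WorkerCache {
        static final class Entry {
            final String schema;
            final long resolvedAt;

            Entry(String schema, long resolvedAt) {
                this.schema = schema;
                this.resolvedAt = resolvedAt;
            }
        }

        private final Map<String, Entry> tables = new ConcurrentHashMap<>();

        /** Pre-populate the cache from bytes shipped by the driver. */
        void addTable(String name, byte[] tableBytes, long resolvedTimestamp) {
            String schema = new String(tableBytes, StandardCharsets.UTF_8);
            tables.put(name, new Entry(schema, resolvedTimestamp));
        }

        /** Return cached metadata, falling back to the catalog on a miss. */
        String schemaOf(String name, CountingCatalog catalog) {
            Entry cached = tables.get(name);
            if (cached == null) {
                addTable(name, catalog.fetchTableBytes(name), System.currentTimeMillis());
                cached = tables.get(name);
            }
            return cached.schema;
        }
    }

    /** Simulates one driver plus N workers; returns total catalog lookups. */
    static int runWorkers(int workers, boolean shareMetadata) {
        CountingCatalog catalog = new CountingCatalog();
        // Driver side: metadata is resolved once while planning the job.
        byte[] shared = catalog.fetchTableBytes("T");
        for (int i = 0; i < workers; i++) {
            WorkerCache cache = new WorkerCache();
            if (shareMetadata) {
                cache.addTable("T", shared, System.currentTimeMillis());
            }
            cache.schemaOf("T", catalog);
        }
        return catalog.lookups.get();
    }

    public static void main(String[] args) {
        System.out.println("without sharing: " + runWorkers(1000, false));
        System.out.println("with sharing:    " + runWorkers(1000, true));
    }
}
```

   In this toy model the timestamp passed to `addTable` is recorded but never 
checked. Whether a worker that starts late should treat it as stale is exactly 
the open question raised above.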





> Avoid unnecessary calls of fetching table meta data to region servers holding 
> the system tables in batch oriented jobs in spark or hive otherwise those RS 
> become hotspot
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6694
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6694
>             Project: Phoenix
>          Issue Type: Task
>          Components: hive-connector, spark-connector
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>
> Currently we prepare the query plan in both the data source and the partition 
> readers, which creates a new connection in each worker and at job 
> initialisation. Each connection unnecessarily touches the system catalog 
> table, the system stats table, and meta. In jobs with millions of parallel 
> workers, this hotspots the region servers holding the meta, system catalog, 
> and system stats tables. Sharing the same query plan between the workers 
> avoids the hotspot.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
