[jira] [Commented] (PHOENIX-6694) Share the query plan generated in data source reader in partition readers to avoid the unnecessary touch basing system tables and meta table in all the workers.

Rajeshbabu Chintaguntla (Jira) Tue, 26 Apr 2022 01:21:05 -0700


    [ https://issues.apache.org/jira/browse/PHOENIX-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527998#comment-17527998 ]


Rajeshbabu Chintaguntla commented on PHOENIX-6694:
--------------------------------------------------

There are test case failures because the QueryPlan is not a serializable object. 
Maybe we need to pass the individual fields that the workers require instead.
{noformat}
org.apache.spark.SparkException: 
Job aborted due to stage failure: Failed to serialize task 0, not attempting to 
retry it. Exception during serialization: java.io.NotSerializableException: 
org.apache.phoenix.execute.ScanPlan
Serialization stack:
        - object not serializable (class: org.apache.phoenix.execute.ScanPlan, 
value: org.apache.phoenix.execute.ScanPlan@29f84a7d)
        - field (class: 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions, 
name: queryPlan, type: interface org.apache.phoenix.compile.QueryPlan)
        - object (class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions, 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions@2ef524be)
        - field (class: 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition, name: 
options, type: class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions)
        - object (class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition, 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition@52b7c73e)
        - field (class: 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition, name: 
inputPartition, type: interface 
org.apache.spark.sql.sources.v2.reader.InputPartition)
        - object (class 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition, 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition@0)

{noformat}
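To illustrate why this fails and what "pass individual fields" could look like, here is a minimal, self-contained sketch using plain Java serialization. All class names (`FakeQueryPlan`, `OptionsWithPlan`, `OptionsWithFields`) and the chosen fields (a SQL string plus raw scan-range bytes) are hypothetical stand-ins, not Phoenix classes. The first options object holds a non-serializable plan and reproduces the same `NotSerializableException` seen above; the second copies only serializable fields and round-trips cleanly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class PlanShippingSketch {

    // Hypothetical stand-in for a non-serializable plan such as ScanPlan.
    static class FakeQueryPlan {
        final String sql;
        final List<byte[]> scanRanges;
        FakeQueryPlan(String sql, List<byte[]> scanRanges) {
            this.sql = sql;
            this.scanRanges = scanRanges;
        }
    }

    // Mirrors the failing shape: a Serializable options object that holds the plan.
    static class OptionsWithPlan implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeQueryPlan queryPlan;
        OptionsWithPlan(FakeQueryPlan plan) { this.queryPlan = plan; }
    }

    // Alternative: ship only the individual, serializable fields a worker needs.
    static class OptionsWithFields implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sql;
        final ArrayList<byte[]> scanRanges;
        OptionsWithFields(FakeQueryPlan plan) {
            this.sql = plan.sql;
            this.scanRanges = new ArrayList<>(plan.scanRanges);
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeQueryPlan plan = new FakeQueryPlan("SELECT * FROM T", List.of(new byte[] {1, 2}));
        try {
            serialize(new OptionsWithPlan(plan));
            System.out.println("unexpected: plan serialized");
        } catch (NotSerializableException e) {
            // Same failure mode as the Spark task serialization above.
            System.out.println("fails: " + e.getMessage());
        }
        OptionsWithFields shipped =
                (OptionsWithFields) deserialize(serialize(new OptionsWithFields(plan)));
        System.out.println("worker received: " + shipped.sql + ", ranges=" + shipped.scanRanges.size());
    }
}
```

The trade-off is that the worker side must be able to rebuild whatever it needs (for example, the scans) from those fields without re-resolving table metadata; which fields are sufficient for Phoenix is an open question in this comment, not something the sketch settles.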

> Share the query plan generated in data source reader in partition readers to 
> avoid the unnecessary touch basing system tables and meta table in all the 
> workers.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6694
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6694
>             Project: Phoenix
>          Issue Type: Bug
>          Components: spark-connector
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>
> Currently we prepare the query plan in both the data source reader and the 
> partition readers, which creates a new connection in each worker and at job 
> initialisation and unnecessarily touches the system catalog table, the system 
> stats table, and meta. Jobs with millions of parallel workers hotspot the 
> region servers holding meta, the system catalog, and the system stats table. 
> Sharing the same query plan between the workers would avoid the hotspot.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
