[ 
https://issues.apache.org/jira/browse/PHOENIX-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527998#comment-17527998
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-6694:
--------------------------------------------------

Some test cases fail because the query plan is not a serializable object. 
We may need to pass only the individual fields that the workers require instead.
{noformat}
org.apache.spark.SparkException: 
Job aborted due to stage failure: Failed to serialize task 0, not attempting to 
retry it. Exception during serialization: java.io.NotSerializableException: 
org.apache.phoenix.execute.ScanPlan
Serialization stack:
        - object not serializable (class: org.apache.phoenix.execute.ScanPlan, 
value: org.apache.phoenix.execute.ScanPlan@29f84a7d)
        - field (class: 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions, 
name: queryPlan, type: interface org.apache.phoenix.compile.QueryPlan)
        - object (class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions, 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions@2ef524be)
        - field (class: 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition, name: 
options, type: class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions)
        - object (class 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition, 
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition@52b7c73e)
        - field (class: 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition, name: 
inputPartition, type: interface 
org.apache.spark.sql.sources.v2.reader.InputPartition)
        - object (class 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition, 
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition@0)

{noformat}
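
As a rough illustration of the "pass individual fields" idea, here is a minimal sketch using only the JDK. It is not the connector code: FakePlan, OptionsWithPlan, OptionsWithFields and canSerialize are hypothetical names, and the sql/scanBytes fields are only an assumption about what the workers might need. It shows that a Serializable holder with a non-serializable field fails Java serialization in the same way as the trace above, while a holder carrying only plain serializable values does not.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSketch {

    // Hypothetical stand-in for a plan object that does not implement Serializable.
    static class FakePlan {
        final String sql;
        FakePlan(String sql) { this.sql = sql; }
    }

    // Mirrors the failing shape: a Serializable holder with a non-serializable field.
    static class OptionsWithPlan implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakePlan plan;
        OptionsWithPlan(FakePlan plan) { this.plan = plan; }
    }

    // Alternative shape: carry only serializable values taken from the plan.
    // Which fields are really needed is an assumption made for this sketch.
    static class OptionsWithFields implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sql;
        final byte[] scanBytes;
        OptionsWithFields(String sql, byte[] scanBytes) {
            this.sql = sql;
            this.scanBytes = scanBytes;
        }
    }

    // Returns true if Java serialization accepts the object, false if it
    // throws NotSerializableException anywhere in the object graph.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object withPlan = new OptionsWithPlan(new FakePlan("SELECT 1"));
        Object withFields = new OptionsWithFields("SELECT 1", new byte[] {1, 2, 3});
        System.out.println("holder with plan serializable: " + canSerialize(withPlan));
        System.out.println("holder with fields serializable: " + canSerialize(withFields));
    }
}
```

The trade-off is that each worker would have to rebuild whatever it needs from those fields, so this only helps if that can be done without going back to the system tables.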

> Share the query plan generated in data source reader in partition readers to 
> avoid the unnecessary touch basing system tables and meta table in all the 
> workers.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-6694
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6694
>             Project: Phoenix
>          Issue Type: Bug
>          Components: spark-connector
>            Reporter: Rajeshbabu Chintaguntla
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Major
>
> Currently we prepare the query plan in both the data source reader and the 
> partition readers, which creates a new connection in each worker and at job 
> initialisation. Each of these connections unnecessarily touches the system 
> catalog table, the system stats table, and meta. Jobs with millions of 
> parallel workers therefore hotspot the region servers holding meta, the 
> system catalog, and the system stats table. Sharing the same query plan 
> between the workers can avoid the hotspot.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
