[
https://issues.apache.org/jira/browse/PHOENIX-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527998#comment-17527998
]
Rajeshbabu Chintaguntla commented on PHOENIX-6694:
--------------------------------------------------
There are test case failures because the query plan is not a serializable object.
We may need to pass the individual fields required by the workers instead.
{noformat}
org.apache.spark.SparkException:
Job aborted due to stage failure: Failed to serialize task 0, not attempting to
retry it. Exception during serialization: java.io.NotSerializableException:
org.apache.phoenix.execute.ScanPlan
Serialization stack:
- object not serializable (class: org.apache.phoenix.execute.ScanPlan,
value: org.apache.phoenix.execute.ScanPlan@29f84a7d)
- field (class:
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions,
name: queryPlan, type: interface org.apache.phoenix.compile.QueryPlan)
- object (class
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions,
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions@2ef524be)
- field (class:
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition, name:
options, type: class
org.apache.phoenix.spark.datasource.v2.reader.PhoenixDataSourceReadOptions)
- object (class
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition,
org.apache.phoenix.spark.datasource.v2.reader.PhoenixInputPartition@52b7c73e)
- field (class:
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition, name:
inputPartition, type: interface
org.apache.spark.sql.sources.v2.reader.InputPartition)
- object (class
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition,
org.apache.spark.sql.execution.datasources.v2.DataSourceRDDPartition@0)
{noformat}
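One way to read the suggestion above, as a rough sketch only: keep the plan out of the options object that is shipped to the executors, and carry just the plain serializable values derived from it on the driver. The class and field names below (FakePlan, OptionsWithPlan, OptionsWithFields, selectStatement, startRow, stopRow) are hypothetical stand-ins, not the connector's actual classes, and plain Java serialization stands in for Spark's task serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializableOptionsSketch {

    /** Hypothetical stand-in for a driver-side plan that does not implement Serializable. */
    static class FakePlan {
        final String sql;
        FakePlan(String sql) { this.sql = sql; }
    }

    /** Mirrors the failing shape: a Serializable holder with a non-serializable field. */
    static class OptionsWithPlan implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakePlan plan;
        OptionsWithPlan(FakePlan plan) { this.plan = plan; }
    }

    /** Alternative shape: only plain values extracted from the plan on the driver. */
    static class OptionsWithFields implements Serializable {
        private static final long serialVersionUID = 1L;
        final String selectStatement; // assumed field, for illustration
        final byte[] startRow;        // assumed field, for illustration
        final byte[] stopRow;         // assumed field, for illustration
        OptionsWithFields(String selectStatement, byte[] startRow, byte[] stopRow) {
            this.selectStatement = selectStatement;
            this.startRow = startRow.clone();
            this.stopRow = stopRow.clone();
        }
    }

    /** Serializes with plain Java serialization. */
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Returns false when the object graph contains a non-serializable member. */
    static boolean canSerialize(Object value) {
        try {
            serialize(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Serializes and deserializes the options, as would happen between driver and worker. */
    static OptionsWithFields roundTrip(OptionsWithFields options) {
        try {
            byte[] data = serialize(options);
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
                return (OptionsWithFields) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        OptionsWithPlan withPlan = new OptionsWithPlan(new FakePlan("SELECT * FROM T"));
        OptionsWithFields withFields =
            new OptionsWithFields("SELECT * FROM T", new byte[] {0}, new byte[] {9});
        System.out.println("with plan serializable:   " + canSerialize(withPlan));
        System.out.println("with fields serializable: " + canSerialize(withFields));
        System.out.println("statement after round trip: " + roundTrip(withFields).selectStatement);
    }
}
```

Which fields the workers actually need is not settled here. The sketch only shows that an options object built from plain values survives serialization, while one holding the plan does not.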
> Share the query plan generated in data source reader in partition readers to
> avoid the unnecessary touch basing system tables and meta table in all the
> workers.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-6694
> URL: https://issues.apache.org/jira/browse/PHOENIX-6694
> Project: Phoenix
> Issue Type: Bug
> Components: spark-connector
> Reporter: Rajeshbabu Chintaguntla
> Assignee: Rajeshbabu Chintaguntla
> Priority: Major
>
> Currently we prepare the query plan in both the data source reader and the
> partition readers, which creates a new connection in each worker and during
> job initialisation. Each connection unnecessarily touches the system catalog
> table, the system stats table, and meta. Jobs with millions of parallel
> workers hotspot the region servers holding the meta, system catalog, and
> system stats tables. Sharing the same query plan between the workers would
> avoid the hotspot.