[ https://issues.apache.org/jira/browse/SPARK-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549121#comment-14549121 ]
Yin Huai commented on SPARK-7713:
---------------------------------
Yes, it is the same thing. I will have a fix just for SQL because we just
migrated our Parquet support to our new API for partitioned data sources, and I
am worried about the performance regression for users who have Parquet tables
with lots of partitions. After that, we can have a general-purpose fix under
SPARK-7410.
> Use shared broadcast hadoop conf for partitioned table scan.
> ------------------------------------------------------------
>
> Key: SPARK-7713
> URL: https://issues.apache.org/jira/browse/SPARK-7713
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.0
> Reporter: Yin Huai
> Assignee: Yin Huai
> Priority: Blocker
>
> While debugging SPARK-7673, we also found that we are broadcasting a Hadoop
> conf for every partition (backed by a Hadoop RDD). This also causes a
> performance regression when compiling a query that involves a large number of
> partitions.
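
To illustrate the difference between broadcasting a conf once per partition and
sharing a single broadcast, here is a toy sketch in plain Python. It does not
use Spark or Hadoop: `ToyBroadcaster`, `plan_scan_per_partition`, and
`plan_scan_shared` are hypothetical names made up for this example, `pickle`
stands in for whatever serialization a real broadcast performs, and the conf
contents and partition names are invented. It only counts how much
serialization work each approach does.

```python
import pickle


class ToyBroadcaster:
    """Hypothetical stand-in for a broadcast mechanism (not Spark's API).

    It serializes the value on every call and records how much work was done.
    """

    def __init__(self):
        self.broadcast_count = 0
        self.bytes_serialized = 0

    def broadcast(self, value):
        payload = pickle.dumps(value)
        self.broadcast_count += 1
        self.bytes_serialized += len(payload)
        return payload


def plan_scan_per_partition(broadcaster, conf, partitions):
    # Pattern described in the issue: one broadcast of the conf per partition.
    return {p: broadcaster.broadcast(conf) for p in partitions}


def plan_scan_shared(broadcaster, conf, partitions):
    # Pattern named in the issue title: broadcast once, reuse for every partition.
    handle = broadcaster.broadcast(conf)
    return {p: handle for p in partitions}


# Invented sample data: a conf with 200 entries and 30 partitions.
conf = {f"hadoop.key.{i}": f"value-{i}" for i in range(200)}
partitions = [f"date=2015-05-{d:02d}" for d in range(1, 31)]

per_partition_bc = ToyBroadcaster()
per_partition_plan = plan_scan_per_partition(per_partition_bc, conf, partitions)

shared_bc = ToyBroadcaster()
shared_plan = plan_scan_shared(shared_bc, conf, partitions)

print("per-partition broadcasts:", per_partition_bc.broadcast_count)
print("shared broadcasts:", shared_bc.broadcast_count)
```

In this sketch the per-partition approach does work proportional to the number
of partitions, while the shared approach does it once, which is the kind of
overhead the issue is concerned with for tables that have many partitions.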