[ https://issues.apache.org/jira/browse/HIVE-20056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549529#comment-16549529 ]
Sahil Takiar commented on HIVE-20056:
-------------------------------------
[~lirui] could you take a look? It seems that we call {{SparkPartitionPruner}}
whenever we call {{init}} in {{HiveInputFormat}}. Because {{init}} is called
from both {{getSplits}} and {{getRecordReader}}, the pruner runs for every file
opened inside a HoS (Hive on Spark) task, and each run reads the associated
file on HDFS. This change ensures that pruning is done only once.
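A minimal Java sketch of the call pattern described in the comment, under a simplified model: {{SketchInputFormat}}, {{runPartitionPruner}} and {{countInvocations}} are hypothetical names, not Hive's actual classes or API. It counts pruner runs for one {{getSplits}} call followed by one {{getRecordReader}} call per split, with and without pruning on the record-reader path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the HIVE-20056 call pattern; not Hive's real classes.
public class PruneOnceSketch {

    // Counts how many times the (expensive) pruner runs.
    static int prunerInvocations = 0;

    // Stand-in for the pruner; in HoS each run would read a file from HDFS.
    static void runPartitionPruner() {
        prunerInvocations++;
    }

    // Hypothetical input format whose init() is shared by both entry points.
    static class SketchInputFormat {
        // true models the behaviour before the fix (pruning on the task side).
        private final boolean pruneOnTaskSide;

        SketchInputFormat(boolean pruneOnTaskSide) {
            this.pruneOnTaskSide = pruneOnTaskSide;
        }

        private void init(boolean prune) {
            // Shared setup would go here; pruning is conditional.
            if (prune) {
                runPartitionPruner();
            }
        }

        // Split computation: pruning belongs here.
        List<String> getSplits(List<String> files) {
            init(true);
            return new ArrayList<>(files);
        }

        // Called once per file opened inside a task.
        String getRecordReader(String split) {
            init(pruneOnTaskSide);
            return "reader(" + split + ")";
        }
    }

    // Runs getSplits once, then one getRecordReader per split.
    static int countInvocations(boolean pruneOnTaskSide, List<String> files) {
        prunerInvocations = 0;
        SketchInputFormat format = new SketchInputFormat(pruneOnTaskSide);
        for (String split : format.getSplits(files)) {
            format.getRecordReader(split);
        }
        return prunerInvocations;
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3");
        System.out.println("before fix: " + countInvocations(true, files)
                + ", after fix: " + countInvocations(false, files));
    }
}
```

With three files this prints `before fix: 4, after fix: 1`: in this model the pre-fix path runs the pruner once for split computation plus once per opened file. On a real cluster the per-file calls happen in separate executor JVMs, so a per-instance "already pruned" flag would likely not help; the sketch therefore skips pruning on the record-reader path instead.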
> SparkPartitionPruner shouldn't be triggered by Spark tasks
> ----------------------------------------------------------
>
> Key: HIVE-20056
> URL: https://issues.apache.org/jira/browse/HIVE-20056
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Attachments: HIVE-20056.1.patch
>
>
> It looks like {{SparkDynamicPartitionPruner}} is being called by every Spark
> task because it gets created whenever {{getRecordReader}} is called on the
> associated {{InputFormat}}.