[
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890753#comment-15890753
]
Vihang Karajgaonkar commented on HIVE-16024:
--------------------------------------------
Thanks [~zsombor.klara] for the patch. I understand that tuning the
performance (memory/runtime) of {{MSCK REPAIR TABLE}} was not the original
intent of this patch, and we should probably take that up in a separate JIRA.
However, I wonder if this patch regresses the performance of
{{MSCK REPAIR TABLE}} queries, especially on S3. The original implementation
checked the strict mode config and did not allow {{MSCK ..}} to scan all the
partitions if strict mode was set. This implementation now switches to
{{PartitionIterable}} for all msck queries, irrespective of whether fetching
the partitions in one shot would have caused an OOM. According to the
documentation of {{PartitionIterable}}:
bq. It is very likely that any calls to PartitionIterable are going to result
in a large number of calls, so use sparingly only when the memory cost of
fetching all the partitions in one shot is too prohibitive.
If we want to enable msck repair even in strict mode, then it might be a good
idea to invoke {{PartitionIterable}} only when the user has set strict mode, as
in the original use case. That would be a good middle ground, I believe: it
avoids performance regressions without risking OOM errors in pursuit of higher
performance. What do you think?
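
To make the suggestion concrete, here is a minimal sketch of the middle ground. It is not the actual Hive code: {{PartitionSource}}, {{fetchAll}}, {{fetchBatch}} and {{partitionsFor}} are hypothetical stand-ins for the metastore calls, and the real {{PartitionIterable}} API may look different. The only point is the branch on strict mode.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class MsckPartitionLoadingSketch {

    /** Hypothetical stand-in for the metastore client; not Hive's real API. */
    interface PartitionSource {
        /** One round trip; every partition is held in memory at once. */
        List<String> fetchAll();

        /** One round trip per batch; at most {@code size} partitions in memory. */
        List<String> fetchBatch(int offset, int size);
    }

    /**
     * Picks the loading strategy from the strict-mode flag.
     * batchSize must be positive (assumption of this sketch).
     */
    static Iterable<String> partitionsFor(PartitionSource source,
                                          boolean strictMode,
                                          int batchSize) {
        if (!strictMode) {
            // Nonstrict: keep the single-call path, so no extra round trips.
            return source.fetchAll();
        }
        // Strict: iterate lazily in batches, trading extra calls for bounded memory.
        return () -> new Iterator<String>() {
            private List<String> batch = new ArrayList<>();
            private int offset = 0;
            private int pos = 0;
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                if (pos < batch.size()) {
                    return true;
                }
                if (exhausted) {
                    return false;
                }
                batch = source.fetchBatch(offset, batchSize);
                offset += batch.size();
                pos = 0;
                // A short (or empty) batch means there is nothing left to fetch.
                if (batch.size() < batchSize) {
                    exhausted = true;
                }
                return !batch.isEmpty();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.get(pos++);
            }
        };
    }
}
```

With this shape, nonstrict users keep a single metastore call, and only strict-mode users pay the per-batch calls that the {{PartitionIterable}} documentation warns about.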
> MSCK Repair Requires nonstrict hive.mapred.mode
> -----------------------------------------------
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
> Issue Type: Bug
> Components: Metastore
> Affects Versions: 2.2.0
> Reporter: Barna Zsombor Klara
> Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch,
> HIVE-16024.03.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve
> performance. Unfortunately, it uses PartitionPruner to load the partitions,
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)