[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890753#comment-15890753
 ] 

Vihang Karajgaonkar commented on HIVE-16024:
--------------------------------------------

Thanks [~zsombor.klara] for the patch. I understand that tuning the 
performance (memory/runtime) of {{MSCK REPAIR TABLE}} was not the original 
intent of this patch, and we should probably take that up in a separate JIRA. 
However, I wonder whether this patch regresses the performance of 
{{MSCK REPAIR TABLE}} queries, especially on S3. The original implementation 
checked the strict mode config and did not allow {{MSCK ..}} to scan all the 
partitions if strict mode was set. This implementation now switches to 
{{PartitionIterable}} for all msck queries, irrespective of whether fetching 
the partitions in one shot would have caused an OOM or not. According to the 
documentation of {{PartitionIterable}}:

bq. It is very likely that any calls to PartitionIterable are going to result 
in a large number of calls, so use sparingly only when the memory cost of 
fetching all the partitions in one shot is too prohibitive.

If we want to enable msck repair even in strict mode, then it might be a good 
idea to invoke {{PartitionIterable}} only when the user has set strict mode, as 
in the original use case. I believe that would be a good middle ground: it 
avoids both performance regressions and OOM errors incurred in pursuit of 
higher performance. What do you think?
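
To make the suggestion concrete, here is a rough, self-contained sketch of the 
branching I have in mind. It is not Hive code: {{PartitionSource}}, 
{{CountingSource}}, {{MsckPartitionLoading}} and {{partitionsFor}} are 
hypothetical stand-ins invented for illustration, and the batched branch only 
mimics the general idea of iterating in batches rather than the actual 
{{PartitionIterable}} API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the metastore client; not a Hive interface. */
interface PartitionSource {
    List<String> fetchAll();                        // one call, whole list in memory
    List<String> fetchBatch(int offset, int size);  // one call per batch
}

/** In-memory source that counts calls, so the cost of each mode is visible. */
final class CountingSource implements PartitionSource {
    private final List<String> partitions;
    int calls = 0;

    CountingSource(List<String> partitions) {
        this.partitions = partitions;
    }

    public List<String> fetchAll() {
        calls++;
        return new ArrayList<>(partitions);
    }

    public List<String> fetchBatch(int offset, int size) {
        calls++;
        int from = Math.min(offset, partitions.size());
        int to = Math.min(from + size, partitions.size());
        return new ArrayList<>(partitions.subList(from, to));
    }
}

final class MsckPartitionLoading {
    /**
     * Assumption: strictMode mirrors hive.mapred.mode=strict. Only then do we
     * pay for many small calls; otherwise we fetch everything in one shot.
     */
    static Iterable<String> partitionsFor(PartitionSource source,
                                          boolean strictMode,
                                          int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        if (!strictMode) {
            return source.fetchAll();
        }
        return () -> new Iterator<String>() {
            private int offset = 0;
            private boolean exhausted = false;
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Fetch the next batch lazily; a short batch marks the end.
                while (!current.hasNext() && !exhausted) {
                    List<String> batch = source.fetchBatch(offset, batchSize);
                    offset += batch.size();
                    exhausted = batch.size() < batchSize;
                    current = batch.iterator();
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

With this shape, non-strict queries keep the single bulk fetch, and only 
strict-mode queries take the batched path, where the number of calls grows 
with the partition count divided by the batch size.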

> MSCK Repair Requires nonstrict hive.mapred.mode
> -----------------------------------------------
>
>                 Key: HIVE-16024
>                 URL: https://issues.apache.org/jira/browse/HIVE-16024
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>         Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict
> HIVE-13788 modified the way we read up partitions for a table to improve 
> performance. Unfortunately it is using PartitionPruner to load the partitions 
> which in turn is checking hive.mapred.mode.
> The previous code did not check hive.mapred.mode.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
