[ https://issues.apache.org/jira/browse/HIVE-15398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HIVE-15398:
------------------------------------
Description:
See HIVE-15397.
There are several complementary ways to handle this properly:
1) Enhance MetadataOnly to recognize when table emptiness matters, and only
optimize safe query patterns (or apply the approaches below only in the unsafe
cases).
2) Create the original InputFormat during compilation, get a record reader, and
check whether it's empty. This seems like the only bulletproof method in terms
of correctness, but it may break because setup and access differ between tasks
and compilation. It may also have security implications, e.g. if compilation
runs in HS2 with permissions different from those of the tasks.
3) Inject a limit into the table scan (using a limit in the plan, or just
hacking it into the table scan operator (TS) itself specifically for this
feature) and keep the original InputFormat. That way, instead of 0 or 1 null
rows, it would return 0 or 1 rows from the original split while still avoiding
large scans, which is the goal.
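To make options 2 and 3 concrete, here is a minimal sketch of the "read at most
one real row" idea. It is not Hive code: a plain java.util.Iterator stands in
for a record reader, and the names LimitedScanSketch, readAtMost and isEmpty
are hypothetical, introduced only for illustration. The point is that a capped
read of the original data yields 0 rows for an empty table and 1 row otherwise,
without scanning the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; an Iterator stands in for a record reader.
class LimitedScanSketch {

    // Option 3 in miniature: pull at most `limit` rows from the source,
    // then stop without touching the remaining rows.
    static <T> List<T> readAtMost(Iterator<T> rows, int limit) {
        List<T> out = new ArrayList<>();
        while (out.size() < limit && rows.hasNext()) {
            out.add(rows.next());
        }
        return out;
    }

    // Option 2 in miniature: the source is empty iff it has no first row.
    static <T> boolean isEmpty(Iterator<T> rows) {
        return readAtMost(rows, 1).isEmpty();
    }

    public static void main(String[] args) {
        List<String> emptyTable = Collections.emptyList();
        List<String> bigTable = Arrays.asList("r1", "r2", "r3");

        System.out.println(readAtMost(emptyTable.iterator(), 1).size()); // 0
        System.out.println(readAtMost(bigTable.iterator(), 1).size());   // 1
        System.out.println(isEmpty(emptyTable.iterator()));              // true
    }
}
```

In a real implementation the cap would have to sit wherever the table scan
pulls rows from the original InputFormat; where exactly that is remains the
open question in option 3.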
> change metadata-only queries to still read the original table (in some cases?)
> ------------------------------------------------------------------------------
>
> Key: HIVE-15398
> URL: https://issues.apache.org/jira/browse/HIVE-15398
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin