[
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110149#comment-15110149
]
Thejas M Nair commented on HIVE-12727:
--------------------------------------
bq. we want Hive to be geared for production use cases and not for
pocs/benchmarks/ease-of-use
I think we really need to give a lot of importance to ease of use. In this
case, it is the path-of-least-astonishment subcategory of ease of use :)
The strict mode is trying to enforce two categories of checks -
1. Prevent use of questionable semantics
2. Prevent use of queries that can potentially consume too many cluster
resources and that show some symptoms of poorly written queries.
I think the current config does a poor job of the 2nd check. It relies on
heuristics based only on the operations in the query. Ideally, these checks
should consider the actual cost of the query. For example, if the result of a
query is small, or if hive.optimize.sampling.orderby=true, it should be
perfectly OK to have an order-by without a limit. For tables that have a small
number of partitions, it should be OK to have no partition predicate in the
query.
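A rough sketch of what cost-aware versions of those two checks could look
like. This is not Hive code: the class, the method names, and both thresholds
are hypothetical, and the row and partition counts are assumed to come from
whatever statistics the planner has available.

```java
// Hypothetical sketch; none of these names exist in Hive.
final class CostAwareStrictCheck {

    // Assumed thresholds, chosen only for illustration.
    static final long MAX_ROWS_FOR_UNBOUNDED_ORDER_BY = 1_000_000L;
    static final int MAX_PARTITIONS_WITHOUT_PREDICATE = 100;

    private CostAwareStrictCheck() {}

    // An order-by without limit is accepted when sampling order-by is enabled
    // (hive.optimize.sampling.orderby=true) or the estimated result is small.
    static boolean allowOrderByWithoutLimit(long estimatedResultRows,
                                            boolean samplingOrderByEnabled) {
        return samplingOrderByEnabled
            || estimatedResultRows <= MAX_ROWS_FOR_UNBOUNDED_ORDER_BY;
    }

    // A scan with no partition predicate is accepted when the table has only
    // a small number of partitions.
    static boolean allowScanWithoutPartitionPredicate(int partitionCount) {
        return partitionCount <= MAX_PARTITIONS_WITHOUT_PREDICATE;
    }
}
```

The point of the sketch is only that the decision depends on estimated cost
rather than on which operators appear in the query.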
I think the 2nd category of checks is not mature enough to be enabled by
default, while the first category is.
Also, note that the 2nd category of checks is likely to break general BI
tools, as they won't be aware of these idiosyncrasies of Hive.
I propose that we split the checks clearly into the above two categories and
enable only the first kind by default.
How should we separate the checks into two? I think the split into semistrict
and strict is confusing. It would be clearer to give names to the categories
of checks.
Maybe support a comma-separated list of the check categories to be enforced?
How about calling the parameter hive.strict.checks and supporting list values
of "semantic" and "largequerypattern" (or similar)?
The equivalent of the current strict mode becomes -
hive.strict.checks=semantic,largequerypattern
The new default becomes -
hive.strict.checks=semantic
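To make the proposed value format concrete, here is a minimal sketch of
parsing such a comma-separated list into a set of enabled categories. The
StrictChecks class is hypothetical and not part of Hive; the category names
are the ones suggested above, and treating a null or blank value as "no checks
enabled" is an assumption of this sketch.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; it only illustrates the proposed value format.
final class StrictChecks {

    // The two category names proposed for hive.strict.checks.
    enum Category { SEMANTIC, LARGEQUERYPATTERN }

    private StrictChecks() {}

    // Parses a value such as "semantic,largequerypattern".
    // Whitespace around names and letter case are ignored.
    // A null or blank value enables no checks (assumption of this sketch).
    static Set<Category> parse(String value) {
        Set<Category> enabled = EnumSet.noneOf(Category.class);
        if (value == null) {
            return enabled;
        }
        for (String token : value.split(",")) {
            String name = token.trim();
            if (name.isEmpty()) {
                continue;
            }
            try {
                enabled.add(Category.valueOf(name.toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                // Fail loudly on unknown names rather than silently ignoring them.
                throw new IllegalArgumentException(
                    "Unknown strict check category: " + name, e);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Equivalent of the current strict mode.
        System.out.println(parse("semantic,largequerypattern"));
        // Proposed new default.
        System.out.println(parse("semantic"));
    }
}
```

Each check would then only need to ask whether its own category is in the
returned set, so adding a new category later does not change the parameter.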
Thoughts?
> allow full table queries in strict mode
> ---------------------------------------
>
> Key: HIVE-12727
> URL: https://issues.apache.org/jira/browse/HIVE-12727
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Blocker
> Attachments: HIVE-12727.01.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while
> compiling statement: FAILED: SemanticException [Error 10041]: No partition
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict,
> and strict, for backward compat for people who are relying on strict already.