[ 
https://issues.apache.org/jira/browse/HIVE-12727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110149#comment-15110149
 ] 

Thejas M Nair commented on HIVE-12727:
--------------------------------------

bq. we want Hive to be geared for production use cases and not for 
pocs/benchmarks/ease-of-use
I think we really need to give a lot of importance to ease of use. In this 
case, it is the principle-of-least-astonishment subcategory of ease of use :)

Strict mode is trying to enforce two categories of checks: 
 1. Prevent use of questionable semantics
 2. Prevent queries that can potentially consume too many cluster resources 
and that show symptoms of being poorly written. 

I think the current config does a poor job of the 2nd check. It relies on 
heuristics based on the operations in the query. Ideally, these checks should 
consider the actual cost of the query. For example, if the result of a query is 
small, or if hive.optimize.sampling.orderby=true, it should be perfectly OK to 
have an order-by without a limit. For tables that have a small number of 
partitions, it should be OK to have no partition predicate in the query. 
I think the 2nd category of checks is not mature enough to be enabled by 
default, while the first category is.
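
To make the cost-based idea concrete, here is a rough sketch of what such 
checks could look like. Everything in it is hypothetical: the class, the method 
names and both thresholds are made up for illustration and are not existing 
Hive code, and I am assuming that a row estimate and a partition count are 
available from the planner.

```java
// Hypothetical sketch only: none of these names or thresholds exist in Hive.
final class CostAwareChecksSketch {

    // Assumed limits, chosen arbitrarily for illustration.
    static final long MAX_UNLIMITED_ORDER_BY_ROWS = 100_000L;
    static final int MAX_UNFILTERED_PARTITIONS = 100;

    // An order-by without a limit is acceptable when the estimated result is
    // small, or when hive.optimize.sampling.orderby=true.
    static boolean allowOrderByWithoutLimit(long estimatedResultRows,
                                            boolean samplingOrderBy) {
        return samplingOrderBy
            || estimatedResultRows <= MAX_UNLIMITED_ORDER_BY_ROWS;
    }

    // A query without a partition predicate is acceptable when the table has
    // only a small number of partitions.
    static boolean allowScanWithoutPartitionPredicate(int partitionCount) {
        return partitionCount <= MAX_UNFILTERED_PARTITIONS;
    }
}
```

The point is only that the decision depends on estimated cost rather than on 
which operators appear in the query.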

Also, note that the 2nd category of checks is likely to break general BI 
tools, as they won't be aware of these idiosyncrasies of Hive.

I propose that we split the checks clearly into the above two categories and 
enable only the first kind by default.

How should we separate the checks into two? I think the semistrict/strict 
categorization is confusing. It would be clearer to give names to the 
categories of checks.
Maybe support a comma-separated list of categories of checks to be enforced?

How about calling the parameter hive.strict.checks and supporting list values 
of "semantic" and "largequerypattern" (or similar)? 

The equivalent of the current strict mode becomes: 
hive.strict.checks=semantic,largequerypattern

The new default becomes:
hive.strict.checks=semantic
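
As a minimal sketch of how the proposed comma-separated value could be 
interpreted: the class and enum below are hypothetical and not part of Hive. 
Only the parameter name and the two category names come from the proposal 
above. Rejecting unknown names is my assumption about the desired behavior.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: neither this class nor the enum exists in Hive.
final class StrictChecksSketch {

    // The two category names suggested in this comment.
    enum Check { SEMANTIC, LARGEQUERYPATTERN }

    // Parses a value such as "semantic,largequerypattern" into a set of
    // enabled check categories. A null or blank value enables no checks;
    // an unknown category name is rejected.
    static Set<Check> parse(String value) {
        EnumSet<Check> enabled = EnumSet.noneOf(Check.class);
        if (value == null) {
            return enabled;
        }
        for (String token : value.split(",")) {
            String name = token.trim();
            if (name.isEmpty()) {
                continue; // tolerate stray commas and whitespace
            }
            try {
                enabled.add(Check.valueOf(name.toUpperCase(Locale.ROOT)));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException(
                    "Unknown value for hive.strict.checks: " + name, e);
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Equivalent of the current strict mode.
        System.out.println(parse("semantic,largequerypattern"));
        // Proposed new default.
        System.out.println(parse("semantic"));
    }
}
```

With a set like this, each individual check would only need to ask whether its 
category is enabled, instead of comparing against nonstrict/semistrict/strict.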

Thoughts?


> allow full table queries in strict mode
> ---------------------------------------
>
>                 Key: HIVE-12727
>                 URL: https://issues.apache.org/jira/browse/HIVE-12727
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>            Priority: Blocker
>         Attachments: HIVE-12727.01.patch, HIVE-12727.patch
>
>
> Making strict mode the default recently appears to have broken many normal 
> queries, such as some TPCDS benchmark queries, e.g. Q85:
> Response message: org.apache.hive.service.cli.HiveSQLException: Error while 
> compiling statement: FAILED: SemanticException [Error 10041]: No partition 
> predicate found for Alias "web_sales" Table "web_returns"
> We should remove this restriction from strict mode, or change the default 
> back to non-strict. Perhaps make a 3-value parameter, nonstrict, semistrict, 
> and strict, for backward compat for people who are relying on strict already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
