In your opinion, in multi-clause DDL statements like

alter table p partition (j<2 or j>0, k like "%", z = '') set uncached;

should "z = ''" be a synonym for "z IS NULL", as it is in the
single-clause DDL?
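
To make the two readings concrete, here is a rough sketch in Python. It is not Impala code: the function names are made up for illustration, and it only models the behavior described in the thread below (the legacy key=value form maps '' and NULL to the NULL partition, while a real predicate follows SQL three-valued logic, where nothing equals NULL).

```python
from typing import Optional


def legacy_kv_matches(partition_value: Optional[str],
                      literal: Optional[str]) -> bool:
    """Hypothetical model of the old single-clause key=value matching.

    Both `p = NULL` and `s = ''` are taken to mean "the NULL partition".
    """
    if literal is None or literal == "":
        return partition_value is None
    return partition_value == literal


def sql_equals(partition_value: Optional[str],
               literal: Optional[str]) -> Optional[bool]:
    """Standard SQL equality: any comparison with NULL is unknown (None)."""
    if partition_value is None or literal is None:
        return None
    return partition_value == literal


def predicate_selects(result: Optional[bool]) -> bool:
    """A predicate selects a partition only when it is definitely true."""
    return result is True


# NULL partition (modeled as None) under each interpretation.
print(legacy_kv_matches(None, ""))              # old behavior
print(predicate_selects(sql_equals(None, "")))  # strict predicate behavior
```

Under the first reading "z = ''" picks the NULL partition; under the second it never does, and the user would have to write "z IS NULL" instead.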

On Wed, Jul 6, 2016 at 11:28 AM, Marcel Kornacker <[email protected]> wrote:
> On Wed, Jul 6, 2016 at 10:20 AM, Jim Apple <[email protected]> wrote:
>> Let me try to explain what is going on here.
>>
>> Currently, if a user wants to specify a null partition for a DDL
>> operation, they write something like
>>
>> compute incremental stats incremental_null_part_key partition(p = NULL);
>
> We need to keep this working for the time being.
>
>>
>> For an empty string, they could write:
>>
>> alter table t_part drop partition (j=2, s='')
>>
>> This is unfortunate, as nothing "equals" NULL, and empty strings are
>> mapped to the NULL partition value.
>>
>> Amos has written a patch that allows DDL operations to work on more
>> than one partition at a time. These look like:
>>
>> alter table p partition (j<2 or j>0, k like "%") set uncached;
>>
>> Here, the clauses separated by commas are ANDed together to make one
>> clause. The question is whether these clauses, which now are clauses
>> and not just strangely interpreted equality, should keep the old
>> behavior or break existing queries.
>
> For these clauses we should use 'IS [NOT] NULL'.
>
>>
>> On Wed, Jul 6, 2016 at 6:44 AM, Amos Bird <[email protected]> wrote:
>>> This problem came from https://issues.cloudera.org/browse/IMPALA-1654 , CR
>>> at https://gerrit.cloudera.org/#/c/1563/ . This patch will make general
>>> predicates possible in most partition DDL operations. However, for NULL
>>> partitions, the old KV way no longer works. Broken cases are
>>> <string val>="" and <val>=null. This is due to the usage of
>>> HdfsPartitionPruner which is used for Query time partition pruning. Should
>>> we keep the old way of treating NULL partition as special cases?
>>>
>>> Amos
