[jira] [Commented] (HIVE-29312) Concatenate equality conditions in AND nodes

Wechar (Jira) Fri, 27 Feb 2026 08:40:07 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-29312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061673#comment-18061673
 ]


Wechar commented on HIVE-29312:
-------------------------------

*Background*
In our prod env, some user has the traffic dwd tables that have region, date, 
hour, event_type etc partition columns, and would produce millions partitions, 
the table users may scan such tables with some specific region, date and some 
other partition filters.

Our backend MySQL will suffer high load when such query is frequent, so we want 
to improve it.

 
{quote}Consider for example a table partitioned by (a,b,c,d) and partitioning 
filter that says (a=1 and d=1); the generated SQL query will be PART_NAME LIKE 
'a=1/%' AND PART_NAME LIKE '%/d=1' and the only way to collapse this is 
PART_NAME LIKE 'a=1%d=1' but the latter is not gonna make any difference for 
the index range scan.
{quote}
I have add a new bench mark test *multiPartialAnd* which is {*}`p_a`='a0' and 
`p_c`='c0'{*}, and the result still shows some improvements for huge partitions:
{code:bash}
- Before
Operation                      Mean     Med      Min      Max      Err%
getPartitionsByFilter#multiAnd.100 4.174    4.117    3.944    5.895    5.777
getPartitionsByFilter#multiAnd.10000 6.947    6.872    6.542    8.931    4.851
getPartitionsByFilter#multiAnd.1000000 272.1    268.9    267.0    362.8    4.296
getPartitionsByFilter#multiOr.100 5.405    4.607    4.083    16.91    48.07
getPartitionsByFilter#multiOr.10000 8.038    7.871    7.595    18.75    13.77
getPartitionsByFilter#multiOr.1000000 368.5    365.0    362.6    563.6    5.565
getPartitionsByFilter#multiPartialAnd.100 4.271    4.087    3.881    11.39    
22.53
getPartitionsByFilter#multiPartialAnd.10000 6.805    6.682    6.448    11.59    
9.627
getPartitionsByFilter#multiPartialAnd.1000000 268.7    267.7    266.2    281.7  
  1.127
getPartitionsByFilter#simple.100 4.705    4.439    3.983    15.68    28.76
getPartitionsByFilter#simple.10000 6.440    6.341    6.026    9.197    5.792
getPartitionsByFilter#simple.1000000 244.9    237.8    234.8    290.2    5.477

- After
 Operation                      Mean     Med      Min      Max      Err%
getPartitionsByFilter#multiAnd.100 4.087    4.048    3.852    4.817    4.399
getPartitionsByFilter#multiAnd.10000 6.659    6.566    6.057    8.500    7.293
getPartitionsByFilter#multiAnd.1000000 233.6    231.3    230.1    331.3    4.601
getPartitionsByFilter#multiOr.100 4.378    4.340    4.058    6.571    6.313
getPartitionsByFilter#multiOr.10000 7.793    7.738    7.541    8.429    2.133
getPartitionsByFilter#multiOr.1000000 372.3    368.3    365.9    452.6    3.291
getPartitionsByFilter#multiPartialAnd.100 4.082    4.053    3.761    4.626    
4.254
getPartitionsByFilter#multiPartialAnd.10000 6.604    6.460    6.213    11.09    
9.160
getPartitionsByFilter#multiPartialAnd.1000000 247.8    246.7    245.5    269.3  
  1.361
getPartitionsByFilter#simple.100 4.366    4.276    3.962    6.683    8.101
getPartitionsByFilter#simple.10000 6.355    6.292    6.106    8.225    4.007
getPartitionsByFilter#simple.1000000 239.9    233.6    232.0    342.6    5.944
{code}
I think the improvement is because the regex match times become less even if 
the hitting index length is the same.

> Concatenate equality conditions in AND nodes
> --------------------------------------------
>
>                 Key: HIVE-29312
>                 URL: https://issues.apache.org/jira/browse/HIVE-29312
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>    Affects Versions: 4.1.0
>            Reporter: Wechar
>            Assignee: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> Current partition filtert tree is a binary tree with single condition in the 
> leaf node, actually the adjacent AND leaf nodes can be combined together to 
> match longer index prefix. For example, a table has partition key: a, b, c, 
> d, and a filter is {{a=1 and b=1 and c=1}}, when visit as a binary tree 
> *AND(AND(a=1, b=1), c=1)*,  the filter is:
> {code:sql}
> PART_NAME like 'a=1/%' and PART_NAME like '%/b=2/%' and PART_NAME like 
> '%/c=3/%'
> {code}
> If we combine the adjacent AND conditions, the tree will be *MultiAnd(a=1, 
> b=2, c=3)*, aand the filte can be:
> {code:bash}
> PART_NAME like 'a=1/b=2/c=3/%'
> {code}
> Obviously, the latter could match a longer index prefix and is more efficient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-29312) Concatenate equality conditions in AND nodes

Reply via email to